Cognitive Psychology: A Student's Handbook [8 ed.] 1138482218, 9781138482210


Table of contents:
Cover
Endorsements
Half Title
Prelims Advert
Title Page
Copyright Page
Dedication
Contents
List of illustrations
Preface
Visual tour (how to use this book)
Chapter 1 Approaches to human cognition
Introduction
Cognitive psychology
Cognitive neuropsychology
Cognitive neuroscience: the brain in action
Computational cognitive science
Comparisons of major approaches
Is there a replication crisis?
Outline of this book
Chapter summary
Further reading
Part I Visual perception and attention
Chapter 2 Basic processes in visual perception
Introduction
Vision and the brain
Two visual systems: perception-action model
Colour vision
Depth perception
Perception without awareness: subliminal perception
Chapter summary
Further reading
Chapter 3 Object and face recognition
Introduction
Pattern recognition
Perceptual organisation
Approaches to object recognition
Object recognition: top-down processes
Face recognition
Visual imagery
Chapter summary
Further reading
Chapter 4 Motion perception and action
Introduction
Direct perception
Visually guided movement
Visually guided action: contemporary approaches
Perception of human motion
Change blindness
Chapter summary
Further reading
Chapter 5 Attention and performance
Introduction
Focused auditory attention
Focused visual attention
Disorders of visual attention
Visual search
Cross-modal effects
Divided attention: dual-task performance
“Automatic” processing
Chapter summary
Further reading
Part II Memory
Chapter 6 Learning, memory and forgetting
Introduction
Short-term vs long-term memory
Working memory: Baddeley and Hitch
Working memory: individual differences and executive functions
Levels of processing (and beyond)
Learning through retrieval
Implicit learning
Forgetting from long-term memory
Chapter summary
Further reading
Chapter 7 Long-term memory systems
Introduction
Declarative memory
Episodic memory
Semantic memory
Non-declarative memory
Beyond memory systems and declarative vs non-declarative memory
Chapter summary
Further reading
Chapter 8 Everyday memory
Introduction
Autobiographical memory: introduction
Memories across the lifetime
Theoretical approaches to autobiographical memory
Eyewitness testimony
Enhancing eyewitness memory
Prospective memory
Theoretical perspectives on prospective memory
Chapter summary
Further reading
Part III Language
Chapter 9 Speech perception and reading
Introduction
Speech (and music) perception
Listening to speech
Context effects
Theories of speech perception
Cognitive neuropsychology
Reading: introduction
Word recognition
Reading aloud
Reading: eye-movement research
Chapter summary
Further reading
Chapter 10 Language comprehension
Introduction
Parsing: overview
Theoretical approaches: parsing and prediction
Pragmatics
Individual differences: working memory capacity
Discourse processing: inferences
Discourse comprehension: theoretical approaches
Chapter summary
Further reading
Chapter 11 Language production
Introduction
Basic aspects of speech production
Speech planning
Speech errors
Theories of speech production
Cognitive neuropsychology: speech production
Speech as communication
Writing: the main processes
Spelling
Chapter summary
Further reading
Part IV Thinking and reasoning
Chapter 12 Problem solving and expertise
Introduction
Problem solving: introduction
Gestalt approach and beyond: insight and role of experience
Problem-solving strategies
Analogical problem solving and reasoning
Expertise
Chess-playing expertise
Medical expertise
Brain plasticity
Deliberate practice and beyond
Chapter summary
Further reading
Chapter 13 Judgement and decision-making
Introduction
Judgement research
Theories of judgement
Decision-making under risk
Decision-making: emotional and social factors
Applied and complex decision-making
Chapter summary
Further reading
Chapter 14 Reasoning and hypothesis testing
Introduction
Hypothesis testing
Deductive reasoning
Theories of “deductive” reasoning
Brain systems in reasoning
Informal reasoning
Are humans rational?
Chapter summary
Further reading
Part V Broadening horizons
Chapter 15 Cognition and emotion
Introduction
Appraisal theories
Emotion regulation
Affect and cognition: attention and memory
Affect and cognition: judgement and decision-making
Judgement and decision-making: theoretical approaches
Anxiety, depression and cognitive biases
Cognitive bias modification and beyond
Chapter summary
Further reading
Chapter 16 Consciousness
Introduction
Functions of consciousness
Assessing consciousness and conscious experience
Global workspace and global neuronal workspace theories
Is consciousness unitary?
Chapter summary
Further reading
Glossary
References
Author index
Subject index
End Matter Advert


“This edition of Eysenck and Keane has further enhanced the status of Cognitive Psychology: A Student’s Handbook, as a high benchmark that other textbooks on this topic fail to achieve. It is informative and innovative, without losing any of its hallmark coverage and readability.”
Professor Robert Logie, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, United Kingdom

“The best student’s handbook on cognitive psychology – an indispensable volume brought up-to-date in this latest edition. It explains everything from low-level vision to high-level consciousness, and it can serve as an introductory text.”
Professor Philip Johnson-Laird, Stuart Professor of Psychology, Emeritus, Princeton University, United States

“I first read Eysenck and Keane’s Cognitive Psychology: A Student’s Handbook in its third edition, during my own undergraduate studies. Over the course of its successive editions since then, the content – like the field of cognition itself – has evolved and grown to encompass current trends, novel approaches and supporting learning resources. It remains, in my opinion, the gold standard for cognitive psychology textbooks.”
Dr Richard Roche, Senior Lecturer, Department of Psychology, Maynooth University, Ireland

“Eysenck and Keane have once again done an excellent job, not only in terms of keeping the textbook up-to-date with the latest studies, issues and debates; but also by making the content even more accessible and clear without compromising accuracy or underestimating the reader’s intelligence. After all these years, this book remains an essential tool for students of cognitive psychology, covering the topic in the appropriate breadth and depth.”
Dr Gerasimos Markopoulos, Senior Lecturer, School of Science, Bath Spa University, United Kingdom

“Eysenck and Keane’s popular textbook offers comprehensive coverage of what psychology students need to know about human cognition. The textbook introduces the core topics of cognitive psychology that serve as the fundamental building blocks to our understanding of human behaviour. The authors integrate contemporary developments in the field and provide an accessible entry to neighboring disciplines such as cognitive neuroscience and neuropsychology.”
Dr Motonori Yamaguchi, Senior Lecturer, Department of Psychology, University of Essex, United Kingdom


“The eighth edition of Cognitive Psychology by Eysenck and Keane provides possibly the most comprehensive coverage of cognition currently available. The text is clear and easy to read with clear links to theory across the chapters. A real highlight is the creative use of up-to-date real-world examples throughout the book.”
Associate Professor Rhonda Shaw, Head of the School of Psychology, Charles Sturt University, Australia

“Unmatched in breadth and scope, it is the authoritative textbook on cognitive psychology. It outlines the history and major developments within the field, while discussing state-of-the-art experimental research in depth. The integration of online resources keeps the material fresh and engaging.”
Associate Professor Søren Risløv Staugaard, Department of Psychology and Behavioural Sciences, Aarhus University, Denmark

“Eysenck and Keane’s Cognitive Psychology provides comprehensive topic coverage and up-to-date research. The writing style is concise and easy to follow, which makes the book suitable for both undergraduate and graduate students. The authors use real-life examples that are easily relatable to students, making the book very enjoyable to read.”
Associate Professor Lin Agler, School of Psychology, University of Southern Mississippi Gulf Coast, United States


Cognitive Psychology

The fully updated eighth edition of Cognitive Psychology: A Student’s Handbook provides comprehensive yet accessible coverage of all the key areas in the field ranging from visual perception and attention through to memory and language. Each chapter is complete with key definitions, practical real-life applications, chapter summaries and suggested further reading to help students develop an understanding of this fascinating but complex field. The new edition includes:

● an increased emphasis on neuroscience
● updated references to reflect the latest research
● applied ‘in the real world’ case studies and examples.

Widely regarded as the leading undergraduate textbook in the field of cognitive psychology, this new edition comes complete with an enhanced accompanying companion website. The website includes a suite of learning resources including simulation experiments, multiple-choice questions, and access to Primal Pictures’ interactive 3D atlas of the brain. The companion website can be accessed at: www.routledge.com/cw/eysenck.

Michael W. Eysenck is Professor Emeritus in Psychology at Royal Holloway, University of London, United Kingdom. He is also Professorial Fellow at Roehampton University, London. He is the best-selling author of several textbooks including Fundamentals of Cognition (2018), Memory (with Alan Baddeley and Michael Anderson, 2020) and Fundamentals of Psychology (2009).

Mark T. Keane is Chair of Computer Science at University College Dublin, Ireland.


Visit the Companion Website to access a range of interactive teaching and learning resources

Includes access to Primal Pictures’ interactive 3D brain www.routledge.com/cw/eysenck

PRIMAL PICTURES

Revolutionizing medical education with anatomical solutions to fit every need

For over 27 years, Primal Pictures has led the way in offering premier 3D digital human anatomy solutions, transforming how educators teach and students learn the complexities of human anatomy and medicine. Our pioneering scientific approach puts quality, accuracy and detail at the heart of everything we do. Primal’s experts have created the world’s most medically accurate and detailed 3D reconstruction of human anatomy using real scan data from the NLM Visible Human Project®, as well as CT images and MRIs. With advanced academic research and thousands of development hours underpinning its creation, our model surpasses all other anatomical resources available. To learn more about Primal’s cutting-edge solution for better learning outcomes and increased student engagement visit www.primalpictures.com/students


COGNITIVE PSYCHOLOGY
A Student’s Handbook
Eighth Edition

MICHAEL W. EYSENCK AND MARK T. KEANE


Eighth edition published 2020
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2020 Michael W. Eysenck and Mark T. Keane

The right of Michael W. Eysenck and Mark T. Keane to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

First edition published by Lawrence Erlbaum Associates 1984
Seventh edition published by Routledge 2015

Every effort has been made to contact copyright-holders. Please advise the publisher of any errors or omissions, and these will be corrected in subsequent editions.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book

ISBN: 978-1-13848-221-0 (hbk)
ISBN: 978-1-13848-223-4 (pbk)
ISBN: 978-1-35105-851-3 (ebk)

Typeset in Times New Roman by Servis Filmsetting Ltd, Stockport, Cheshire

Visit the companion website: www.routledge.com/cw/eysenck


To Christine with love (M.W.E.)

What moves science forward is argument, debate, and the testing of alternative theories . . . A science without controversy is a science without progress. (Jerry Coyne)



Contents

List of illustrations xiv
Preface xxix
Visual tour (how to use this book) xxxi

1 Approaches to human cognition 1
  Introduction 1
  Cognitive psychology 3
  Cognitive neuropsychology 7
  Cognitive neuroscience: the brain in action 12
  Computational cognitive science 26
  Comparisons of major approaches 33
  Is there a replication crisis? 34
  Outline of this book 36
  Chapter summary 37
  Further reading 39

PART I Visual perception and attention 41

2 Basic processes in visual perception 43
  Introduction 43
  Vision and the brain 44
  Two visual systems: perception-action model 55
  Colour vision 64
  Depth perception 71
  Perception without awareness: subliminal perception 81
  Chapter summary 90
  Further reading 92

3 Object and face recognition 94
  Introduction 94
  Pattern recognition 95
  Perceptual organisation 96
  Approaches to object recognition 103
  Object recognition: top-down processes 111
  Face recognition 116
  Visual imagery 130
  Chapter summary 137
  Further reading 139

4 Motion perception and action 140
  Introduction 140
  Direct perception 141
  Visually guided movement 145
  Visually guided action: contemporary approaches 152
  Perception of human motion 157
  Change blindness 163
  Chapter summary 175
  Further reading 176

5 Attention and performance 178
  Introduction 178
  Focused auditory attention 179
  Focused visual attention 183
  Disorders of visual attention 196
  Visual search 200
  Cross-modal effects 208
  Divided attention: dual-task performance 212
  “Automatic” processing 226
  Chapter summary 231
  Further reading 233

PART II Memory 237

6 Learning, memory and forgetting 239
  Introduction 239
  Short-term vs long-term memory 240
  Working memory: Baddeley and Hitch 246
  Working memory: individual differences and executive functions 254
  Levels of processing (and beyond) 262
  Learning through retrieval 265
  Implicit learning 269
  Forgetting from long-term memory 278
  Chapter summary 293
  Further reading 295

7 Long-term memory systems 296
  Introduction 296
  Declarative memory 300
  Episodic memory 305
  Semantic memory 313
  Non-declarative memory 325
  Beyond memory systems and declarative vs non-declarative memory 332
  Chapter summary 340
  Further reading 342

8 Everyday memory 344
  Introduction 344
  Autobiographical memory: introduction 346
  Memories across the lifetime 351
  Theoretical approaches to autobiographical memory 355
  Eyewitness testimony 363
  Enhancing eyewitness memory 372
  Prospective memory 375
  Theoretical perspectives on prospective memory 381
  Chapter summary 389
  Further reading 391

PART III Language 393

9 Speech perception and reading 403
  Introduction 403
  Speech (and music) perception 404
  Listening to speech 408
  Context effects 412
  Theories of speech perception 417
  Cognitive neuropsychology 429
  Reading: introduction 432
  Word recognition 436
  Reading aloud 442
  Reading: eye-movement research 453
  Chapter summary 457
  Further reading 460

10 Language comprehension 461
  Introduction 461
  Parsing: overview 462
  Theoretical approaches: parsing and prediction 464
  Pragmatics 478
  Individual differences: working memory capacity 487
  Discourse processing: inferences 490
  Discourse comprehension: theoretical approaches 498
  Chapter summary 510
  Further reading 512

11 Language production 514
  Introduction 514
  Basic aspects of speech production 516
  Speech planning 519
  Speech errors 521
  Theories of speech production 525
  Cognitive neuropsychology: speech production 536
  Speech as communication 543
  Writing: the main processes 549
  Spelling 558
  Chapter summary 564
  Further reading 566

PART IV Thinking and reasoning 569

12 Problem solving and expertise 573
  Introduction 573
  Problem solving: introduction 574
  Gestalt approach and beyond: insight and role of experience 576
  Problem-solving strategies 588
  Analogical problem solving and reasoning 593
  Expertise 600
  Chess-playing expertise 601
  Medical expertise 604
  Brain plasticity 609
  Deliberate practice and beyond 612
  Chapter summary 619
  Further reading 621

13 Judgement and decision-making 622
  Introduction 622
  Judgement research 623
  Theories of judgement 633
  Decision-making under risk 640
  Decision-making: emotional and social factors 649
  Applied and complex decision-making 654
  Chapter summary 663
  Further reading 665

14 Reasoning and hypothesis testing 666
  Introduction 666
  Hypothesis testing 667
  Deductive reasoning 672
  Theories of “deductive” reasoning 680
  Brain systems in reasoning 690
  Informal reasoning 694
  Are humans rational? 701
  Chapter summary 708
  Further reading 710

PART V Broadening horizons 713

15 Cognition and emotion 715
  Introduction 715
  Appraisal theories 719
  Emotion regulation 723
  Affect and cognition: attention and memory 730
  Affect and cognition: judgement and decision-making 738
  Judgement and decision-making: theoretical approaches 750
  Anxiety, depression and cognitive biases 753
  Cognitive bias modification and beyond 761
  Chapter summary 764
  Further reading 766

16 Consciousness 767
  Introduction 767
  Functions of consciousness 768
  Assessing consciousness and conscious experience 775
  Global workspace and global neuronal workspace theories 783
  Is consciousness unitary? 792
  Chapter summary 798
  Further reading 799

Glossary 801
References 824
Author index 915
Subject index 931

Illustrations

TABLES
1.1 Approaches to human cognition 3
1.2 Major techniques used to study the brain 16
1.3 Strengths and limitations of major approaches to human cognition 35
11.1 Involvement of working memory components in various writing processes 556
15.1 Effects of anxiety and depression on attentional bias (engagement and disengagement) 757

PHOTOS
Chapter 1
• Max Coltheart 8
• The magnetic resonance imaging (MRI) scanner 18
• Transcranial magnetic stimulation coil 21
• The IBM Watson and two human contestants (Ken Jennings and Brad Rutter) 27

Chapter 3
• Irving Biederman 107
• Heather Sellers 118

Chapter 6
• Alan Baddeley and Graham Hitch 246
• Endel Tulving 287

Chapter 7
• Henry Molaison 297

Chapter 8
• Jill Price 348
• World Trade Center attacks on 9/11 349
• Jennifer Thompson and Ronald Cotton 364

Chapter 11
• Iris Murdoch 550

Chapter 12
• Monty Hall 575
• Fernand Gobet 602
• Magnus Carlsen 613

Chapter 13
• Pat Croskerry 625
• Nik Wallenda 647

FIGURES
1.1 An early version of the information processing approach 4
1.2 Diagram to demonstrate top–down processing 4
1.3 Test yourself by naming the colours in each column 5
1.4 The four lobes, or divisions, of the cerebral cortex in the left hemisphere 13
1.5 Brodmann brain areas on the lateral and medial surfaces 13
1.6 The brain network and cost efficiency 14
1.7 The organisation of the “rich club” 15
1.8 The spatial and temporal resolution of major techniques and methods used to study brain functioning 17
1.9 Areas showing greater activation in a dead salmon when presented with photographs of people than when at rest 25
1.10 The primitive mock neuroimaging device used by Ali et al. (2014) 26
1.11 Architecture of a basic three-layer connectionist network 28
1.12 The main modules of the ACT-R cognitive architecture with their locations within the brain 30
1.13 The basic structure of the standard model of the mind involving five independent modules 31
2.1 Complex scene that requires prolonged perceptual processing to understand fully 43
2.2 Route of visual signals 45
2.3 Simultaneous contrast involving lateral inhibition 46
2.4 Some distinctive features of the largest visual cortical areas 47
2.5 Connectivity within the ventral pathway on the lateral surface of the macaque brain 48
2.6 (a) The single hierarchical model; (b) the parallel hierarchical model; (c) the three parallel hierarchical feedforward systems model 49
2.7 The percentage of cells in six different visual cortical areas responding selectively to orientation, direction of motion, disparity and colour 52
2.8 Visual motion inputs 53
2.9 Goodale and Milner’s (1992) perception-action model showing the dorsal and ventral streams 56
2.10 Lesion overlap in patients with optic ataxia 57
2.11 The Müller-Lyer illusion 58
2.12 The Ebbinghaus illusion 59
2.13 The hollow-face illusion. Left: normal and hollow faces with small target magnets on the forehead and cheek of the normal face; right: front view of the hollow mask that appears as an illusory face projecting forwards 60
2.14 Disruption of size judgements when estimated perceptually (estimation) or produced by grasping (grasping) in full or restricted vision 61
2.15 Historical developments in theories linking perception and action 63
2.16 Schematic diagram of the early stages of neural colour processing 66
2.17 Photograph of a mug showing enormous variation in the properties of the reflected light across the mug’s surface 67
2.18 “The Dress” made famous by its appearance on the internet 69
2.19 Observers’ perceptions of “The Dress” 69
2.20 An engraving by de Vries (1604/1970) in which linear perspective creates an effective three-dimensional effect when viewed from very close but not from further away 72
2.21 Examples of texture gradients that can be perceived as surfaces receding into the distance 73
2.22 Kanizsa’s (1976) illusory square 73
2.23 Accuracy of size judgements as a function of object type 78
2.24 (a) A representation of the Ames room; (b) an actual Ames room showing the effect achieved with two adults 79
2.25 Perceived distance. Top: stimuli presented to participants; bottom: example of the stimulus display 81
2.26 The body size effect: what participants in the doll experiment could see 81
2.27 Estimated contributions of conscious and subconscious processing to GY’s performance in exclusion and inclusion conditions in his normal and blind fields 84
2.28 The areas of most relevance to blindsight are the lateral geniculate nucleus and middle temporal visual area 86
2.29 The relationship between response bias in reporting conscious awareness and enhanced N200 on no-awareness correct trials compared to no-awareness incorrect trials (UC) 89
3.1 The kind of stimulus used by Navon (1977) to demonstrate the importance of global features in perception 95
3.2 The CAPTCHA used by Yahoo 97
3.3 The FBI’s mistaken identification of the Madrid bomber 98
3.4 Examples of the Gestalt laws of perceptual organisation: (a) the law of proximity; (b) the law of similarity; (c) the law of good continuation; and (d) the law of closure 99
3.5 An ambiguous drawing that can be seen as either two faces or as a goblet 100
3.6 The tendency to perceive an array of empty circles as (A) a rotated square or (B) a diamond 101
3.7 A task to decide which region in each stimulus is the figure 102
3.8 High and low spatial frequency versions of a place (a building) 104
3.9 Image of Mona Lisa revealing very low spatial frequencies (left), low spatial frequencies (centre) and high spatial frequencies (right) 105
3.10 An outline of Biederman’s recognition-by-components theory 107
3.11 Ambiguous figures 112
3.12 A brick wall that can be seen as something else 114
3.13 Object recognition involving two different routes: (1) a top-down route in which information proceeds rapidly to the orbitofrontal cortex; (2) a bottom-up route using the slower ventral visual stream 115
3.14 Interactive-iterative framework for object recognition 115
3.15 Recognising an elephant when a key feature (its trunk) is partially hidden 116
3.16 Accuracy and speed of object recognition for birds, boats, cars, chairs and faces by patient GG and healthy controls 120
3.17 Face-selective areas in the right hemisphere 121
3.18 An array of 40 faces to be matched for identity 124
3.19 The model of face recognition put forward by Bruce and Young (1986) 126
3.20 Damage to regions of the inferior occipito-temporal cortex, the anterior inferior temporal cortex and the anterior temporal pole 127
3.21 The approximate locations of the visual buffer in BA17 and BA18, of long-term memories of shapes in the inferior temporal lobe, and of spatial representations in the posterior parietal cortex 132
3.22 Dwell time for the four quadrants of a picture during perception and imagery 133
3.23 Slezak’s (1991, 1995) investigations into the effects of rotation on object recognition 134
3.24 The extent to which perceived or imagined objects could be classified accurately on the basis of brain activity in the early visual cortex and object-selective cortex 135
3.25 Connectivity during perception and imagery involving (a) bottom-up processing; and (b) top-down processing 135
4.1 The optic-flow field as a pilot comes in to land, with the focus of expansion in the middle 142
4.2 Graspable and non-graspable objects having similar asymmetrical features 143
4.3 The visual features of a road viewed in perspective 147
4.4 The far road “triangle” in (A) a left turn and (B) a right turn 148
4.5 Errors in time-to-contact judgements for the smaller and the larger object as a function of whether they were presented in their standard size, the reverse size (off-size) or lacking texture (no-texture) 150
4.6 The dorso-dorsal and ventro-dorsal streams showing their brain locations and forms of processing 156
4.7 Point-light sequences (a) with the walker visible and (b) with the walker not visible 157
4.8 Human detection and discrimination efficiency for human walkers presented in contour, point lights, silhouette and skeleton 158
4.9 Brain areas involved in biological motion processing 159
4.10 The main brain areas associated with the mirror neuron system plus their interconnections 161
4.11 The unicycling clown who cycled close to students walking across a large square 164
4.12 The sequence of events in the disappearing lighter trick 166
4.13 Participants’ fixation points at the time of dropping the lighter 166
4.14 Change blindness: an example 168
4.15 (a) Percentage of correct change detection as a function of form of change and time of fixation; also false alarm rate when there was no change. (b) Mean percentage correct change detection as a function of the number of fixations between target fixation and change of target and form of change 169
4.16 (a) Change-detection accuracy as a function of task difficulty and visual eccentricity. (b) The eccentricity at which change-detection accuracy was 85% correct as a function of task difficulty 170
4.17 An example of inattentional blindness: a woman in a gorilla suit in the middle of a game of passing the ball 172
4.18 An example of inattentional blindness: the sequence of events on the initial baseline trials and the critical trial 174
5.1 A comparison of Broadbent’s theory, Treisman’s theory, and Deutsch and Deutsch’s theory 181
5.2 Split attention. (a) Shaded areas indicate the cued locations; the near and far locations are not cued. (b) Probability of target detection at valid (left or right) and invalid (near or far) locations 185
5.3 A comparison of object-based and space-based attention 187
5.4 Object-based and space-based attention. (a) Possible target locations for a given cue. (b) Performance accuracy at the various target locations 188
5.5 Sample displays for three low perceptual load conditions in which the task required deciding whether a target X or N was presented 190
5.6 The brain areas associated with the dorsal or goal-directed attention network and the ventral or stimulus-driven network 193
5.7 A theoretical approach based on several functional networks of relevance to attention: fronto-parietal; default mode; cingulo-opercular; and ventral attention 195
5.8 An example of object-centred or allocentric neglect 197
5.9 Illegal and dangerous items captured by an airport security screener 201
5.10 Frequency of selection and identification errors when targets were present at trials 201
5.11 Performance speed on a detection task as a function of target definition (conjunctive vs single feature) and display size 203
5.12 Eye fixations made by observers searching for pedestrians 204
5.13 A two-pathway model of visual search 205
5.14 An example of a visual search task when considering feature integration theory 208
5.15 An example of temporal ventriloquism in which the apparent time of onset of a flash is shifted towards that of a sound presented at a slightly different timing from the flash 210
5.16 Wickens’s four-dimensional multiple-resource model 216
5.17 Threaded cognition theory 218
5.18 Patterns of brain activation: (a) underadditive activation; (b) additive activation; (c) overadditive activation 220
5.19 Effects of an audio distraction task on brain activity associated with a straight driving task 221
5.20 Dual-task (auditory and visual tasks) and single-task (auditory or visual task) conditions: reaction times for correct responses only over eight experimental sessions 224
5.21 Response times on a decision task as a function of memory-set size, display-set size and consistent vs varied mapping 227
5.22 Factors that are hypothesised to influence representational quality within Moors’ (2016) theoretical approach 229
6.1 The multi-store model of memory as proposed by Atkinson and Shiffrin (1968) 240
6.2 Short-term memory performance in conditions designed to create interference (repeated condition) or minimise interference (unique condition) 243
6.3 The working memory model showing the connections among its four components and their relationship to long-term memory 246
6.4 Phonological loop system as envisaged by Baddeley (1990) 248
6.5 Sites where direct electrical stimulation disrupted digit-span performance 249
6.6 Amount of interference on a spatial task and a visual task as a function of a secondary task (spatial: movement vs visual: colour discrimination) 250
6.7 Screen displays for the digit 6 253
6.8 Mean reaction times quintile-by-quintile on the anti-saccade task by groups high and low in working memory capacity 256
6.9 Schematic representation of the unity and diversity of three executive functions 259
6.10 Activated brain regions across all executive functions in a meta-analysis of 193 studies 260
6.11 Recognition memory performance as a function of processing depth (shallow vs deep) for three types of stimuli: doors, clocks, and menus 263
6.12 Distinctiveness. Percentage recall of the critical item (e.g., kiwi) and of the preceding and following items in the encoding, retrieval and control conditions 264
6.13 (a) Restudy causes strengthening of the memory trace formed after initial study; (b) testing with feedback causes strengthening of the memory trace; and (c) the formation of a second memory trace 266
6.14 (a) Final recall for restudy-only and test-restudy group participants; (b) recall performance in the CMR group as a function of whether the mediators were or were not retrieved 267
6.15 Mean recall percentage in Session 2 on Test 1 and Test 2 as function of retrieval practice or restudy practice in Session 1 268
6.16 Schematic representation of a traditional keyboard 270
6.17 Mean number of completions in inclusion and exclusion conditions as a function of number of trials 273
6.18 Response times for participants showing a sudden drop in reaction times or not showing such a drop 273
6.19 The striatum is of central importance in implicit learning 274
6.20 A model of motor sequence learning 275
6.21 Sequential motor skill learning dependencies 276
6.22 Skilled typists’ performance when tested on a traditional keyboard 277
6.23 Forgetting over time as indexed by reduced savings 279
6.24 Methods of testing for proactive and retroactive interference 281
6.25 Percentage of items recalled over time for the conditions: no proactive interference, remember and forget 282
6.26 Percentage of words correctly recalled across 32 articles in the respond, baseline and suppress conditions 286
6.27 Proportion of words recalled in high- and low-overload conditions with intra-list cues, strong extra-list cues and weak extra-list cues 289
7.1 Damage to brain areas within and close to the medial temporal lobes producing amnesia 298
7.2 The standard account based on dividing long-term memory into two broad classes: declarative and non-declarative 300
7.3 Interactions between episodic memories, semantic memories and gist memories 305
7.4 (a) Locations of the hippocampus, the perirhinal cortex and the parahippocampal cortex; (b) the binding-of-item-and-context model 307
7.5 (A) Left lateral, (B) medial and (C) anterior views of prefrontal areas having greater activation to familiarity-based than recollection-based processes and areas showing the opposite pattern 309
7.6 Sample pictures on the recognition-memory test 309
7.7 (A) Areas activated for both episodic simulation and episodic memory; (B) areas more activated for episodic simulation than episodic memory 312
7.8 Accuracy of (a) object categorisation and (b) speed of categorisation at the superordinate, basic and subordinate levels 315
7.9 The hub-and-spoke model 319
7.10 Performance accuracy on tool function and tool manipulation tasks with anodal transcranial direct current stimulation to the anterior temporal lobe or to the inferior parietal lobule and in a control condition 321
7.11 Categorisation performance for pictures and words by healthy controls and patients with semantic dementia 324
7.12 Percentages of priming effect and recognition-memory performance of healthy controls and patients 326
7.13 Brain regions showing repetition suppression or response enhancement in a meta-analysis 328
7.14 Mean reaction times on the serial reaction time task by Parkinson’s disease patients and healthy controls 330
7.15 A processing-based memory model 334
7.16 Recognition memory for faces presented and tested in a fixed or variable viewpoint 335
7.17 Brain areas whose activity during episodic learning predicted increased recognition-memory performance (task-positive) or decreased performance (task-negative) 337
7.18 A three-dimensional model of memory: (1) conceptually or perceptually driven; (2) relational or item stimulus representation; (3) controlled or automatic/involuntary intention 339
7.19 Process-specific alliances including the left angular gyrus are involved in recollection of episodic memories and semantic processing 339
8.1 Brain regions activated by autobiographical, episodic retrieval and mentalising tasks including regions of overlap 347
8.2 Number of internal details specific to an autobiographical event recalled at various time delays (by controls and individuals with highly superior autobiographical memory) 348
8.3 Childhood amnesia based on data reported by Rubin and Schulkind (1997) 352
8.4 Temporal distribution of autobiographical memories across the lifespan 354
8.5 The knowledge structures within autobiographical memory, as proposed by Conway (2005) 357
8.6 The mean number of events participants could remember from the past 5 days and those they imagined were likely over the next 5 days 358
8.7 A model of the bidirectional relationships between neural networks involved in the construction and/or elaboration of autobiographical memories 360
8.8 Life structure scores (proportion negative, compartmentalisation, positive redundancy, negative redundancy) for patients with major depressive disorder, patients in remission from major depressive disorder and healthy controls 361
8.9 Four cognitive biases related to autobiographical memory recall that maintain depression and increase the risk of recurrence following remission 362
8.10 Examples of Egyptian and UK face-matching arrays 366
8.11 Size of the misinformation effect as a function of detail memorability in the neutral condition 367
8.12 Extent of misinformation effects as a function of condition for the original memory and endorsement of the misinformation presented previously 371
8.13 Eyewitness identification: test of face-recognition performance 371
8.14 A model of the component processes involved in prospective memory 378
8.15 Mean failures to resume an interrupted task and mean resumption times for the conditions: no-interruption, blank-screen interruption and secondary air traffic control task interruption 379
8.16 Self-reported memory vividness, memory details and confidence in memory for individuals with good and poor inhibitory control before and after repeated checking 381
8.17 The dual-pathways model of prospective memory (based on the multi-process framework) for non-focal and focal tasks separately 383
8.18 Example 1: top-down monitoring processes operating in isolation. Example 2: bottom-up spontaneous retrieval processes operating in isolation. Example 3: dual processes operating dynamically 383
8.19 (a) Sustained and (b) transient activity in the (c) left anterior prefrontal cortex for non-focal and focal prospective memory tasks 385
8.20 Frequency of cue-driven monitoring following the presentation of semantically related or unrelated cues 386
8.21 Different ways the instruction to press Q for fruit words was encoded 388
9.1 (a) Areas activated during passive music listening and passive speech listening; (b) areas activated more by listening to music than speech or the opposite 406
9.2 The main processes involved in speech perception and comprehension 407
9.3 A hierarchical approach to speech segmentation involving three levels or tiers 410
9.4 A model of spoken-word comprehension 412
9.5 Gaze probability for critical objects over the first 1,000 ms since target word onset for target neutral, competitor neutral, competitor constraining and unrelated neutral conditions 414
9.6 Mean target duration required for target recognition for words and sounds presented in isolation or within a general sentence context 420
9.7 The basic TRACE model, showing how activation between the three levels (word, phoneme and feature) is influenced by bottom-up and top-down processing 421
9.8 (a) Actual eye fixations on the object corresponding to a spoken word or related to it; (b) predicted eye fixations from the TRACE model 422
9.9 Mean reaction times for recognition of /t/ and /k/ phonemes in words and non-words 423
9.10 Fixation proportions to high-frequency target words during the first 1,000 ms after target onset 428
9.11 A sample display showing two nouns (“bench” and “rug”) and two verbs (“pray” and “run”) 428
9.12 Processing and repetition of spoken words according to the three-route framework 430
9.13 A general framework of the processes and structures involved in reading comprehension 433
9.14 Estimated reading ability over a 30-month period with initial testing at a mean age of 66 months for English, Spanish and Czech children 434
9.15 McClelland and Rumelhart’s (1981) interactive activation model of visual word recognition 437
9.16 The time course of inhibitory and facilitatory effects of priming 440
9.17 Basic architecture of the dual-route cascaded model 443
9.18 The three components of the triangle model and their associated neural regions: orthography, phonology and semantics 448
9.19 Mean naming latencies for high-frequency and low-frequency words that were irregular or regular and inconsistent 451
9.20 Key assumptions of the E-Z Reader model 455
10.1 Total sentence processing time as a function of sentence type 471
10.2 A model of language processing involving heuristic and algorithmic routes 473
10.3 Sentence reading times as a function of the way in which comprehension was assessed: detailed questions; superficial questions on all trials; or occasional superficial questions 474
10.4 The N400 responses to a critical word in correct and incorrect sentences 476
10.5 Response times for literally false, scrambled metaphor, and metaphor sentences in (a) written and (b) spoken conditions 480
10.6 Mean reaction times to verify metaphor-relevant and metaphor-irrelevant properties 482
10.7 Mean proportion of statements rated comprehensible with a response deadline of 500 or 1600 ms: literal, forward metaphors, reversed metaphors and scrambled metaphors 483
10.8 Sample displays seen from the listener’s perspective 485
10.9 Proportion of fixation on four objects over time 486
10.10 A theoretical framework for reading comprehension involving interacting passive and reader-initiated processes 492
10.11 Reaction times to name colours when the word presented in colour was predictable from the preceding text compared to a control condition 496
10.12 The construction–integration model 502
10.13 Forgetting functions for situation, proposition and surface information over a 4-day period 503
10.14 The RI-Val model showing the effects on comprehension of resonance, integration and validation over time 506
11.1 Brain areas activated during speech comprehension and production 517
11.2 Correlations between aphasic patients’ speech-production abilities and their ability to detect their own speech-production errors 524
11.3 Speech-production processes for picture naming, with median peak activation times 532
11.4 Speech-production processes: the timing of activation associated with different cognitive functions 534
11.5 Language-related regions and their connections in the left hemisphere 536
11.6 Semantic and syntactic errors made by: healthy controls and patients with no damage to the dorsal or ventral pathway, damage to the ventral pathway only, damage to the dorsal pathway only and damage to both pathways 540
11.7 A sample array with six different garments coloured blue or green 544
11.8 Architecture of the forward modelling approach to explaining audience design effects 546
11.9 Hayes’ (2012) writing model: (1) control level; (2) writing process level; and (3) resource level 552
11.10 The frequency of three major writing processes (planning, translating and revising) across the three phases of writing 553
11.11 Kellogg’s three-stage theory of the development of writing skill 554
11.12 Brain areas activated during handwriting tasks 559
11.13 The cognitive architectures for (a) reading and (b) spelling 560
11.14 Brain areas in the left hemisphere associated with reading, letter perception and writing 563
12.1 Explanation of the solution to the Monty Hall problem 575
12.2 Brain areas involved in (a) mathematical problem solving; (b) verbal problem solving; (c) visuo-spatial problem solving; and (d) areas common to all three problem types (conjunction) 577
12.3 The mutilated draughtboard problem 577
12.4 Flow chart of insight problem solving 580
12.5 (a) The nine-dot problem and (b) its solution 580
12.6 Two of the matchstick problems used by Knoblich et al. (1999) with cumulative solution rates 581
12.7 The multiplying billiard balls trick 582
12.8 The two-string problem 583
12.9 Some of the materials for participants instructed to mount a candle on a vertical wall in Duncker’s (1945) study 585
12.10 Mean percentages of correct solutions as a function of problem type and working memory capacity 587
12.11 The initial state of the five-disc version of the Tower of Hanoi problem 588
12.12 Tower of London task (two-move and five-move problems) 590
12.13 A problem resembling those used on the Raven’s Progressive Matrices 594
12.14 Relational reasoning: the probabilities of successful encoding, inferring, mapping and applying for lower and high performers 597
12.15 Major processes involved in performance of numerous cognitive tasks 598
12.16 Summary of key brain regions and their associated functions in relational reasoning based on patient and neuroimaging studies 599
12.17 Mean strength of the first-mentioned chess move and the move chosen as a function of problem difficulty by experts and by tournament players 603
12.18 A theoretical framework of the main cognitive processes and potential errors in medical decision-making 605
12.19 Eye fixations of a pathologist given the same biopsy whole-slide image (a) starting in year 1 and (d) ending in year 4 606
12.20 Brain activation while diagnosing lesions in X-rays, naming animals and naming letters 608
12.21 Brain image showing areas in the primary motor cortex with differences in relative voxel size between trained children and non-trained controls: (a) changes in relative voxel size over time; (b) correlation between improvement in motor-test performance and change in relative voxel size 611
12.22 Brain image showing areas in the primary auditory area with differences in relative voxel size between trained children and non-trained controls: (a) changes in relative voxel size over time; (b) correlation between improvement in a melody-rhythm test and change in relative voxel size 612
12.23 Mean chess ratings of candidates, non-candidate grandmasters and all non-grandmasters as a function of number of games played 616
12.24 The main factors (genetic and environmental) influencing the development of expertise 617
13.1 Percentages of correct responses and various incorrect responses with the false-positive and benign cyst scenarios 627
13.2 Percentage of correct predictions of the judged frequencies of different causes of death based on the affect heuristic (overall dread score), affect heuristic and availability 628
13.3 Percentage of correct inferences on four tasks 632
13.4 A hypothetical value function 642
13.5 Ratings of competence satisfaction for the sunk-cost option and the alternative option for those selecting each option 644
13.6 Risk aversion for gains and risk seeking for losses on a money-based task by financial professionals and students 645
13.7 Percentages of participants adhering to cumulative prospect theory, the minimax rule, or unclassified with affect-poor and affect-rich problems (a) with or (b) without numerical information concerning willingness to pay for medication 650
13.8 Proportion of politicians and population samples in Belgium, Canada and Israel voting to extend a loan programme 654
13.9 A model of selective exposure: defence motivation and accuracy motivation 659
13.10 The five phases of decision-making according to Galotti’s theory 660
13.11 Klein’s recognition-primed decision model 661
14.1 Mean number of modus ponens inferences accepted as a function of relative strength of the evidence and strategy 676
14.2 The Wason selection task 676
14.3 Percentage acceptance of conclusions as a function of perceived base rate (low vs high), believability of conclusions and validity of conclusions 679
14.4 Three models of the relationship between the intuitive and deliberate systems: (a) serial model; (b) parallel model; and (c) logical intuition model 685
14.5 Proportion correct on incongruent syllogisms as a function of instructions and cognitive ability 687
14.6 The approximate time courses of reasoning and metareasoning processes during reasoning and problem solving 689
14.7 Brain regions most consistently activated across 28 studies of deductive reasoning 690
14.8 Relationships between reasoning task performance (accuracy) and inferior frontal cortex activity in the left hemisphere and the right hemisphere in (a) the low-load condition and (b) the high-load condition 692
14.9 Mean responses to the question, “How much risk do you believe climate change poses to human health, safety or prosperity?” 696
14.10 Effects of trustworthiness and others’ opinions on convincingness ratings 700
14.11 Mean-rated argument strength as a function of the probability of the outcome and how negative the outcome would be 701
14.12 Stanovich’s tripartite model of reasoning 706
15.1 The two-dimensional framework for emotion showing the two dimensions of pleasure–misery and arousal–sleep and the two dimensions of positive affect and negative affect 716
15.2 Brain areas activated by positive, negative and neutral stimuli 717
15.3 Brain areas showing greater activity for top-down than for bottom-up processing and those showing greater activity for bottom-up than for top-down processes 718
15.4 Multiple appraisal mechanisms used in emotion generation 720
15.5 Changes in self-reported horror and distress and in galvanic skin response between pre-training and post-training (for the watch condition and the appraisal condition) 721
15.6 A process model of emotion regulation based on five major types of strategy (situation selection, situation modification, attention deployment, cognitive change and response modulation) 725
15.7 Mean level of depression as a function of stress severity and cognitive reappraisal ability 727
15.8 A three-stage neural network model of emotion regulation 728
15.9 The incompatibility flanker effect (incompatible trials – compatible trials) on reaction times as a function of mood (happy or sad) and whether a global, local or mixed focus had been primed on a previous task 733
15.10 Two main brain mechanisms involved in the memory-enhancing effects of emotion: (1) the medial temporal lobes; (2) the medial, dorsolateral and ventrolateral prefrontal cortex 735
15.11 (a) Free and (b) cued recall as a function of mood state (happy or sad) at learning and at recall 737
15.12 Two well-known moral dilemma problems: (a) the trolley problem; and (b) the footbridge problem 738
15.13 The dorsolateral prefrontal cortex, located approximately in Brodmann areas 9 and 46 and the ventromedial prefrontal cortex located approximately in Brodmann areas 10 and 11 739
15.14 Sensitivity to consequences, sensitivity to moral norms and preference for inaction vs action as a function of psychopathy (low vs high) 741
15.15 Driverless cars: moral decisions 742
15.16 Effects of mood manipulation (anxiety, sadness or neutral) on percentages of people choosing a high-risk job option 745
15.17 Mean buying price for a water bottle as a function of mood (neutral vs sad) and self-focus (low vs high) 746
15.18 The positive emotion “family tree” with the trunk representing the neural reward system and the branches representing nine semi-distinct positive emotions 748
15.19 Probability of selecting a candy bar by participants in a happy or sad mood as a function of implicit attitudes on the Implicit Association Test 750
15.20 Effects of mood states on judgement and decision-making 750
15.21 The emotion-imbued choice model 752
15.22 The dot-probe task 756
15.23 The emotional Stroop task 756
15.24 The impaired cognitive control account put forward by Joormann et al. (2007) 761
16.1 Mean scores for error detection on a proofreading task comparing unconscious goal vs no-goal control and low vs. high goal importance 770
16.2 Awareness as a social perceptual model of attention 771
16.3 (a) Region in left fronto-polar cortex for which decoding of upcoming motor decisions was possible. (b) Decoding accuracy of these decisions 774
16.4 Undistorted and distorted photographs of the Brunnen der Lebensfreude in Rostock, Germany 777
16.5 Modulation of the appropriate frequency bands of the EEG signal associated with motor imagery in one healthy control and three patients 779
16.6 Activation patterns on a binocular-rivalry task when observers (A) reported what they perceived or (B) passively experienced rivalry 781
16.7 Three successive stages of visual processing following stimulus presentation 782
16.8 Percentage of trials on which participants reported awareness of the content of photographs under masked and unmasked conditions for animal and non-animal photographs 783
16.9 Five hypotheses about the relationship between attention and conscious awareness identified by Webb and Graziano 785
16.10 Event-related potential waveforms in the aware-correct, unaware-correct and unaware-incorrect conditions 786
16.11 Synchronisation of neural activity across cortical areas for consciously perceived words (visible condition) and nonperceived words (invisible condition) during different time periods 787
16.12 Integrated brain activity: (a) overall information sharing or integration across the brain for vegetative state, minimally conscious and conscious brain-damaged patients and healthy controls; (b) information sharing (integration) across short, medium and long distances within the brain for the four groups 788
16.13 Event-related potentials in the left and right hemispheres to the first of two stimuli by AC (a patient with severe corpus callosum damage) 796
16.14 Detection and localisation of circles presented to the left or right visual fields by two patients responding verbally, with the left or right hand 797

Preface

Producing regular editions of this textbook gives us a front-row seat from which to observe all the exciting developments in our understanding of human cognition. What are the main reasons for the rapid rate of progress within cognitive psychology since the seventh edition of this textbook? Below we identify two factors that have been especially important.

First, the overarching assumption that the optimal way to enhance our understanding of cognition is by combining data and insights from several different approaches remains exceptionally fruitful. These approaches include traditional cognitive psychology; cognitive neuropsychology (study of brain-damaged patients); computational cognitive science (development of computational models of human cognition); and cognitive neuroscience (combining information from behaviour and from brain activity). Note that we use the term “cognitive psychology” in a broad or general sense to cover all these approaches.

The above approaches all continue to make extremely valuable contributions. However, cognitive neuroscience deserves to be singled out – it has increasingly been used with great success to resolve theoretical controversies and to provide novel empirical data that foster theoretical developments.

Second, there has been a steady increase in cognitive research of direct relevance to real life. This is reflected in a substantial increase in the number of boxes labelled “in the real world” in this edition compared to the previous one. Examples include eyewitness confidence, mishearing of song lyrics, multi-tasking, airport security checks and causes of plane crashes. What is noteworthy is the increased quality of real-world research (e.g., more sophisticated experimental designs; enhanced theoretical relevance).

With every successive edition of this textbook, the authors have had to work harder and harder to keep up with the huge increase in the number of research publications in cognitive psychology. For example, the first author wrote parts of the book in far-flung places including Botswana, New Zealand, Malaysia and Cambodia. His only regret is that book writing has sometimes had to take precedence over sightseeing!

We would both like to thank the very friendly and efficient staff at Psychology Press including Sadé Lee and Ceri McLardy.


We would also like to thank the anonymous reviewers who commented on various chapters. Their comments were very useful when we embarked on the task of revising the first draft of the manuscript. Of course, we are responsible for any errors and/or misunderstandings that remain.

Michael Eysenck and Mark Keane


Visual tour (how to use this book)

TEXTBOOK FEATURES

Listed below are the various pedagogical features that can be found both in the margins and within the main text, with visual examples of the boxes to look out for and descriptions of what you can expect them to contain.

Key terms
Throughout the book, key terms are highlighted in the text and defined in boxes in the margins, helping you to get to grips with the vocabulary fundamental to the subject being covered.

In the real world
Each chapter contains boxes within the main text that explore "real world" examples, providing context and demonstrating how some of the theories and concepts covered in the chapter work in practice.

Chapter summary
Each chapter concludes with a brief summary of each section of the chapter, helping you to consolidate your learning by making sure you have taken in all of the concepts covered.

Further reading
Also at the end of each chapter is an annotated list of key scholarly books, book chapters and journal articles that it is recommended you explore through independent study, to expand upon the knowledge you have gained from the chapter and to plan for your assignments.


Links to companion website features
Whenever you see this symbol, look out for related supplementary material amongst the resources for that chapter on the companion website at www.routledge.com/cw/eysenck.

Glossary
An extensive glossary appears at the end of the book, offering a comprehensive list that includes all the key terms defined in boxes in the main text.


Chapter 1

Approaches to human cognition

INTRODUCTION

We are now well into the third millennium and there is ever-increasing interest in unravelling the mysteries of the human brain and mind. This interest is reflected in the substantial upsurge of scientific research within cognitive psychology and cognitive neuroscience. In addition, the cognitive approach has become increasingly influential within clinical psychology. In that area, it is recognised that cognitive processes (especially cognitive biases) play a major role in the development (and successful treatment) of mental disorders (see Chapter 15).

In similar fashion, social psychologists increasingly focus on social cognition. This focuses on the role of cognitive processes in influencing individuals' behaviour in social situations. For example, suppose other people respond with laughter when you tell them a joke. This laughter is often ambiguous – they may be laughing with you or at you (Walsh et al., 2015). Your subsequent behaviour is likely to be influenced by your cognitive interpretation of their laughter.

What is cognitive psychology? It is concerned with the internal processes involved in making sense of the environment and deciding on appropriate action. These processes include attention, perception, learning, memory, language, problem solving, reasoning and thinking. We can define cognitive psychology as aiming to understand human cognition by observing the behaviour of people performing various cognitive tasks. However, the term "cognitive psychology" can also be used more broadly to include brain activity and structure as relevant information for understanding human cognition. It is in this broader sense that it is used in the title of this book.

Here is a simple example of cognitive psychology in action. Frederick (2005) developed a test (the Cognitive Reflection Test) that included the following item:

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? ___ cents


KEY TERMS

Social cognition: An approach within social psychology in which the emphasis is on the cognitive processing of information about other people and social situations.

Cognitive psychology: An approach that aims to understand human cognition by the study of behaviour; a broader definition also includes the study of brain activity and structure.


KEY TERM

Cognitive neuroscience: An approach that aims to understand human cognition by combining information from behaviour and the brain.


What do you think is the correct answer? Brañas-Garza et al. (2015) found in a review of findings from 41,004 individuals that 68% produced the wrong answer (typically 10 cents) and only 32% gave the right answer (5 cents). Even providing financial incentives to produce the correct answer failed to improve performance.

The above findings suggest most people rapidly produce an incorrect answer (i.e., 10 cents) that is easily accessible and are unwilling to devote extra time to checking that they have the right answer. However, Gangemi et al. (2015) found many individuals producing the wrong answer had a feeling of error, suggesting they experienced cognitive uneasiness about their answer. In sum, the intriguing findings on the Cognitive Reflection Test indicate that we can fail to think effectively even on relatively simple problems. Subsequent research has clarified the reasons for these deficiencies in our thinking (see Chapter 12).

The aims of cognitive neuroscientists overlap with those of cognitive psychologists. However, there is one major difference between cognitive neuroscience and cognitive psychology in the narrow sense. Cognitive neuroscientists argue convincingly that we need to study the brain as well as behaviour while people engage in cognitive tasks. After all, the internal processes involved in human cognition occur in the brain. Cognitive neuroscience uses information about behaviour and the brain to understand human cognition. Thus, the distinction between cognitive neuroscience and cognitive psychology in the broader sense is blurred.

Cognitive neuroscientists explore human cognition in several ways. First, there are brain-imaging techniques, of which functional magnetic resonance imaging (fMRI) is probably the best known. Second, there are electrophysiological techniques involving the recording of electrical signals generated by the brain. Third, many cognitive neuroscientists study the effects of brain damage on cognition. It is assumed the patterns of cognitive impairment shown by brain-damaged patients can inform us about normal cognitive functioning and the brain areas responsible for various cognitive processes.

The huge increase in scientific interest in the workings of the brain is mirrored in the popular media – numerous books, films and television programmes communicate the more accessible and dramatic aspects of cognitive neuroscience. Increasingly, media coverage includes coloured pictures of the brain indicating the areas most activated when people perform various tasks.
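Returning to the bat-and-ball item above: the correct answer follows from simple algebra. The worked solution below is our own illustration rather than part of Frederick's (2005) materials.

```latex
% Let x be the cost of the ball in dollars; the bat then costs x + 1.00.
\begin{align*}
x + (x + 1.00) &= 1.10\\
2x &= 0.10\\
x &= 0.05
\end{align*}
% The ball costs 5 cents. The intuitive answer (10 cents) fails a simple
% check: it implies a bat costing $1.10 and a total of $1.20, not $1.10.
```

Checking the intuitive answer against the problem's constraints in this way is precisely the kind of effortful verification most respondents appear unwilling to perform.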

Four main approaches

We can identify four main approaches to human cognition (see Table 1.1). Note, however, there has been a substantial increase in research combining two (or even more) of these approaches. We will shortly discuss each approach in turn and you will probably find it useful to refer back to this chapter when reading the rest of the book. Hopefully, you will find Table 1.3 (towards the end of this chapter) especially useful because it summarises the strengths and limitations of all four approaches.


TABLE 1.1 APPROACHES TO HUMAN COGNITION

1. Cognitive psychology: this approach involves using behavioural evidence to enhance our understanding of human cognition. Since behavioural data are also of great importance within cognitive neuroscience and cognitive neuropsychology, cognitive psychology's influence is enormous.

2. Cognitive neuropsychology: this approach involves studying brain-damaged patients to understand normal human cognition. It was originally closely linked to cognitive psychology but has recently also become linked to cognitive neuroscience.

3. Cognitive neuroscience: this approach involves using evidence from behaviour and the brain to understand human cognition.

4. Computational cognitive science: this approach involves developing computational models to further our understanding of human cognition; such models increasingly incorporate knowledge of behaviour and the brain. A computational model takes the form of an algorithm, which consists of a precise and detailed specification of the steps involved in performing a task. Computational models are designed to simulate or imitate human processing on a given task.

KEY TERM

Algorithm: A computational procedure providing a specified set of steps to problem solution; see heuristic.

COGNITIVE PSYCHOLOGY

We can obtain some perspective on the contribution of cognitive psychology by considering what preceded it. Behaviourism was the dominant approach to psychology throughout the first half of the twentieth century. The American psychologist John Watson (1878–1958) is often regarded as the founder of behaviourism. He argued that psychologists should focus on stimuli (aspects of the immediate situation) and responses (behaviour produced by the participants in an experiment). This approach appears "scientific" because it focuses on stimuli and responses, both of which are observable. Behaviourists argued that internal mental processes (e.g., attention) cannot be verified by reference to observable behaviour and so should be ignored. According to Watson (1913, p. 165), behaviourism should "never use the terms consciousness, mental states, mind, content, introspectively verifiable and the like".

In stark contrast, as we have already seen, cognitive psychologists argue it is of crucial importance to study such internal mental processes. Hopefully, you will be convinced that cognitive psychologists are correct when you read how the concepts of attention (Chapter 5) and consciousness (Chapter 16) have been used fruitfully to enhance our understanding of human cognition.

It is often claimed that behaviourism was overthrown by the "cognitive revolution". However, the reality was less dramatic (Hobbs & Burman, 2009). For example, Tolman (1948) was a behaviourist but he did not believe internal processes should be ignored. He carried out studies in which rats learned to run through a maze to a goal box containing food. When Tolman blocked off the path the rats had learned to use, they rapidly learned to follow other paths leading in the right general direction. Tolman concluded the rats had acquired an internal cognitive map indicating the maze's approximate layout.

It is almost as pointless to ask "When did cognitive psychology start?" as to enquire "How long is a piece of string?". However, 1956 was crucially


important. At a meeting at the Massachusetts Institute of Technology, Noam Chomsky presented his theory of language, George Miller discussed the magic number seven in short-term memory (Miller, 1956) and Alan Newell and Herbert Simon discussed the General Problem Solver (see Gobet and Lane, 2015). In addition, there was the first systematic attempt to study concept formation from the cognitive perspective (Bruner et al., 1956). The history of cognitive psychology from the perspective of its classic studies is discussed in Eysenck and Groome (2015a).

Several decades ago, most cognitive psychologists subscribed to the information-processing approach based loosely on an analogy between the mind and the computer (see Figure 1.1). A stimulus (e.g., a problem or task) is presented, which causes various internal processes to occur, leading eventually to the desired response or answer. Processing directly affected by the stimulus input is often described as bottom-up processing. It was typically assumed only one process occurs at a time: this is serial processing, meaning the current process is completed before the onset of the next one.

Figure 1.1 An early version of the information-processing approach.

The above approach is drastically oversimplified. Task processing typically also involves top-down processing, which is processing influenced by the individual's expectations and knowledge rather than simply by the stimulus itself. Read what it says in the triangle (Figure 1.2). Unless you know the trick, you probably read it as "Paris in the spring". If so, look again: the word "the" is repeated. Your expectation that it was a well-known phrase (i.e., top-down processing) dominated the information available from the stimulus (i.e., bottom-up processing).

Figure 1.2 Diagram to demonstrate top-down processing.

The traditional approach was also oversimplified in assuming processing is typically serial. In fact, more than one process typically occurs at the same time – this is parallel processing. We are much more likely to use parallel processing when performing a highly practised task than a new one (see Chapter 5). For example, someone taking their first driving lesson finds it very hard to control the car's speed, steer accurately and pay attention to other road users at the same time. In contrast, an experienced driver finds it easy.

There is also cascade processing: a form of parallel processing involving an overlap of different processing stages when someone performs a task. More specifically, later stages of processing are initiated before one or more earlier stages have finished. For example, suppose you are trying to work out the meaning of a visually presented word.

KEY TERMS

Bottom-up processing: Processing directly influenced by environmental stimuli; see top-down processing.

Serial processing: Processing in which one process is completed before the next one starts; see parallel processing.

Top-down processing: Stimulus processing that is influenced by factors such as the individual's past experience and expectations.


The most thorough approach would involve identifying all the letters in the word, followed by matching the resultant letter string against words you have stored in long-term memory. In fact, people often engage in cascade processing – they form hypotheses as to the word that has been presented before identifying all the letters (McClelland, 1979).
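The difference between strictly serial and cascade processing can be made concrete with a small simulation. The sketch below is purely illustrative (the stage names, rate and threshold are invented for this example and are not taken from McClelland's model): each stage continuously passes its partial output forward, so the word stage starts accumulating evidence before letter identification is complete.

```python
# Minimal cascade-processing sketch: three stages (features -> letters -> word).
# Each stage is a leaky integrator driven by the stage below it, so later
# stages begin responding before earlier stages have finished.
RATE = 0.2        # how quickly each stage approaches its input (hypothetical)
THRESHOLD = 0.9   # activation treated as "processing complete" (hypothetical)

def simulate(steps=50):
    features = letters = word = 0.0
    for t in range(1, steps + 1):
        stimulus = 1.0                           # stimulus present throughout
        features += RATE * (stimulus - features)
        letters += RATE * (features - letters)   # driven by PARTIAL feature output
        word += RATE * (letters - word)          # driven by PARTIAL letter output
        if t % 10 == 0:
            print(f"t={t:2d}  features={features:.2f}  "
                  f"letters={letters:.2f}  word={word:.2f}")

simulate()
# The printout shows word-level activation rising well before letter-level
# activation reaches THRESHOLD - the signature of overlapping (cascade) stages.
```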

KEY TERMS

Parallel processing: Processing in which two or more cognitive processes occur at the same time.

Cascade processing: Later processing stages start before earlier processing stages have been completed when performing a task.

An important issue for cognitive psychologists is the task-impurity problem: most cognitive tasks require several processes, making it hard to interpret the findings. One approach to this problem is to consider various tasks all requiring the same process. For example, Miyake et al. (2000) used three tasks requiring deliberate inhibition of a dominant response:

(1) The Stroop task: name the colour in which colour words are presented (e.g., RED printed in green) and avoid saying the colour word (which has to be inhibited). You can see for yourself how hard this task is by naming the colours of the words shown in Figure 1.3.
(2) The anti-saccade task: inhibit the natural tendency to look at a visual cue and instead look in the opposite direction. People typically take longer to perform this task than the control task of simply looking at the visual cue.
(3) The stop-signal task: respond rapidly to indicate whether each of a series of words is an animal or non-animal; on key trials, a computer-emitted tone indicates that the response should be inhibited.
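For readers who want a feel for the first of these tasks, here is a minimal sketch of a Stroop trial generator (the colour set and congruency proportion are arbitrary choices for illustration, not Miyake et al.'s materials):

```python
import random

COLOURS = ["red", "green", "blue", "yellow"]   # illustrative colour set

def make_trials(n=8):
    """Generate Stroop trials: the correct response is always the INK colour,
    so on incongruent trials the printed word must be inhibited."""
    trials = []
    for _ in range(n):
        word = random.choice(COLOURS)
        congruent = random.random() < 0.5        # 50% congruent (arbitrary)
        ink = word if congruent else random.choice(
            [c for c in COLOURS if c != word])
        trials.append({"word": word, "ink": ink,
                       "condition": "congruent" if congruent else "incongruent",
                       "correct_response": ink})
    return trials

for trial in make_trials():
    print(trial)
```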

Miyake et al. (2000) found all three tasks involved similar processes. They used complex statistical techniques (latent variable analysis) to extract what was common across the three tasks.

Figure 1.3 Test yourself by naming the colours in each column. You should name the colours rapidly in the first three columns because there is no colour-word conflict. In contrast, colour naming should be slower (and more prone to error) when naming colours in the fourth and fifth columns.


KEY TERMS

Ecological validity: The applicability (or otherwise) of the findings of laboratory studies to everyday settings.

Implacable experimenter: The situation in experimental research in which the experimenter's behaviour is uninfluenced by the participant's behaviour.


This common variance was assumed to represent a relatively pure measure of the inhibitory process. Throughout this book, we will discuss many ingenious strategies used by cognitive psychologists to identify the processes used in numerous tasks.
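The logic of the latent variable approach can be illustrated with simulated data. In this sketch the numbers are invented, and a simple exploratory factor analysis stands in for the confirmatory analyses Miyake et al. actually used; the point is only that a shared "inhibition" factor can be recovered from three impure task scores.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200                                   # simulated participants

# One latent inhibition ability drives all three task scores; each task
# also has task-specific noise (the task-impurity problem).
inhibition = rng.normal(size=n)
stroop = 0.6 * inhibition + rng.normal(scale=0.8, size=n)
antisaccade = 0.5 * inhibition + rng.normal(scale=0.9, size=n)
stop_signal = 0.4 * inhibition + rng.normal(scale=0.9, size=n)
scores = np.column_stack([stroop, antisaccade, stop_signal])

# Extract the variance common to all three tasks as a single factor.
fa = FactorAnalysis(n_components=1).fit(scores)
print("factor loadings:", fa.components_.round(2))
# All three tasks load on the single factor: a relatively "pure"
# estimate of the shared inhibitory process.
```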

Strengths

Cognitive psychology was for many years the engine room of progress in understanding human cognition, and the other three approaches listed in Table 1.1 have benefitted from it. For example, cognitive neuropsychology became important 25 years after cognitive psychology. It was only when cognitive psychologists had developed reasonable accounts of healthy human cognition that the performance of brain-damaged patients could be understood fully. Before that, it was hard to decide which patterns of cognitive impairment were theoretically important.

In a similar fashion, the computational modelling activities of computational cognitive scientists are typically heavily influenced by pre-computational psychological theories. Finally, the great majority of theories driving research in cognitive neuroscience originated within cognitive psychology.

Cognitive psychology has not only had a massive influence on theorising across all four major approaches to human cognition. It has also had a predominant influence on the development of cognitive tasks and on task analysis (how a task is accomplished).

Limitations

In spite of cognitive psychology's enormous contributions, it has several limitations. First, our behaviour in the laboratory may differ from our behaviour in everyday life. Thus, laboratory research sometimes lacks ecological validity – the extent to which laboratory findings are applicable to everyday life. For example, our everyday behaviour is often designed to change a situation or to influence others' behaviour. In contrast, the sequence of events in most laboratory research is based on the experimenter's predetermined plan and is uninfluenced by participants' behaviour. Wachtel (1973) used the term implacable experimenter to describe this state of affairs.

We must not exaggerate problems associated with lack of ecological validity. As we will see in this book, there has been a dramatic increase in applied cognitive psychology in which the emphasis is on investigating topics of general importance. Such research often has good ecological validity. Note that it is far better to carry out well-controlled experiments under laboratory conditions than poorly controlled experiments under naturalistic conditions. It is precisely because it is considerably easier for researchers to exercise experimental control in the laboratory that so much research is laboratory-based.

Second, theories in cognitive psychology are often expressed only in verbal terms (although this is becoming less common). Such theories are vague, making it hard to know precisely what predictions follow from them and thus to falsify them. These limitations can largely be overcome by


computational cognitive scientists developing cognitive models specifying precisely any given theory's assumptions.

Third, difficulties in falsifying theories have led to a proliferation of different theories on any given topic. For example, there are at least 12 different theories of working memory (see Chapter 6). Another reason for the proliferation of rather similar theories is the "toothbrush problem" (Mischel, 2008): no self-respecting cognitive psychologist wants to use anyone else's theory.

Fourth, the findings obtained using any given task or paradigm are sometimes specific to that paradigm and do not generalise to other (apparently similar) tasks. This is paradigm specificity. It means some findings are narrow in scope and applicability (Meiser, 2011). This problem can be minimised by developing theories accounting for performance across several tasks or paradigms. For example, Anderson et al. (2004; discussed later in this chapter) developed a comprehensive theoretical architecture or framework known as the Adaptive Control of Thought-Rational (ACT-R) model.

Fifth, cognitive psychologists typically obtain measures of performance speed and accuracy. These measures are very useful but provide only indirect evidence about internal cognitive processes. Most tasks are "impure" in that they involve several processes, and it is hard to identify the number and nature of processes involved on the basis of speed and accuracy measures.

KEY TERMS

Paradigm specificity: The findings with a given experimental task or paradigm are not replicated even when apparently very similar tasks or paradigms are used.

Lesion: Damage within the brain resulting from injury or disease; it typically affects a restricted area.

COGNITIVE NEUROPSYCHOLOGY

Cognitive neuropsychology focuses on the patterns of cognitive performance (intact and impaired) of brain-damaged patients having a lesion (structural damage to the brain caused by injury or disease). According to cognitive neuropsychologists, studying brain-damaged patients can tell us much about cognition in healthy individuals.

The above idea does not sound very promising, does it? In fact, however, cognitive neuropsychology has contributed substantially to our understanding of healthy human cognition. For example, in the 1960s, most memory researchers thought the storage of information in long-term memory depended on previous processing in short-term memory (see Chapter 6). However, Shallice and Warrington (1970) reported the case of a brain-damaged man, KF. His short-term memory was severely impaired but his long-term memory was intact. These findings played an important role in changing theories of healthy human memory.

Since cognitive neuropsychologists study brain-damaged patients, we might imagine they would be interested in the workings of the brain. In fact, many cognitive neuropsychologists pay little attention to the brain itself. According to Coltheart (2015, p. 198), for example, "Even though cognitive neuropsychologists typically study people with brain damage, . . . cognitive neuropsychology is not about the brain: it is about information-processing models of cognition."

An increasing number of cognitive neuropsychologists disagree with Coltheart. They believe we should consider the brain, using techniques such as magnetic resonance imaging to identify the brain areas damaged in any given patient. They are also increasingly willing to study the impact of brain damage on brain processes using various neuroimaging techniques.


Theoretical assumptions


KEY TERM

Modularity: The assumption that the cognitive system consists of many fairly independent or separate modules or processors, each specialised for a given type of processing.

Coltheart (2001) provided a very clear account of the major assumptions of cognitive neuropsychology. Here we will discuss these assumptions and briefly consider relevant evidence.

One key assumption is modularity, meaning the cognitive system consists of numerous modules or processors operating fairly independently or separately of each other. It is assumed these modules exhibit domain specificity (they respond to only one given class of stimuli). For example, there may be a face-recognition module that responds only when a face is presented. Modular systems typically involve serial processing, with processing within one module being completed before processing starts in the next module. As a result, there is very limited interaction among modules.

There is some support for modularity from the evolutionary approach. Species with larger brains generally have more specialised brain regions that could be involved in modular processing. However, the notion that human cognition is heavily modular is hard to reconcile with neuroimaging evidence. The human brain possesses a moderately high level of connectivity (Bullmore & Sporns, 2012; see p. 14), suggesting there is more parallel processing than assumed by most cognitive neuropsychologists.

The second major assumption is that of anatomical modularity. According to this assumption, each module is located in a specific brain area. Why is this assumption important? Cognitive neuropsychologists are most likely to make progress when studying patients with brain damage limited to a single module. Such patients may not exist if there is no anatomical modularity. Suppose all modules were distributed across large brain areas. If so, the great majority of brain-damaged patients would suffer damage to most modules, making it impossible to work out the number and nature of their modules.

There is evidence of anatomical modularity in the visual processing system (see Chapter 2). However, there is less support for anatomical modularity with most complex tasks. For example, consider the findings of Yarkoni et al. (2011). Across over 3,000 neuroimaging studies, some brain areas (e.g., dorsolateral prefrontal cortex; anterior cingulate cortex) were activated in 20% of them despite the great diversity of tasks involved.

The third major assumption (the "universality assumption") is that "Individuals . . . share a similar or an equivalent organisation of their cognitive functions, and presumably have the same underlying brain anatomy"


(de Schotten and Shallice, 2017, p. 172). If this assumption (also common within cognitive neuroscience) is false, we could not readily use the findings from individual patients to draw conclusions about the organisation of other people's cognitive systems or functional architecture.

There is accumulating evidence against the universality assumption. Tzourio-Mazoyer et al. (2004) discovered substantial differences between individuals in the location of brain networks involved in speech and language. Finn et al. (2015) found clear-cut differences between individuals in functional connectivity across the brain, concluding that "An individual's functional brain connectivity profile is both unique and reliable, similarly to a fingerprint" (p. 1669).

Duffau (2017) reviewed interesting research conducted on patients during surgery for epilepsy or a tumour. Direct electrical stimulation, which causes "a genuine virtual transient lesion" (p. 305), is applied invasively to the cortex. The patient is awakened and given various cognitive tasks while receiving stimulation. Impaired performance when direct electrical stimulation is applied to a given area indicates that area is involved in the cognitive functions assessed by the current task.

Findings obtained using direct electrical stimulation and other techniques (e.g., fMRI) led Duffau (2017) to propose a two-level model. At the cortical level, there is high variability across individuals in the structure and function of any given brain area. At the subcortical level (e.g., in premotor cortex), in contrast, there is very little variability across individuals. The findings at the cortical level seem inconsistent with the universality assumption.

The fourth assumption is subtractivity. The basic idea is that brain damage impairs one or more processing modules but does not change or add anything. The fifth assumption (related to subtractivity) is transparency (Shallice, 2015). According to the transparency assumption, the performance of a brain-damaged patient reflects the operation of a theory designed to explain the performance of healthy individuals minus the impact of their lesion.

Why are the subtractivity and transparency assumptions important? Suppose they are incorrect and brain-damaged patients develop new modules to compensate for their cognitive impairments. That would greatly complicate the task of learning about the intact cognitive system by studying brain-damaged patients.

Consider pure alexia, a condition in which brain-damaged patients have severe reading problems but otherwise intact language abilities. These patients generally show a direct relationship between word length and reading speed due to letter-by-letter processing (Bormann et al., 2015). This indicates the use of a compensatory strategy differing markedly from the reading processes used by healthy adults.

KEY TERM

Pure alexia: Severe problems with reading but not other language skills; caused by damage to brain areas involved in visual processing.

Research in cognitive neuropsychology

How do cognitive neuropsychologists set about understanding the cognitive system? Of major importance is the search for dissociations, which occur when a patient has normal performance on one task (task X) but is impaired on a second one (task Y). For example, amnesic patients perform almost normally on short-term memory tasks but are greatly impaired on many


KEY TERMS

Double dissociation: The finding that some brain-damaged individuals have intact performance on one task but poor performance on another task whereas other individuals exhibit the opposite pattern.

Association: The finding that certain symptoms or performance impairments are consistently found together in numerous brain-damaged patients.

Syndrome: The notion that symptoms that often co-occur have a common origin.

Case-series study: A study in which several patients with similar cognitive impairments are tested; this allows consideration of individual data and of variation across individuals.


long-term memory tasks (see Chapter 6). It is tempting (but dangerous!) to conclude that the two tasks involve different processing modules and that the module(s) needed on long-term memory tasks have been damaged by brain injury. Why must we avoid drawing sweeping conclusions from dissociations? Patients may perform well on one task but poorly on a second one simply because the second task is more complex. Thus, dissociations may reflect differences in task complexity rather than the use of different modules.

One apparent solution to the above problem is to find double dissociations. A double dissociation between two tasks (X and Y) is obtained when one patient performs normally on task X and is impaired on task Y but another patient shows the opposite pattern. We cannot explain double dissociations by arguing that one task is harder. For example, consider the double dissociation that amnesic patients have impaired long-term memory but intact short-term memory whereas other patients (e.g., KF, discussed above) have the opposite pattern. This double dissociation strongly suggests there is an important distinction between short-term and long-term memory and that they involve different brain regions.

The approach based on double dissociations has various limitations. First, it is generally based on the assumption that separate modules exist (which may be misguided). Second, double dissociations can often be explained in various ways and so provide only indirect evidence for separate modules underlying each task (Davies, 2010). For example, a double dissociation between tasks X and Y implies the cognitive system used on X is not identical to the one used on Y. Strictly speaking, the most we can generally conclude is that "Each of the two systems has at least one sub-system that the other doesn't have" (Bergeron, 2016, p. 818). Third, it is hard to decide which of the very numerous double dissociations that have been discovered are theoretically important.

Finally, we consider associations. An association occurs when a patient is impaired on tasks X and Y. Associations are sometimes taken as evidence for a syndrome (a set of symptoms or impairments often found together). However, there is a serious flaw in the syndrome-based approach. An association may be found between tasks X and Y because the mechanisms on which they depend are adjacent anatomically in the brain rather than because they depend on the same underlying mechanism. Thus, the interpretation of associations is fraught with difficulty.
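The inferential pattern itself is simple enough to express as a check over a small data structure. The scores below are hypothetical, and "impaired" just means falling below an arbitrary cut-off:

```python
CUTOFF = 0.70   # arbitrary impairment threshold (proportion correct)

# Hypothetical performance of two patients on two memory tasks.
patients = {
    "amnesic": {"short_term": 0.92, "long_term": 0.35},
    "KF_like": {"short_term": 0.30, "long_term": 0.88},
}

def impaired(score):
    return score < CUTOFF

def double_dissociation(p1, p2, task_x, task_y):
    """True if p1 is intact on X but impaired on Y, and p2 shows the reverse."""
    a, b = patients[p1], patients[p2]
    return (not impaired(a[task_x]) and impaired(a[task_y]) and
            impaired(b[task_x]) and not impaired(b[task_y]))

print(double_dissociation("amnesic", "KF_like", "short_term", "long_term"))
# True: the crossed pattern that cannot be explained by task difficulty alone.
```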

Single case studies vs case series

For many years after the rise of cognitive neuropsychology in the 1970s, most cognitive neuropsychologists made extensive use of single-case studies. There were two main reasons. First, researchers can often gain access to only one patient having a given pattern of cognitive impairment. Second, it was often assumed every patient has a somewhat different pattern of cognitive impairment and so is unique. As a result, it would be misleading and uninformative to average the performance of several patients.

In recent years, there has been a move towards the case-series study. Several patients with similar cognitive impairments are tested. After that,


the data of individual patients are compared and variation across patients assessed. The case-series approach is generally preferable to the single-case approach for various reasons (Lambon Ralph et al., 2011; Bartolomeo et al., 2017). First, it provides much richer data. With a case series, we can assess the extent of variation between patients rather than simply being concerned about the impairment (as in the single-case approach). Second, with a case series, we can identify (and then de-emphasise) the findings from patients who are "outliers". With the single-case approach, in contrast, we do not know whether the one and only patient is representative of patients with that condition or is an outlier.
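Whether one patient or several are tested, each patient's score must be compared against a control sample. One widely used method is Crawford and Howell's (1998) modified t-test; the sketch below implements its formula, with invented scores for illustration:

```python
import math
from statistics import mean, stdev
from scipy import stats

def crawford_howell_t(patient_score, controls):
    """Crawford & Howell (1998): compare one patient with a small
    control sample, treating the patient as a sample of n = 1."""
    n = len(controls)
    t = (patient_score - mean(controls)) / (stdev(controls) * math.sqrt((n + 1) / n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)        # two-tailed p value
    return t, p

controls = [52, 55, 49, 61, 58, 50, 57, 54, 56, 53]   # hypothetical control scores
t, p = crawford_howell_t(31, controls)                # hypothetical patient score
print(f"t = {t:.2f}, p = {p:.4f}")   # a small p suggests a genuine deficit
```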

KEY TERM

Diaschisis: The disruption to distant brain areas caused by a localised brain injury or lesion.

Strengths

Cognitive neuropsychology has several strengths. First, it has the advantage that it allows us to draw causal inferences about the relationship between brain areas and cognitive processes and behaviour. In other words, we can conclude (with moderate but not total confidence) that a given brain area is crucially involved in performing certain cognitive tasks (Genon et al., 2018).

Second, as Shallice (2015, pp. 387–388) pointed out, "A key intellectual strength of neuropsychology . . . is its ability to provide evidence falsifying plausible cognitive theories." Consider patients reading visually presented words and non-words aloud. We might imagine patients with damage to language areas would have problems in reading all words and non-words. However, some patients perform reasonably well when reading regular words (with predictable pronunciations) or non-words, but poorly when reading irregular words (words with unpredictable pronunciations). Other patients can read regular words but have problems with unfamiliar words and non-words. These fascinating patterns of impairment have transformed theories of reading (Coltheart, 2015; see Chapter 9).

Third, cognitive neuropsychology "produces large-magnitude phenomena which can be initially theoretically highly counterintuitive" (Shallice, 2015, p. 405). For example, amnesic patients typically have severely impaired long-term memory for personal events and experiences but an essentially intact ability to acquire and retain motor skills (Chapter 7). These strong effects played a major role in memory researchers abandoning the notion of a single long-term memory system and replacing it with more complex theories.

Fourth, in recent years, cognitive neuropsychology has increasingly been combined fruitfully with cognitive neuroscience. For example, cognitive neuroscience has revealed that a given brain injury or lesion often has widespread effects within the brain. This phenomenon is known as diaschisis: "the distant neurophysiological changes directly caused by a focal injury . . . these changes should correlate with behaviour" (Carrera & Tononi, 2014, p. 2410). Discovering the true extent of the brain areas adversely affected by a lesion facilitates the task of relating brain functioning to cognitive processing and task performance.


Limitations

KEY TERMS

Sulcus: A groove or furrow in the surface of the brain.

Gyrus: A prominent elevated area or ridge on the brain's surface; "gyri" is the plural.

Dorsal: Towards the top.

Ventral: Towards the bottom.

Rostral: Towards the front of the brain.

Posterior: Towards the back of the brain.

Lateral: Situated at the side of the brain.

Medial: Situated in the middle of the brain.

What are the limitations of the cognitive neuropsychological approach? First, the crucial assumption that the cognitive system is fundamentally modular is reasonable but too strong. There is less evidence for modularity among higher-level cognitive processes (e.g., consciousness; focused attention) than among lower-level processes (e.g., colour processing; motion processing). If the modularity assumption is incorrect, this has implications for the whole enterprise of cognitive neuropsychology (Patterson & Plaut, 2009).

Second, other theoretical assumptions also seem too extreme. For example, evidence discussed earlier casts considerable doubt on the assumption of anatomical modularity and the universality assumption.

Third, the common assumption that the task performance of patients provides relatively direct evidence concerning the impact of brain damage on previously intact cognitive systems is problematic. Brain-damaged patients often make use of compensatory strategies to reduce or eliminate the negative effects of brain damage on cognitive performance. We saw an example of such compensatory strategies earlier – patients with pure alexia manage to read words by using a letter-by-letter strategy rarely used by healthy individuals. Hartwigsen (2018) proposed a model to predict when compensatory processes will and will not be successful. According to this model, general processes (e.g., attention; cognitive control; error monitoring) can be used to compensate for the disruption of specific processes (e.g., phonological processing) by brain injury. However, specific processes cannot be used to compensate for the disruption of general processes. Hartwigsen discussed evidence supporting this model.

Fourth, lesions can alter the organisation of the brain in several ways. Dramatic evidence for brain plasticity is discussed in Chapter 16. Patients whose entire left brain hemisphere was removed at an early age (an operation known as hemispherectomy) often develop good language skills even though language is typically centred in the left hemisphere (Blackmon, 2016). There is the additional problem that a brain lesion can lead to changes in the functional connectivity between the area of the lesion and distant, intact brain areas (Bartolomeo et al., 2017). Thus, impaired cognitive performance following brain damage may reflect widespread reduced brain connectivity as well as direct damage to a specific brain area. This complicates the task of interpreting the findings obtained from brain-damaged patients.


COGNITIVE NEUROSCIENCE: THE BRAIN IN ACTION


Cognitive neuroscience involves the intensive study of brain activity as well as behaviour. Alas, the brain is extremely complicated (to put it mildly!). It consists of 100 billion neurons connected in very complex ways. We must consider how the brain is organised and how the different areas are described to understand research involving functional neuroimaging. Below we discuss various ways of describing specific brain areas.



Figure 1.4 The four lobes, or divisions, of the cerebral cortex in the left hemisphere.

Interactive feature: Primal Pictures’ 3D atlas of the brain

First, the cerebral cortex is divided into four main divisions or lobes (see Figure 1.4). There are four lobes in each brain hemisphere: frontal; parietal; temporal; and occipital. The frontal lobes are divided from the parietal lobes by the central sulcus (sulcus means furrow or groove), and the lateral fissure separates the temporal lobes from the parietal and frontal lobes. In addition, the parieto-occipital sulcus and pre-occipital notch divide the occipital lobes from the parietal and temporal lobes. The main gyri (or ridges; gyrus is the singular) within the cerebral cortex are shown in Figure 1.4.

Researchers use various terms to describe accurately the brain area(s) activated during task performance:

● dorsal (or superior): towards the top
● ventral (or inferior): towards the bottom
● anterior (or rostral): towards the front
● posterior: towards the back
● lateral: situated at the side
● medial: situated in the middle.

The German neurologist Korbinian Brodmann (1868–1918) produced a brain map based on differences in the distributions of cell types across cortical layers (see Figure 1.5).


Figure 1.5 Brodmann brain areas on the lateral (top figure) and medial (bottom figure) surfaces.


KEY TERM

Connectome: A comprehensive wiring diagram of neural connections within the brain.


He identified 52 areas. We will often refer to areas, for example, as BA17, which means Brodmann Area 17, rather than Brain Area 17! Within cognitive neuroscience, brain areas are often described with reference to their main functions. For example, Brodmann Area 17 (BA17) is commonly called the primary visual cortex because it is strongly associated with the early processing of visual stimuli.

Brain organisation

In recent years, there has been considerable progress in identifying the connectome: a "wiring diagram" providing a complete map of the brain's neural connections. Why is it important to identify the connectome? First, as we will see, it advances our understanding of how the brain is organised. Second, identifying the brain's structural connections facilitates the task of understanding how it functions. More specifically, the brain's functioning is strongly constrained by its structural connections. Third, as we will see, we can understand some individual differences in cognitive functioning with reference to individual differences in the connectome.

Bullmore and Sporns (2012) used information about the connectome to address issues about brain organisation. They argued two major principles might determine its organisation. First, there is the principle of cost control: costs (e.g., use of energy and space) would be minimised if the brain consisted of limited, short-distance connections (see Figure 1.6). Second, there is the principle of efficiency (efficiency is the ability to integrate information across the brain). This can be achieved by having very numerous connections, many of which are long-distance (see Figure 1.6). These two principles are in conflict – you cannot have high efficiency at low cost.

You might imagine it would be best if our brains were organised primarily on the basis of efficiency. However, this would be incredibly costly – if all 100 billion brain neurons were interconnected, the brain would need to be 12½ miles wide (Ward, 2015)! In fact, neurons mostly connect with nearby neurons and no neuron is connected to more than about 10,000 other neurons. As a result, the human brain has a near-optimal trade-off between cost and efficiency (see Figure 1.6). Thus, our brains are reasonably efficient while incurring a manageable cost.
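The cost-efficiency trade-off can be illustrated with a toy graph model (a sketch using the networkx library; the node count, degree and rewiring probabilities are arbitrary). A pure ring lattice has cheap short-range wiring but integrates information poorly; rewiring some connections into long-range links raises efficiency at the price of greater wiring cost, with the intermediate "small-world" arrangement resembling the brain's compromise.

```python
import networkx as nx

N, K = 100, 6   # toy network: 100 nodes, each initially linked to 6 neighbours

def wiring_cost(G, n=N):
    """Proxy for wiring cost: total ring distance spanned by the edges
    (long-range connections are expensive)."""
    return sum(min(abs(u - v), n - abs(u - v)) for u, v in G.edges())

# Rewiring probability p: 0 = pure lattice, 1 = fully random.
for label, p in [("lattice (cheap, poorly integrated)", 0.0),
                 ("small-world (brain-like trade-off)", 0.1),
                 ("random (efficient but costly)", 1.0)]:
    G = nx.watts_strogatz_graph(n=N, k=K, p=p, seed=1)
    print(f"{label:38s} cost={wiring_cost(G):5d} "
          f"efficiency={nx.global_efficiency(G):.3f}")
```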

Figure 1.6 The left panel shows a brain network low in cost efficiency; the right panel shows a brain network high in cost efficiency; the middle panel shows the actual human brain in which there is moderate efficiency at moderate cost. Nodes are shown as orange circles. From Bullmore and Sporns (2012). Reprinted with permission of Nature Reviews.

[Figure: brain connectivity; recoverable labels only: rich club, non-rich club, feeder, local; labelled regions include superior frontal cortex and insula.]

[Figure caption (partially recovered): ". . . 40% overlap; orange = >60% overlap) in patients with optic ataxia. (SPL = superior parietal lobule; SOG = superior occipital gyrus; Pc = precuneus.)" From Vesia and Crawford (2012). Reprinted with permission of Springer.]


KEY TERM

Visual form agnosia: A condition in which there are severe problems in shape perception (what an object is) but apparently reasonable ability to produce accurate visually guided actions.


Third, patients with optic ataxia have some impairment in vision for perception (especially in peripheral vision). Bartolo et al. (2018) found such patients had an impaired ability on the perceptual task of deciding whether a target was reachable, and they also had problems on tasks requiring vision for action. Thus, patients with optic ataxia have difficulties in combining information from the dorsal and ventral streams.

Fourth, Rossetti and Pisella (2018) concluded as follows from their review: "Optic ataxia is not a visuo-motor deficit and there is no dissociation between perception and action capacities in optic ataxia" (p. 225).

Visual form agnosia

Interactive exercise: Müller-Lyer

What about patients with damage only to the ventral stream? Of relevance are some patients with visual form agnosia, a condition involving severe problems with object recognition even though visual information reaches the visual cortex (see Chapter 3). The most-studied visual form agnosic is DF, whose brain damage is in the ventral stream (James et al., 2003). For example, her activation in that stream was no greater when presented with object drawings than with scrambled line drawings. However, she showed high levels of activation in the dorsal stream when grasping for objects.

Goodale et al. (1994) found DF was very poor at a visual perception task that involved distinguishing between two shapes with irregular contours. However, she grasped these shapes firmly between her thumb and index finger. Goodale et al. concluded DF "had no difficulty in placing her fingers on appropriate opposition points during grasping" (p. 604).

Himmelbach et al. (2012) re-analysed DF's performance based on data in Goodale et al. (1994). DF's performance was substantially inferior to that of healthy controls. Similar findings were obtained when DF's performance on other grasping and reaching tasks was compared against controls. Thus, DF had greater difficulties with visually guided action than previously believed.

Rossit et al. (2018) found DF had impaired peripheral (but not central) reaching, which is the pattern associated with optic ataxia. DF also had significant impairment in the fast control of reaching movements (also associated with optic ataxia). Rossit et al. (p. 15) concluded: "We can no longer assume that DF's dorsal visual stream is intact and that she is spared in visuo-motor control tasks, as she also presents clear signs of optic ataxia."

Visual illusions

Figure 2.11 The Müller-Lyer illusion.


There are numerous visual illusions, of which the Müller-Lyer (see Figure 2.11) is one of the most famous. The vertical line on the left looks longer than the one on the right although they are the same length. The Ebbinghaus illusion (see Figure 2.12) is also well known. The central circle surrounded by larger circles looks smaller than a central circle of the same size surrounded by smaller circles, although the two central circles are the same size.


How has the human species flourished if our visual perceptual processes are apparently very prone to error? Milner and Goodale (1995) argued the vision-for-perception system processes visual illusions and provides visual judgements. In contrast, we mostly use the vision-for-action system when walking close to a precipice or dodging cars. These ideas led to a dramatic prediction: actions (e.g., pointing; grasping) using the vision-for-action system should be unaffected by most visual illusions.

Findings

Bruno et al. (2008) conducted a meta-analytic review of Müller-Lyer studies where observers pointed rapidly at one figure (using the vision-for-action system). The mean illusion effect was 5.5%. In contrast, the mean illusion effect was 22.4% when observers provided verbal estimations of length (using the vision-for-perception system). The perception-action model is supported by this large difference. However, the model seems to predict there should have been no illusion effect at all with pointing.

With the Ebbinghaus illusion, the illusion is often much stronger with visual judgements using the vision-for-perception system than with grasping movements using the vision-for-action system (Whitwell & Goodale, 2017). Knol et al. (2017) explored the Ebbinghaus illusion in more detail. As predicted theoretically, only visual judgements were influenced by the distance between the target and the context.

Figure 2.12 The Ebbinghaus illusion.

Support for the perception-action model has been reported with the hollow-face illusion, a realistic hollow mask resembling a normal face (see Figure 2.13; visit the website: www.richardgregory.org/experiments). Króliczak et al. (2006) placed a target (a small magnet) on the face mask or a normal face. Here are two tasks they used:

(1) Draw the target position (using the vision-for-perception system).
(2) Make a fast, flicking finger movement to the target (using the vision-for-action system).

There was a strong illusion effect when observers drew the target position, whereas their performance was very accurate (i.e., illusion-free) when they made a flicking movement. Both findings were as predicted theoretically.

Króliczak et al. (2006) also had a third condition where observers made a slow pointing finger movement to the target and so the vision-for-action system was involved. However, there was a fairly strong illusory effect. Why was this? Actions may involve the vision-for-perception system as well as the vision-for-action system when preceded by conscious cognitive processes.
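A brief note on measurement: illusion effects of the kind quoted above are typically expressed as the judged (or grasped) extent relative to the true extent. The sketch below uses invented raw values chosen to mirror Bruno et al.'s percentages:

```python
def illusion_effect(judged_mm, actual_mm):
    """Percentage over- or under-estimation relative to the true size."""
    return 100 * (judged_mm - actual_mm) / actual_mm

# Hypothetical Mueller-Lyer data for a 100 mm shaft.
print(f"{illusion_effect(122.4, 100):.1f}%")   # verbal estimate -> 22.4%
print(f"{illusion_effect(105.5, 100):.1f}%")   # rapid pointing  ->  5.5%
```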


KEY TERM

Hollow-face illusion: A concave face mask is misperceived as a normal face when viewed from several feet away.


Figure 2.13 Left: normal and hollow faces with small target magnets on the forehead and cheek of the normal face. Right: front view of the hollow mask that appears as an illusory face projecting forwards. Króliczak et al. (2006). Reprinted with permission of Elsevier.

KEY TERM

Proprioception: An individual's awareness of the position and orientation of parts of their body.

Various problematical issues for the perception-action model have accumulated. First, the type of action is important. Franz and Gegenfurtner (2008) found the mean illusory effect with the Müller-Lyer was 11.2% with perceptual tasks, compared to 4.4% with full visual guidance of the hand movement. In contrast, grasping when observers could not monitor their hand movements was associated with an illusory effect of 9.4%, perhaps because action programming required the ventral stream.

Second, illusion effects assessed by grasping movements often decrease with repeated practice (Kopiske et al., 2017). Kopiske et al. argued people use feedback from their inaccurate grasping movements on early trials to reduce illusion effects later on.

Third, illusion effects are often greater when grasping or pointing movements are made following a delay (Hesse et al., 2016). The ventral stream (vision-for-perception) may be more likely to be involved after a delay.

The various interpretive problems with previous research led Chen et al. (2018a) to use a different approach. In their key condition, observers had restricted vision (they viewed a sphere coated in luminescent paint in darkness through a pinhole). They estimated the sphere's size by matching the distance between their thumb and forefinger to that size (perception) or they grasped the sphere (action). Their non-grasping hand was in their lap or directly below the sphere. In the latter condition, observers could make use of proprioception (awareness of the position of one's body parts).

Size judgements were very accurate in perception and action with full vision (see Figure 2.14). However, the key finding was that proprioceptive information about distance produced almost perfect performance when observers grasped the sphere but not when providing a perceptual estimate. These findings indicate a very clear difference in the processes underlying vision-for-perception and vision-for-action.

In sum, there is some support for the predictions of the original perception-action model. However, illusory effects with visual judgements and with actions are more complex and depend on many more factors than


assumed by that model. Attempts by Milner and Goodale to accommodate such complexities are discussed below.

Action planning + motor responses

Milner and Goodale (2008) argued most tasks requiring observers to grasp an object involve some processing in the ventral stream in addition to the dorsal stream. They reviewed research showing that involvement of the ventral stream is especially likely in the following circumstances:

(1) Memory is required (e.g., there is a time lag between the offset of the stimulus and the start of the grasping movement).
(2) Time is available to plan the forthcoming movement (e.g., Króliczak et al., 2006).
(3) Planning which movement to make is necessary.
(4) The action is unpractised or awkward.

Figure 2.14 Disruption of size judgements when estimated perceptually (estimation) or produced by grasping (grasping) in full or restricted vision when there was proprioception (withPro) or no proprioception (noPro). From Chen et al. (2018a). Reprinted with permission of Elsevier.

According to the perception-action model, actions are most likely to require the ventral stream when they involve conscious processes. Creem and Proffitt (2001) supported this notion. They started by distinguishing between effective and appropriate grasping. For example, we can grasp a toothbrush effectively by its bristles, but appropriate grasping involves accessing stored knowledge about the object and so often requires the ventral stream. As predicted, appropriate grasping was much more adversely affected than effective grasping by disrupting participants' ability to retrieve object knowledge.

van Polanen and Davare (2015) reviewed research on factors controlling skilled grasping. They concluded:

The ventral stream seems to be gradually more recruited as information about the object from pictorial cues or memory is needed to control the grasping movement, or if conceptual knowledge about more complex objects that are used every day or tools needs to be retrieved for allowing the most appropriate grasp. (p. 188)

Dorsal stream: conscious awareness

According to the two systems approach, ventral stream processing is generally accessible to consciousness whereas dorsal stream processing is not. For example, it is assumed that the ventral stream (and conscious processing) is often involved in motor planning (Milner & Goodale, 2008). There is some support for these predictions (Milner, 2012). As we will see, however, recent evidence mostly runs contrary to them.


Ludwig et al. (2016) assessed the involvement of the dorsal and ventral streams in conscious visual perception using a different approach. The visibility of visual targets presented to one eye was manipulated by varying the extent to which continuous flash suppression (rapidly changing stimuli presented to the other eye) impaired the processing of the targets. There were two main findings. First, there was a tight coupling between visual awareness of target stimuli and ventral stream processing. Second, there was a much looser coupling between target awareness and dorsal stream processing. The first finding is consistent with the two visual systems hypothesis. However, the second finding suggests dorsal processing is more relevant to conscious visual perception than assumed by that hypothesis.

According to the perception-action model, manipulations (e.g., continuous flash suppression) preventing conscious perception should nevertheless permit more processing in the dorsal stream than the ventral stream. However, neuroimaging studies have typically obtained no evidence that neural activity in the dorsal stream is greater than in the ventral stream when observers lack conscious awareness of visual stimuli (Hesselmann et al., 2018).

Two pathways: update

The perception-action model was originally proposed before neuroimaging and other techniques had clearly indicated the great complexity of the brain networks involved in perception and action (de Haan et al., 2018). Recent research has led to developments of the perception-action model in two main ways. First, we now know much more about the various interactions between processing in the dorsal and ventral streams. Second, there are more than two visual processing streams. Rossetti et al. (2017) show how theoretical conceptualisations of the relationship between visual perception and action have become more complex (see Figure 2.15).

We have seen that the ventral pathway is often involved in visually guided action. There is also increasing evidence the dorsal pathway is involved in visual object recognition (Freud et al., 2016). For example, patients with damage to the ventral pathway often retain some sensitivity to three-dimensional (3-D) structural object representations (Freud et al., 2017a). Zachariou et al. (2017) applied transcranial magnetic stimulation (TMS) to posterior parietal cortex within the dorsal pathway to disrupt processing. TMS disrupted the holistic processing (see Glossary) of faces, suggesting the dorsal pathway is involved in face recognition.

More supporting evidence was reported by Freud et al. (2016). They studied shape processing, which is of central importance in object recognition and so should depend primarily on the ventral pathway. However, the ventral and dorsal pathways were both sensitive to shape. The observers’ ability to recognise objects correlated with the shape sensitivity of regions within the dorsal pathway. Thus, dorsal path activation was of direct relevance to shape and object processing.

How many visual processing streams are there? There is evidence that actions towards objects depend on two partially separate dorsal streams (Sakreida et al., 2016; see Chapter 4). First, there is a dorso-dorsal stream (the “grasp” system) used to grasp objects rapidly.

Figure 2.15 Historical developments in theories linking perception and action. Row 1: the intuitive notion that action is preceded by conscious perception. Row 2: Goodale and Milner’s original two systems theory. Row 3: interaction between the two anatomical pathways and perceptual and visual processes. Row 4: evidence that processing in primary motor cortex is preceded by interconnections between dorsal (green) and ventral (red) pathways. From Rossetti et al. (2017). Reprinted with permission of Elsevier.

Second, there is a ventro-dorsal stream that makes use of memorised object knowledge and operates more slowly than the first stream.

Haak and Beckmann (2018) investigated the connectivity patterns among 22 visual areas, discovering these areas “are organised into not two but three visual pathways: one dorsal, one lateral, and one ventral” (p. 82). Their findings thus provide only some support for the emphasis within the perception-action model on dorsal and ventral streams. Haak and Beckmann speculated that the new lateral pathway may “incorporate . . . aspects of vision, action and language” (p. 81).

Overall evaluation

Milner and Goodale’s theoretical approach has been hugely influential.


Their central assumption that there are two visual systems (“what” and “how”) is partially correct, although it has received inconsistent support from research on patients with optic ataxia and visual agnosia. Earlier we discussed achromatopsia (see Glossary) and akinetopsia (see Glossary): the former condition depends on damage to the ventral pathway and the latter on damage to the dorsal pathway (Haque et al., 2018). As predicted theoretically, many visual illusions are much reduced in extent when observers engage in action-based performance (e.g., pointing; grasping).

What are the model’s limitations? First, evidence from brain-damaged patients provides relatively weak support for it. In fact, “The idea of a double dissociation between optic ataxia and visual form agnosia, as cleanly separating visuo-motor from visual perceptual functions, is no longer tenable” (Rossetti et al., 2017, p. 130).

Second, findings based on visual illusions provide only partial support for the model. The findings generally indicate that illusory effects are greater with perceptual judgements than with actions, but there are many exceptions.

Third, the model exaggerates the independence of the two visual systems. For example, Janssen et al. (2018) reviewed research on 3-D object perception and found strong effects of the dorsal stream on the ventral stream. As de Haan et al. (2018, p. 6) indicated:

The prevailing evidence suggests that cross-talk [interactions between visual systems] is the norm rather than the exception . . . [There is] a flexible and dynamic pattern of interaction between visual processing areas in which visually processing networks may be created on-the-fly in a highly task-specific manner.

Fourth, the notion there are only two visual processing streams is an oversimplification. Earlier on pp. 62–63 we discussed two attempts (Haak & Beckmann, 2018; Sakreida et al., 2016) to develop more complete accounts.

COLOUR VISION

Why do we have colour vision? After all, if you watch an old black-and-white movie on television you can easily understand the moving images. One reason is that colour often makes an object stand out from its surroundings, making it easier to identify. Chameleons very sensibly change colour to blend in with the background, thus reducing their chances of being detected by predators. Colour perception also helps us to recognise and categorise objects. For example, it is useful when deciding whether a piece of fruit is under- or overripe. Predictive coding (processing primarily aspects of sensory input that violate the observer’s predictions) is also relevant (Huang & Rao, 2011). Colour vision allows observers to focus rapidly on any aspects of the incoming visual input (e.g., discolouring) discrepant with predictions based on ripe fruit.

There are three main qualities associated with colour:

(1) Hue: the colour itself and what distinguishes red from yellow or blue.
(2) Brightness: the perceived intensity of light.


(3) Saturation: this allows us to determine whether a colour is vivid or pale; it is influenced by the amount of white present.

Trichromacy theory

Retinal cones are specialised for colour vision. Cone receptors contain light-sensitive photopigment allowing them to respond to light. According to the trichromatic [three-coloured] theory, there are three kinds of receptors:

(1) One type is especially sensitive to short-wavelength light and generally responds most strongly to stimuli perceived as blue.
(2) A second type is most sensitive to medium-wavelength light and responds greatly to stimuli generally seen as yellow-green.
(3) A third type responds most to long-wavelength light such as that reflected from stimuli perceived as orange-red.

KEY TERMS Dichromacy A deficiency in colour vision in which one of the three cone classes is missing. Negative afterimages The illusory perception of the complementary colour to the one that has just been fixated; green is the complementary colour to red and blue is complementary to yellow.

How do we see other colours? According to the theory, most stimuli activate two or all three cone types. The colour we perceive is determined by their relative stimulation levels. Evolution has equipped us with three types of cones because that produces a very efficient system: we can discriminate millions of colours even with so few cone types.

Many forms of colour deficiency are consistent with trichromacy theory. Most individuals with colour deficiency have dichromacy, in which one cone class is missing. In red-green dichromacy (the most common form) there are abnormalities in the retinal pigments sensitive to medium or long wavelengths. Individuals with red-green dichromacy perceive far fewer colours than intact observers. However, their colour constancy (see Glossary) is almost at normal levels (Álvaro et al., 2017).

The density of cones is far higher in the fovea (see Glossary) than in the periphery. However, there are enough cones in the periphery to permit accurate peripheral colour judgements if colour patches are reasonably large (Rosenholtz, 2016). The crucial role of cones in colour vision explains the following common phenomenon: “The sunlit world appears in sparkling colour, but when night falls . . . we see the world in 50 shades of grey” (Kelber et al., 2017, p. 1). In dim light, the cones are not activated and our vision depends almost entirely on rods.

Opponent-process theory

Trichromacy theory does not explain what happens after activation of the cone receptors. It also fails to account for negative afterimages. If you stare at a square of a given colour for several seconds and then shift your gaze to a white surface, you see a negative afterimage in the complementary colour (complementary colours produce white when combined). For example, a green square produces a red afterimage, whereas a blue square produces a yellow afterimage.

Hering (1878) explained negative afterimages. He identified three types of opponent processes in the visual system. One opponent process (red-green channel) produces perception of green when responding one way and red when responding the opposite way.


A second opponent process (blue-yellow channel) produces perception of blue or yellow in the same way. The third opponent process (achromatic channel) produces the perception of white at one extreme and black at the other.

What is the value of these three opponent processes? The three dimensions associated with opponent processes provide maximally independent representations of colour information. As a result, opponent processes provide very efficient encoding of chromatic stimuli.

Much research supports the notion of opponent processes. First, there is strong physiological evidence for the existence of opponent cells (Shevell & Martin, 2017). Second, the theory accounts for negative afterimages (discussed above). Third, the theory claims it is impossible to see blue and yellow together, or red and green together, whereas the other colour combinations can be seen. That is precisely what Abramov and Gordon (1994) found. Fourth, opponent processes explain some types of colour deficiency. Red-green deficiency occurs when the red-green channel cannot be used, and blue-yellow deficiency occurs when individuals cannot make effective use of the blue-yellow channel.

Dual-process theory

Hurvich and Jameson (1957) proposed a dual-process theory combining the ideas discussed so far. Signals from the three cone types identified by trichromacy theory are sent to the opponent cells (see Figure 2.16). There are three channels (a minimal computational sketch is given after this list):

(1) The achromatic [non-colour] channel combines the activity of the medium- and long-wavelength cones.
(2) The blue-yellow channel represents the difference between the sum of the medium- and long-wavelength cones on the one hand and the short-wavelength cones on the other. The direction of difference determines whether blue or yellow is seen.

(3) The red-green channel represents the difference between activity levels in the medium- and long-wavelength cones. The direction of this difference determines whether red or green is perceived.

Figure 2.16 Schematic diagram of the early stages of neural colour processing. Three cone classes (red = long; green = medium; blue = short) supply three “channels”. The achromatic (light-dark) channel receives non-spectrally opponent input from long- and medium-cone classes. The two chromatic channels receive spectrally opponent inputs to create the red-green and blue-yellow channels. From Mather (2009). Copyright 2009 George Mather. Reproduced with permission.
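The channel arithmetic lends itself to a simple worked example. The following is a minimal Python sketch, assuming purely additive and subtractive combination of cone signals; the function name and the numeric cone activations are ours for illustration, and real channel responses involve further nonlinearities not modelled here.

# Minimal sketch of the dual-process channel arithmetic (illustrative only).
# Cone activations are arbitrary numbers in [0, 1].

def opponent_channels(short, medium, long):
    """Map three cone activations onto the three opponent channels."""
    return {
        # Achromatic channel: combined medium- and long-wavelength activity.
        "light_dark": medium + long,
        # Blue-yellow channel: (medium + long) minus short;
        # positive values correspond to yellow, negative values to blue.
        "blue_yellow": (medium + long) - short,
        # Red-green channel: long minus medium;
        # positive values correspond to red, negative values to green.
        "red_green": long - medium,
    }

# A stimulus driving mainly the long-wavelength cones yields positive
# red-green and blue-yellow signals (seen as an orange-red).
print(opponent_channels(short=0.1, medium=0.3, long=0.9))
# {'light_dark': 1.2, 'blue_yellow': 1.1, 'red_green': 0.6} (up to float rounding)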

Overall evaluation

Dual-process theory has much experimental support. However, it is oversimplified in several ways (Shevell & Martin, 2017). First, there are complex interactions between the channels. For example, short-wavelength cones are activated even in conditions where only the red-green channel (involving medium- and long-wavelength cones) would be expected to be active (Conway et al., 2018). Second, the proportions of different cone types vary considerably across individuals, but this typically has surprisingly little effect on colour perception. Third, the arrangement of cone types in the eye is fairly random. This seems odd because it presumably makes it hard for colour-opponent processes to work effectively.

More generally, much research has focused on colour perception while other research has focused on how nerve cells respond to light of different wavelengths. What has proved difficult is to relate these two sets of findings directly to each other. So far there is only limited convergence between psychological and physiological research (Shevell & Martin, 2017).

KEY TERMS Colour constancy The tendency for an object to be perceived as having the same colour under widely varying viewing conditions. Illuminant A source of light illuminating a surface or object. Mutual illumination The light reflected from the surface of an object impinges on the surface of a second object.

Colour constancy

Colour constancy is the tendency for a surface or object to be perceived as having the same colour when there are changes in the wavelengths contained in the illuminant (the light source illuminating the surface or object). Colour constancy indicates colour vision does not depend solely on the wavelengths of the light reflected from objects. Learn more about colour constancy on YouTube: “This is Only Red by Vsauce”.

Why is colour constancy important? If we lacked colour constancy, the apparent colour of familiar objects would change dramatically when the lighting conditions altered. This would make it very hard to recognise objects rapidly and accurately.

Attaining reasonable levels of colour constancy is an impressive achievement. Look at the object in Figure 2.17. It is immediately recognisable as a blue mug even though several other colours can be perceived. The wavelengths of light depend on the mug itself, the illuminant and reflections from other objects onto the mug’s surface (mutual illumination).


Figure 2.17 Photograph of a mug showing enormous variation in the properties of the reflected light across the mug’s surface. The patches at the top of the figure show image values from the locations indicated by the arrows. From Brainard and Maloney (2011). Reprinted with permission of the Association for Research in Vision and Ophthalmology.


How good is colour constancy?

Case study: Colour constancy

Colour constancy is often reasonably good. For example, Granzier et al. (2009a) assessed colour constancy for six similarly coloured papers in various indoor and outdoor locations differing substantially in lighting conditions. They found 55% of the papers were identified correctly. This represents good performance given the similarities among the papers and the large differences in lighting conditions.

Reeves et al. (2008) distinguished between our subjective experience and our judgements about the world. For example, as you walk towards a fire, it feels increasingly hot subjectively. However, how hot you judge the fire to be is unlikely to change. Reeves et al. found colour constancy with non-naturalistic (artificial) stimuli was much greater when observers judged the objective similarity of two stimuli seen under different illuminants than when rating their subjective similarity. Radonjić and Brainard (2016) obtained similar findings with naturalistic stimuli. However, colour constancy was higher overall with naturalistic stimuli because such stimuli provided more cues to guide performance.

Estimating scene illumination

The wavelengths of light reflected from an object are greatly influenced by the illuminant (light source). High levels of colour constancy could be achieved if observers made accurate illuminant estimates. However, they often do not, especially when the illuminant’s characteristics are unclear (Foster, 2011). For example, there are substantial individual differences in the perceived illuminant (and perceived colour) of the famous dress discussed in the box below.

Colour constancy should be high when illuminant estimation is accurate (Brainard & Maloney, 2011). Bannert and Bartels (2017) tested this prediction. Observers were presented with visual scenes using three different illuminants, and cues within the scenes were designed to facilitate colour constancy. Bannert and Bartels used functional magnetic resonance imaging (fMRI) to assess the neural encoding of each scene. What did Bannert and Bartels (2017) find? Their key finding was that, “The neural accuracy of encoding the illuminant of a scene [predicted] the behavioural accuracy of constant colour perception” (p. 357). Thus, colour constancy was high when the illuminant was processed accurately.

Local colour contrast

Land (1986) proposed retinex theory, according to which we perceive a surface’s colour by comparing its ability to reflect short-, medium- and long-wavelength light against that of adjacent surfaces. Thus, we make use of local colour contrast. Kraft and Brainard (1999) studied colour constancy for complex visual scenes. Under full viewing conditions, colour constancy was 83% even with large changes in illumination. When local contrast could not be used, however, colour constancy dropped to 53%.

Foster and Nascimento (1994) developed Land’s ideas into an influential theory based on local contrast.




IN THE REAL WORLD: WHAT COLOUR IS “THE DRESS”?

On 7 February 2015, Cecilia Bleasdale took a photograph of the dress she intended to wear at her daughter’s imminent wedding (see Figure 2.18) and posted it on the internet. It caused an almost immediate sensation because observers disagreed vehemently concerning the dress’s colour. What colour do you think the dress is? Wallisch (2017) found 59% of observers said the dress was white and gold and 27% said it was black and blue.

How can we explain these individual differences? Wallisch argued the illumination of the dress is ambiguous: the upper part of the dress implies illumination by daylight whereas the lower part implies artificial illumination. Many theories predict the perceived colour of an object depends on its assumed illumination (discussed on p. 62). If so, observers assuming the dress is illuminated by natural light should perceive it as white and gold. In contrast, those assuming artificial illumination should perceive it as black and blue.

What did Wallisch (2017) find? As predicted, observers assuming the dress was illuminated by natural light were much more likely than those assuming artificial light to perceive the dress as white/gold (see Figure 2.19).

Figure 2.18 “The Dress” made famous by its appearance on the internet. From Rabin et al. (2016).

Figure 2.19 The percentage of observers reporting “The Dress” as white and gold, as a function of whether they assumed it was illuminated by natural light, by artificial light, or were unsure. From Wallisch (2017).

We can see the nature of their big discovery through an example. Suppose there are two illuminants and two surfaces. If surface 1 led to the long-wavelength or red cones responding three times as much with illuminant 1 as with illuminant 2, then the same threefold difference was also found with surface 2. Thus, the ratio of cone responses across surfaces was essentially invariant across different illuminations.
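A few lines of code make the invariance concrete. This is a minimal sketch under the assumptions of the example above; the excitation values are invented for illustration (real cone excitations integrate surface reflectance, illuminant power and cone sensitivity across wavelength).

# Illustrative long-wavelength cone excitations for two surfaces under
# two illuminants. Each excitation changes threefold with the illuminant.
surface1 = {"illuminant1": 0.6, "illuminant2": 0.2}
surface2 = {"illuminant1": 0.3, "illuminant2": 0.1}

for illuminant in ("illuminant1", "illuminant2"):
    ratio = surface1[illuminant] / surface2[illuminant]
    print(illuminant, round(ratio, 6))
# illuminant1 2.0
# illuminant2 2.0
# The between-surface ratio stays the same even though the individual
# excitations change greatly: a cue the visual system could use to
# discount the illuminant.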


KEY TERM Chromatic adaptation Changes in visual sensitivity to colour stimuli when the illumination alters.


Thus, cone-excitation ratios can be used to eliminate the illuminant’s effects and so increase colour constancy.

Much evidence indicates cone-excitation ratios are important (Foster, 2011, 2018). For example, Nascimento et al. (2004) obtained evidence suggesting the level of colour constancy in different conditions could be predicted on the basis of cone-excitation ratios.

Foster and Nascimento’s (1994) theory provides an elegant account of illuminant-independent colour constancy in simple visual environments. However, it has limited value in complex visual environments. For example, colour constancy for a given object can become harder because of reflections from other objects (see Figure 2.17) or because multiple sources of illumination are present together. The theory is generally less applicable to natural scenes than to artificial laboratory scenes. For example, the illuminant often changes more rapidly in natural scenes (e.g., clouds change shape, which influences the shadows they cast) (Nascimento et al., 2016). In addition, there are dramatic changes in the level and colour of natural illuminants over the course of the day.

In sum, cone-excitation ratios are most likely to be almost invariant, “provided that sampling is from points close together in space or time . . ., or from points separated arbitrarily but undergoing even changes in illumination” (Nascimento et al., 2016, p. 44).

Effects of familiarity

Colour constancy is influenced by our knowledge of the familiar colours of objects (e.g., bananas are yellow). Hansen et al. (2006) asked observers to view photographs of fruits and to adjust their colour until they appeared grey. There was over-adjustment. For example, a banana still looked yellowish to observers when it was actually grey, leading them to adjust its colour to a slightly bluish hue. Such findings may reflect an influence of familiar colour on subjective colour perception. Alternatively, familiar colour may primarily influence observers’ responses rather than their perception (e.g., our knowledge that bananas are yellow may bias us to report them as more yellow than they actually appear).

Vandenbroucke et al. (2016) investigated the above issue. Observers viewed an ambiguous colour intermediate between red and green presented on typically red (e.g., tomato) or green (e.g., pine tree) objects. Familiar colour influenced colour perception. Of most importance, neural responses in various visual areas (e.g., V4, which is much involved in colour processing) were influenced by familiar colour. Neural responses corresponded more closely to those associated with red objects when the object was typically red and more closely to those found with green objects when it was typically green. Thus, familiar colour had a direct influence on perception early in visual processing.

Chromatic adaptation

One reason we have reasonable colour constancy is chromatic adaptation: an observer’s visual sensitivity to a given illuminant decreases over time.


If you stand outside after nightfall, you may be surprised by the apparent yellowness of the artificial light in people’s houses. However, this is not the case if you spend some time in a room illuminated by artificial light. Lee et al. (2012b) found some aspects of chromatic adaptation occurred within six seconds. Such rapid adaptation increases colour constancy.

Evaluation

In view of the complexity of colour constancy, it is unsurprising the visual system adopts an “all hands on deck” approach in which several factors contribute to colour constancy. Of major importance are cone-excitation ratios that remain almost invariant across changes in illumination. In addition, top-down factors (e.g., our memory for the familiar colours of common objects) also play a role.

What are the limitations of theory and research on colour constancy? First, we lack a comprehensive theory of how the various factors combine. Second, most research has focused on relatively simple artificial visual environments. In contrast, “The natural world is optically unconstrained. Surface properties may vary from one point to another, and reflected light may vary from one instant to the next” (Foster, 2018, p. B192). As a result, the processes involved in trying to achieve colour constancy in more complex environments are poorly understood. Third, more research is needed to understand why colour constancy depends greatly on the precise instructions given to observers. Fourth, as Webster (2016, p. 195) pointed out, “There are pronounced [individual] differences in almost all measures of colour appearance . . . the basis for these differences remains uncertain.”

KEY TERMS Monocular cues Cues to depth that can be used by one eye but can also be used by both eyes together. Binocular cues Cues to depth that require both eyes to be used together. Oculomotor cues Cues to depth produced by muscular contractions of the muscles around the eye; use of such cues involves kinaesthesia (also known as the muscle sense).

DEPTH PERCEPTION

A major accomplishment of visual perception is the transformation of the two-dimensional retinal image into perception of a three-dimensional world seen in depth. The construction of 3-D representations is very important if we are to pick up objects, decide whether it is safe to cross the road and so on.

Depth perception depends on numerous visual and other cues (discussed below). All cues provide ambiguous information and so we would be ill-advised to place total reliance on any single cue. Moreover, different cues often provide conflicting information. When you watch a movie, some cues (e.g., stereo ones) indicate everything you see is at the same distance. In contrast, other cues (e.g., perspective; shading) indicate some objects are closer. In real life, depth cues are often provided by movement of the observer or of objects in the visual environment, and some cues are non-visual (e.g., object sounds). Here, however, the main focus will be on visual depth cues available when the observer and environmental objects are static.

Cues to depth perception are monocular, binocular and oculomotor. Monocular cues require only one eye but can also be used with two eyes. The fact that the world still retains a sense of depth with one eye closed indicates clearly that monocular cues exist. Binocular cues involve both eyes used together. Finally, oculomotor cues depend on sensations of contractions of the muscles around the eye; use of these cues involves kinaesthesia (the muscle sense).


KEY TERMS Texture gradient The rate of change of texture density from the front to the back of a slanting object.



Monocular cues

Monocular cues to depth are called pictorial cues because they are used by artists. Of particular importance is linear perspective, which artists use to create the impression of three-dimensional scenes on two-dimensional canvases. Linear perspective (following from the laws of optics and geometry) rests on various principles. For example, parallel lines pointing away from us appear to converge (e.g., motorway edges) and objects reduce in size as they recede into the distance. Tyler (2015) argued that linear perspective is only really effective in creating a powerful 3-D effect when viewed from the point from which the artist constructed the perspective. This is typically very close to the picture, as can be seen in a drawing by the Dutch artist Jan Vredeman de Vries (see Figure 2.20).

Texture is another monocular cue. Most objects (e.g., carpets; cobblestone roads) possess texture, and textured objects slanting away from us have a texture gradient (Gibson, 1979; see Figure 2.21). This is a gradient (rate of change) of texture density as you look from the front to the back of a slanting object, with the gradient changing more rapidly for objects slanted steeply away from the observer. Sinai et al. (1998) found observers judged the distances of nearby objects better when the ground was uniformly textured than when there was a gap (e.g., a ditch) in the texture pattern. Texture gradient is a limited cue because the perceived slant depends on the direction of the gradient. For reasons that are unclear, ground patterns are perceived as less slanted than equivalent ceiling or sidewall patterns (Higashiyama & Yamazaki, 2016).

Figure 2.20 An engraving by de Vries (1604/1970) in which linear perspective creates an effective three-dimensional effect when viewed from very close but not from further away. From Todorović (2009). Copyright 1968 by Dover Publications. Reprinted with permission from Springer.


Another monocular cue is interposition, where a nearer object hides part of a more distant one. The strength of this cue can be seen in Kanizsa’s (1976) illusory square (see Figure 2.22). There is a strong impression of a yellow square in front of four purple circles even though many of its contours are missing. This depends on processes that relatively “automatically” complete boundaries using the available information (e.g., incomplete circles).

Another useful cue is familiar size (discussed more fully later). If we know an object’s size, we can use its retinal image size to estimate its distance. However, we can be misled. Ittelson (1951) had observers view playing cards through a peephole restricting them to monocular vision. The perceived distance was determined almost entirely by familiar size. For example, playing cards double the usual size were perceived as being twice as far away from the observers as was actually the case.

Figure 2.21 Examples of texture gradients that can be perceived as surfaces receding into the distance. From Bruce et al. (2003).

We turn now to blur. There is no blur at the fixation point, and blur increases more rapidly at closer distances than at ones further away. Held et al. (2012) found blur was an effective depth cue (especially at longer distances). However, observers may simply have learned to respond that the blurrier stimulus was further away. Langer and Siciliano (2015) provided minimal training and obtained little evidence blur was used as a depth cue. They argued blur provides ambiguous information: an object can appear blurred because it is in peripheral vision rather than because it is far away.

Finally, there is motion parallax, which involves “transformations of the retinal image that are created . . . both when the observer moves (observer-produced parallax) and when objects move with respect to the observer (object-produced parallax)” (Rogers, 2016, p. 1267). For example, when you look out of the window of a moving train, nearby objects appear to move in the opposite direction but distant objects in the same direction. Rogers and Graham (1979) found motion parallax on its own can produce accurate depth judgements. Most research demonstrating the value of motion parallax as a depth cue has used very simple random-dot displays. However, Buckthought et al. (2017) found comparable effects in more complex and naturalistic conditions.

Figure 2.22 Kanizsa’s (1976) illusory square.

KEY TERM Motion parallax A depth cue based on movement in one part of the retinal image relative to another.

Cues such as linear perspective, texture gradient and interposition allow observers to perceive depth even in two-dimensional displays. However, research with computer-generated two-dimensional displays has found depth is often underestimated (Domini et al., 2011). Such displays provide cues to flatness (e.g., binocular disparity, accommodation and vergence, all discussed on pp. 74–75) that may reduce the impact of cues suggesting depth.


KEY TERMS Binocular disparity A depth cue based on the slight disparity in the two retinal images when an observer views a scene; it is the basis for stereopsis. Stereopsis Depth perception based on the small discrepancy in the two retinal images when a visual scene is observed (binocular disparity). Autostereogram A complex twodimensional image perceived as threedimensional when not focused on for a period of time. Amblyopia A condition in which one eye sends an inadequate input to the visual cortex; colloquially known as lazy eye.


Binocular cues

Depth perception does not depend solely on monocular and oculomotor cues. It can also be achieved by binocular disparity, which is the slight difference or disparity in the images projected on the retinas of the two eyes when you view a scene (Welchman, 2016). Binocular disparity produces stereopsis (the ability to perceive the world three-dimensionally). The great subjective advantage of binocular vision was described by Susan Barry (2009, pp. 94–132), a neuroscientist who recovered binocular vision in late adulthood:

[I saw] palpable volume[s] of empty space . . . I could see, not just infer, the volume of space between tree limbs . . . the grape was rounder and more solid than any grape I had ever seen . . . Objects seemed more solid, vibrant, and real.

Stereopsis is very powerful at short distances. However, the disparity or discrepancy in the retinal images of objects decreases by a factor of 100 as their distance from an observer increases from 2 to 20 metres (the geometry behind this figure is sketched at the end of this section). Thus, stereopsis rapidly becomes less available at greater distances. While stereopsis provides valuable information at short distances, we must not exaggerate its importance. Bülthoff et al. (1998) found observers’ recognition of familiar objects was not adversely affected when stereoscopic information was scrambled. Indeed, observers were unaware the depth information was scrambled!

Stereopsis involves matching features in the inputs to the two eyes. This process is fallible. For example, consider an autostereogram (a two-dimensional image containing depth information so it appears three-dimensional when viewed appropriately; the Wikipedia entry for autostereogram provides examples). With autostereograms, the same repeating 2-D pattern is presented to each eye. If there is a dissociation of vergence and accommodation, two adjacent patterns will form an object apparently at a different depth from the background. Some individuals are better than others at perceiving 3-D objects in autostereograms because of individual differences in binocular disparity, vergence and accommodation (Gómez et al., 2012).

The most common reason for impaired stereoscopic depth perception is amblyopia (one eye exhibits poor visual acuity; also known as lazy eye). However, deficient stereoscopic depth perception can also result from damage to various cortical areas (Bridge, 2016). As Bridge concluded, intact stereoscopic depth perception requires the following: “(i) both eyes aligned and functional; (ii) control over the eye muscles and vergence to [bring] the images into alignment; (iii) initial matching of retinal images; and (iv) integration of disparity information” (p. 2).
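The factor-of-100 figure mentioned above follows from the standard small-angle approximation in which disparity for a fixed depth interval falls with the square of viewing distance. The sketch below is our illustration, not a formula from the studies cited; the parameter values (a 6.5 cm interocular separation and a 10 cm depth interval) are assumptions.

# Small-angle approximation for the binocular disparity (in radians)
# between an object at distance_m and a second object depth_interval_m
# behind it. Parameter values are illustrative assumptions.

def disparity(distance_m, depth_interval_m=0.1, interocular_m=0.065):
    return interocular_m * depth_interval_m / distance_m ** 2

# Increasing viewing distance from 2 m to 20 m shrinks disparity 100-fold.
print(round(disparity(2.0) / disparity(20.0)))  # 100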

Oculomotor cues

The pictorial cues discussed so far can all be used as well by one-eyed individuals as by those with intact vision. Depth perception also depends on oculomotor cues, which are based on perceiving contractions of the muscles around the eyes.


One such cue is vergence (the eyes turn inwards more to focus on very close objects than on those further away). Another oculomotor cue is accommodation, which refers to the variation in optical power produced by the thickening of the eye’s lens when someone focuses on a close object.

Vergence and accommodation are both very limited. First, they only provide information about the distance of a single object at any given time. Second, they are both of value only when judging the distance of close objects. Even then, the information they provide is not very accurate.

Cue combination or integration

So far we have considered depth cues one by one. In the real world, however, we typically have access to many depth cues. How do we use these cues? One possibility is additivity (combining or integrating information from all cues) and another possibility is selection (using information from only a single cue) (Bruno & Cutting, 1988).

How could we maximise the accuracy of our depth perception? Jacobs (2002) argued we should assign more weight to reliable cues. Since cues reliable in one context may be less so in a different context, we should be flexible when assessing cue reliability. These considerations led Jacobs to propose two hypotheses:

KEY TERMS Vergence A cue to depth based on the inward focus of the eyes with close objects. Accommodation A depth cue based on changes in optical power produced by thickening of the eye’s lens when an observer focuses on close objects.

(1) Less ambiguous cues (i.e., those providing consistent information) are regarded as more reliable than more ambiguous ones.
(2) A cue is regarded as reliable if inferences based on it are consistent with those based on other available cues.

Other theoretical approaches resemble that of Jacobs (2002). For example, Rohde et al. (2016, p. 36) discuss Maximum Likelihood Estimation, which is “a rule used . . . to optimally combine redundant estimates of a variable [e.g., object distance] by taking into consideration the reliability of each estimate and weighting them accordingly” (a computational sketch of this rule is given below). We can extend this approach to include prior knowledge (e.g., natural light typically comes from above; many familiar objects have a typical size).

Finally, there are ideal-observer models (e.g., Landy et al., 2011; Jones, 2016). Many of these models are based on the Bayesian approach (see Chapter 13), in which initial probabilities are altered by new data or information (e.g., presentation of cues). Ideal-observer models involve making assumptions about the optimal way of combining the cue and other information available and comparing that against observers’ actual performance.

As we will see, experimentation has benefitted from advances in virtual reality technologies. These advances permit researchers to control visual cues very precisely, thus permitting clear-cut tests of many hypotheses.
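To make the weighting rule concrete, here is a minimal sketch of inverse-variance (reliability-weighted) cue combination; the cue estimates and variances are invented rather than taken from any study cited above.

# Reliability-weighted combination of redundant depth estimates, as in
# Maximum Likelihood Estimation: each cue is weighted by 1/variance.

def combine_cues(estimates, variances):
    """Return the reliability-weighted average of the cue estimates."""
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    return sum((r / total) * e for r, e in zip(reliabilities, estimates))

# Binocular disparity says 3.0 m (low variance, i.e., reliable);
# texture says 3.6 m (higher variance, i.e., noisier).
print(round(combine_cues([3.0, 3.6], [0.1, 0.4]), 2))
# 3.12 -> the combined estimate sits closer to the more reliable cue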

Findings

Evidence supporting Jacobs’ (2002) first hypothesis was reported by Triesch et al. (2002). Observers in a virtual reality situation tracked an object defined by colour, shape and size. On each trial, two attributes were unreliable or inconsistent (their values changed frequently).


KEY TERM Haptic Relating to the sense of touch.


Observers attached increasing weight to the reliable or consistent cue and less to the unreliable cues during each trial.

Evidence supporting Jacobs’ (2002) second hypothesis was reported by Atkins et al. (2001). Observers in a virtual reality environment viewed and grasped elliptical cylinders. There were three cues to cylinder depth: texture, motion and haptic (relating to the sense of touch). When the haptic and texture cues indicated the same cylinder depth but the motion cue indicated a different depth, observers made increasing use of the texture cue and decreasing use of the motion cue. When the haptic and motion cues indicated the same cylinder depth but the texture cue did not, observers increasingly relied on the motion cue rather than the texture cue. Thus, whichever visual cue correlated with the haptic cue was preferred, and this preference increased with practice.

Much research suggests observers integrate cue information according to the additivity notion: they take account of most (or all) cues but attach additional weight to more reliable ones (Landy et al., 2011). However, these conclusions are based primarily on studies involving only small conflicts in the information provided by each cue. What happens when two or more cues are in strong conflict? Observers typically rely heavily (or even exclusively) on only one cue, i.e., they use the selection strategy as defined by Bruno and Cutting (1988; see p. 75). This makes sense. Suppose one cue suggests an object is 10 metres away but another cue suggests it is 90 metres away. It is probably not sensible to split the difference and decide it is 50 metres away! We use the selection strategy at the movies: perspective and texture cues produce a 3-D effect, whereas we largely ignore cues (e.g., binocular disparity) indicating everything on the screen is the same distance from us.

Relevant evidence was reported by Girshick and Banks (2009) in a study on slant perception. When there was a small conflict between the information provided by binocular disparity and texture gradient cues, observers used information from both. However, when there was a large conflict between these cues, perceived slant was determined exclusively by one cue (binocular disparity or texture gradient). Interestingly, the observers were not consciously aware of the large conflict between the cues.

Do observers combine information from different cues to produce optimal performance (i.e., accurate depth perception)? Lovell et al. (2012) compared the effects of binocular disparity and shading on depth perception. Overall, binocular disparity was the more informative cue to depth, but Lovell et al. tested the effects of making it less reliable. Information from the cues was combined optimally, with observers consistently attaching more weight to reliable cues. Many other studies have also reported that observers’ depth perception is close to optimal. However, there are several studies where observers performed less impressively (Rahnev & Denison, 2018). For example, Chen and Tyler (2015) carried out a similar study to that of Lovell et al. (2012). Observers’ depth judgements were strongly influenced by shading, but observers made very little use of binocular disparity information.


Evaluation

Much has been learned about the numerous cues observers use to estimate depth or distance. Information from different depth cues is typically combined or integrated in studies assessing depth perception. There is also evidence that one cue often dominates the others when different cues conflict strongly. Overall, as Brenner and Smeets (2018, p. 385) concluded, “By combining the many sources of information in a clever manner people obtain quite reliable judgments that are not too sensitive to violations of the assumptions of the individual sources of depth information.” More specifically, observers generally attach most weight to cues providing reliable information consistent with that provided by other cues. If a cue becomes more or less reliable over time, observers generally increase or decrease its weighting appropriately. Overall, depth perception often appears close to optimal.

What are the limitations of theory and research on cue integration? First, we typically estimate distance in real-life settings where numerous cues are present and there are no large conflicts among them. In contrast, laboratory settings often provide only a few cues, and these cues sometimes provide very discrepant information. The unfamiliarity of laboratory settings may sometimes cause suboptimal performance by observers and reduce generalisation to everyday life (Landy et al., 2011).

Second, the assumption that observers process several essentially independent cues before integrating all the information is dubious. It may apply when observers view a very limited and artificial visual display. However, natural environments typically provide observers with very rich information. In such environments, visual processing probably depends more on a global assessment of the overall structure of the environment and less on processing of specific depth cues than usually assumed (Sedgwick & Gillam, 2017). There are also issues concerning the meaning of the word “cue”. For example, “Stereopsis is not a cue. It encompasses all the ways images of a scene differ in the two eyes” (Sedgwick & Gillam, 2017, p. 81).

Third, ideal-observer models differ in the assumptions used to compute “ideal” performance and in the meaning of “optimal” combining of cues in depth perception (Rahnev & Denison, 2018). Most models focus on the accuracy of depth-perception judgements. However, there are circumstances (e.g., presence of a fierce wild animal) where rapid if somewhat inaccurate judgements are preferable. More generally, humans focus on “computational efficiency”: our goal is to maximise reward while minimising the computational costs of visual processing (Summerfield & Li, 2018). Thus, optimality of depth-perception judgements does not depend solely on performance accuracy.

KEY TERM Size constancy Objects are perceived to have a given size regardless of the size of the retinal image.

Size constancy

Size constancy is the tendency for any given object to appear the same size whether its size in the retinal image is large or small.


For example, if someone walks towards you, their retinal image increases progressively but their apparent size remains the same. Why do we show size constancy? Many factors are involved. An object’s apparent distance is especially important when judging its size. For example, an object may be judged to be large even though its retinal image is very small, provided it is far away. According to the size-distance invariance hypothesis (Kilpatrick & Ittelson, 1953), perceived size is proportional to perceived distance for a given retinal image size.
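Under this hypothesis, a fixed retinal (angular) size implies a perceived size that grows with perceived distance: perceived size is roughly 2 x perceived distance x tan(retinal angle / 2). The sketch below is our illustration of how misperceived distance would distort perceived size; the angles and distances are invented, Ames-room-style numbers.

import math

# Perceived linear size implied by a retinal (angular) size and a
# perceived distance, under the size-distance invariance hypothesis.

def perceived_size(retinal_angle_deg, perceived_distance_m):
    return 2 * perceived_distance_m * math.tan(math.radians(retinal_angle_deg) / 2)

# Two adults are (mis)perceived as equidistant at 5 m but subtend
# different angles, so the one with the larger retinal image looks larger.
print(round(perceived_size(10.0, 5.0), 2))  # 0.87 (metres)
print(round(perceived_size(20.0, 5.0), 2))  # 1.76 (metres)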

Findings

Haber and Levin (2001) argued that an object’s perceived size depends on memory of its familiar size as well as on perceptual information concerning its distance. Initially, observers estimated the sizes of common objects with great accuracy from memory. Then they saw various objects at close (0–50 metres) or distant (50–100 metres) viewing range and made size judgements. Some familiar objects were almost invariant in size (e.g., bicycle) or of varying size (e.g., television set); there were also unfamiliar stimuli (e.g., ovals).

What findings would we expect? If familiar size is important, size judgements should be more accurate for objects of invariant size than for those of variable size, with size judgements least accurate for unfamiliar objects. If distance perception is all-important (and known to be more accurate for nearby objects), size judgements should be better for all object categories at close viewing range.

Haber and Levin (2001) found that size judgements were much better with objects having an invariant size than with those having a variable size (see Figure 2.23). In addition, the viewing distance had a minimal effect on size judgements. Both findings are contrary to predictions from the size-distance invariance hypothesis.

If size judgements depend on perceived distance, size constancy should not be found when an object’s perceived distance differs considerably from its actual distance.

Figure 2.23 Accuracy of size judgements as a function of object type (unfamiliar; familiar variable size; familiar invariant size) and viewing distance (0–50 metres vs 50–100 metres). Based on data in Haber and Levin (2001). 


Figure 2.24 (a) A representation of the Ames room; (b) an actual Ames room showing the effect achieved with two adults. Photo Peter Endig/dpa/Corbis.

The Ames room (Ames, 1952; see Figure 2.24) provides a good example. It has a peculiar shape: the floor slopes and the rear wall is not at right angles to the adjoining walls. Nevertheless, the Ames room creates the same retinal image as a normal rectangular room when viewed monocularly through a peephole. The fact that one end of the rear wall is much further away from the viewer is disguised by making it much higher. The cues suggesting the rear wall is at right angles to observers are so strong they mistakenly assume two adults standing in the corners by the rear wall are at the same distance (see photograph).


KEY TERM Ames room A very distorted room that nevertheless looks normal under certain viewing conditions.


KEY TERMS Honi phenomenon The typical apparent size changes when an individual walks along the rear wall of the Ames room are reduced when female observers view a man to whom they are very close emotionally. Open-object illusion The misperception that objects with missing boundaries are larger than objects the same size without missing boundaries. Body size effect An illusion in which misperception of one’s own bodily size causes the perceived size of objects to be misjudged.


They thus estimate the size of the nearer adult as much greater than that of the adult further away. See the Ames room on YouTube: “Ramachandran – Ames room illusion explained”.

The illusion effect with the Ames room is so great that someone walking backwards and forwards in front of the rear wall seems to grow and shrink as they move! Thus, perceived distance apparently determines perceived size. However, this effect is reduced when the person walking along the rear wall is a man and the observer is a woman having a close emotional relationship with him. This is known as the Honi phenomenon because it was first experienced by a woman (whose nickname was Honi) when she saw her husband in the Ames room.

Similarly dramatic findings were reported by Glennerster et al. (2006). Participants walked through a virtual-reality room as it expanded or contracted considerably. Even though they had detailed information from motion parallax and other motion cues to indicate the room’s size was changing, no participants noticed the changes! There were large errors in participants’ judgements of the sizes of objects at longer distances because of their powerful expectation that the size of the room would not alter.

Some evidence discussed so far has been consistent with the assumption of the size-distance invariance hypothesis that perceived size depends on perceived distance. However, many other findings are inconsistent with it (Kim, 2017b). For example, Kim et al. (2016) obtained size and distance estimates from observers for objects placed in various tunnels. Size and distance were perceived independently (i.e., they depended on different factors). In contrast, the size-distance invariance hypothesis predicts that perceived size and perceived distance should depend on each other and thus should not be independent. Kim (2018) obtained similar findings when observers viewed a virtual object presented stereoscopically. Size judgements were more accurate than distance judgements, with each judgement depending on its own information source.

More evidence inconsistent with the size-distance invariance hypothesis was reported by Makovski (2017). Participants were presented with stimuli such as those shown in Figure 2.25 on a monitor. Even though perceived distance was the same for all stimuli, “open” objects (having missing boundaries) were perceived as much larger than “closed” objects (with all boundaries intact). This is the open-object illusion, in which observers extend the missing boundaries. It may resemble our common perception that open windows make a room seem larger.

Van der Hoort et al. (2011) found evidence for the body size effect, in which the size of a body mistakenly perceived to be one’s own influences the perceived sizes of objects. Participants equipped with head-mounted displays connected to CCTV cameras saw the environment from the perspective of a doll (see Figure 2.26). The doll was small or large. Objects were perceived as larger and further away when the doll was small than when it was large. These effects were greater when participants misperceived the body as their own (this was achieved by having the bodies of the participants and the doll touched at the same time).




Thus, size and distance perception depend partly on our lifelong experience of seeing everything from the perspective of our own body.

Tajadura-Jiménez et al. (2018) extended the above findings. Participants experienced having the body of a 4-year-old child or the body of an adult scaled down to match the height of the child’s body. Object size was overestimated more in the child-body condition, indicating that object size is influenced by higher-level cognitive processes (i.e., age perception).


Figure 2.25 Top: stimuli presented to participants; bottom: example of the stimulus display. From Makovski (2017).

Figure 2.26 What participants in the doll experiment could see. From the viewpoint of a small doll, objects such as a hand look much larger than when seen from the viewpoint of a large doll. This exemplifies the body size effect. From Van der Hoort et al. (2011). Public Library of Science. With kind permission from the author.

Evaluation

Size perception and size constancy sometimes depend on perceived distance. Some of the strongest evidence comes from research where misperceptions of distance (e.g., in the Ames room; in virtual environments) produce systematic distortions in perceived size. However, several other factors also influence size perception. These include familiar size, one’s perceived body size and whether objects do (or do not) contain missing boundaries.

What are the limitations of research and theory on size perception? First, psychologists have discovered fewer sources of information accounting for size perception than for depth perception. In addition, as Kim (2017b, p. 2) pointed out, “The efficacy of the few information sources that have been identified for size perception is questionable.” Second, while the size-distance invariance hypothesis remains influential, there is a “vast literature demonstrating independence of perceived size and distance” (Kim, 2018, p. 17).

PERCEPTION WITHOUT AWARENESS: SUBLIMINAL PERCEPTION

Can we perceive aspects of the visual world without any conscious awareness that we are doing so? In other words, is there such a thing as subliminal perception (stimulus perception occurring even though the stimulus is below the threshold of conscious awareness)? Common sense suggests the answer is “No”. However, much research evidence suggests the answer is “Yes”. Nevertheless, we must use terms carefully: a thermostat responds appropriately to temperature changes and so could be said to exhibit unconscious perception! Much important evidence has come from blindsight patients with damage to early visual cortex (V1), an area of crucial importance to


KEY TERM Subliminal perception Perceptual processing occurring below the level of conscious awareness that can nevertheless influence behaviour.


KEY TERM Blindsight The ability to respond appropriately to visual stimuli in the absence of conscious visual experience in patients with damage to the primary visual cortex.


visual perception (discussed on pp. 45–46). Blindsight refers to patients’ ability to “detect, localise, and discriminate visual stimuli in their blind field, despite denying being able to see the stimuli” (Mazzi et al., 2016, p. 1). In what follows, we initially consider blindsight patients. After that, we discuss evidence of subliminal perception in healthy individuals.

Blindsight

Many British soldiers in the First World War who had been blinded by gunshot wounds that destroyed their primary visual cortex (V1 or BA17) were treated by George Riddoch, a captain in the Royal Army Medical Corps. These soldiers responded to motion in those parts of the visual field in which they claimed to be blind. The apparently paradoxical nature of their condition was neatly captured by Weiskrantz et al. (1974), who coined the term “blindsight”. How is blindsight assessed? Various approaches have been taken but there are generally two measures. First, there is a forced-choice test in which patients guess (e.g., stimulus present or absent?) or point at stimuli they cannot see. Second, there are patients’ subjective reports that they cannot see stimuli presented to their blind region. Blindsight is typically defined by an absence of self-reported visual perception accompanied by above-chance performance on the forced-choice test.
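The forced-choice component of this definition invites a simple statistical check. The following is a minimal sketch (not from the chapter) of how above-chance guessing might be tested; the trial counts are invented and the scipy library is assumed to be available.

```python
# Hypothetical sketch: is forced-choice "guessing" in the blind field
# above chance? All numbers are invented for illustration.
from scipy.stats import binomtest

n_trials = 200    # forced-choice trials (e.g., stimulus present vs absent)
n_correct = 128   # hypothetical number of correct guesses
chance = 0.5      # two-alternative forced choice

result = binomtest(n_correct, n_trials, chance, alternative="greater")
print(f"accuracy = {n_correct / n_trials:.2f}, p = {result.pvalue:.4g}")
# Above-chance forced-choice accuracy combined with self-reported
# blindness is the operational definition of blindsight given above.
```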

IN THE REAL WORLD: BLINDSIGHT PATIENT DB

Much early research on blindsight involved a single patient, DB, who remains one of the most thoroughly studied blindsight patients (see Weiskrantz, 2010, for a historical review). DB was studied intensively by Larry Weiskrantz. He was blind in the lower part of his left visual field following surgical removal of part of his right occipital cortex, including most of the primary visual cortex (BA17), to relieve his frequent and very severe migraine attacks. DB could detect the presence of an object and could indicate its approximate location by pointing. He could also discriminate between moving and stationary objects and could distinguish vertical from horizontal lines. However, DB’s abilities were limited – he could not distinguish between different-sized rectangles or between triangles having straight and curved sides. Such findings suggest DB processed only low-level features of visual stimuli and could not discriminate form. We have seen that DB showed some ability to perform various visual tasks. However, he reported no conscious experience in his blind field. According to Weiskrantz et al. (1974, p. 721), “When he was shown a video film of his reaching and judging orientation of lines [by presenting it to his intact visual field], he was openly astonished.” Campion et al. (1983) pointed out that DB and other blindsight patients are only partially blind. They favoured the stray-light hypothesis, according to which patients respond to light reflected from the environment onto areas of the visual field still functioning. This hypothesis implies DB should have shown reasonable visual performance when objects were presented to his blind spot (the area where the optic nerve passes through the retina). However, DB could not detect objects presented to his blind spot.

We must not exaggerate patients’ preserved visual abilities. Indeed, their visual abilities in their blind field are so poor that a seeing person with comparable impairment would be legally classified as blind.

What do blindsight patients experience?

It is surprisingly hard to decide exactly what blindsight patients experience when presented with visual stimuli to their blind field. For example, the blindsight patient GY described his experiences as “similar to that of a normally sighted man who, with his eyes shut against sunlight, can perceive the direction of motion of a hand waved in front of him” (Beckers & Zeki, 1995, p. 56). On another occasion GY was asked about his qualia (sensory experiences). He said, “That [experience of qualia] only happens on very easy trials, when the stimulus is very bright. Actually, I’m not sure I really have qualia then” (Persaud & Lau, 2008, p. 1048). There is an important distinction between type-1 and type-2 blindsight. Type-1 blindsight occurs when patients have no conscious awareness of visual stimuli presented to the blind field. In contrast, type-2 blindsight occurs when patients have some residual awareness (although very different from that of healthy individuals). For example, a patient, EY, “sensed a definite pinpoint of light”, although “it looks like nothing at all” (Weiskrantz, 1980). Another patient, GY, said, “You don’t actually ever sense anything or see anything . . . it’s more an awareness but you don’t see it” (Weiskrantz, 1997). Many patients exhibit type-1 blindsight on some occasions but type-2 blindsight on others.

Findings: evidence for blindsight

Numerous studies have assessed the perceptual abilities of blindsight patients. Here we briefly consider three illustrative studies. As indicated already, blindsight patients often perform better when guessing an object’s direction of motion than its perceptual qualities (e.g., form; colour). For example, Chabanat et al. (2019) studied a blindsight patient, SA. He was correct 98% of the time when reporting an object’s direction of motion but performed at chance level when reporting its colour. GY (discussed earlier) is a much-studied blindsight patient. He has extensive damage to the primary visual cortex in the left hemisphere. In one study (Persaud & Cowey, 2008), GY was presented with a stimulus in the upper or lower part of his visual field. On inclusion trials, he was instructed to report the part of the visual field to which the stimulus had been presented. On exclusion trials, GY was instructed to report the opposite of its actual location (e.g., “up” when it was in the lower part). GY tended to respond with the real rather than the opposite location on exclusion as well as inclusion trials, suggesting he had access to location information but lacked any conscious awareness of it (see Figure 2.27). In contrast, healthy individuals showed a large difference in performance on inclusion and exclusion trials, indicating they had conscious access to location information.
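One standard way of turning inclusion/exclusion data into separate estimates of conscious and unconscious influences is Jacoby’s process-dissociation procedure (the kind of analysis behind the estimates shown in Figure 2.27). The sketch below uses invented proportions, not GY’s actual data, to show the arithmetic.

```python
# Process-dissociation sketch with invented proportions (not GY's data).
def process_dissociation(p_inclusion: float, p_exclusion: float):
    """p_inclusion: proportion of inclusion trials on which the true
    location was reported; p_exclusion: proportion of exclusion trials
    on which the true location was reported despite instructions."""
    conscious = p_inclusion - p_exclusion         # C = I - E
    unconscious = p_exclusion / (1 - conscious)   # U = E / (1 - C)
    return conscious, unconscious

# A blindsight-like pattern: the true location is reported almost as
# often under exclusion as under inclusion instructions.
c, u = process_dissociation(p_inclusion=0.75, p_exclusion=0.70)
print(f"conscious C = {c:.2f}, unconscious U = {u:.2f}")  # C ~ 0.05, U ~ 0.74
```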

Figure 2.27 Estimated contributions of conscious and subconscious processing to GY’s performance in exclusion and inclusion conditions in his normal and blind fields. Reprinted from Persaud and Cowey (2008). Reprinted with permission from Elsevier.

Persaud et al. (2011) manipulated the stimuli presented to GY so his visual performance was comparable in both fields. However, GY indicated conscious awareness of far more stimuli in the intact field than the blind one (43% of trials vs 3%, respectively). GY had substantially more activation in the prefrontal cortex and parietal areas to targets presented in the intact field, suggesting those targets were processed much more thoroughly.

Blindsight vs degraded conscious vision

Some researchers argue blindsight patients exhibit degraded vision rather than a total absence of conscious awareness of “blind” field stimuli. For example, Overgaard et al. (2008) asked a blindsight patient, GR, to decide whether a triangle, circle or square had been presented to her blind field. In one experiment, GR simply responded “yes” or “no”. In another experiment, Overgaard et al. used a 4-point Perceptual Awareness Scale: “clear image”, “almost clear image”, “weak glimpse” and “not seen”. Using the yes/no measure, GR indicated she had not seen the stimulus on 79% of trials. However, she identified it correctly 46% of the time. These findings suggest the presence of type-1 blindsight. With the 4-point scale, in contrast, GR was correct 100% of the time when she had a clear image, 72% of the time when her image was almost clear, 25% when she had a weak glimpse and 0% when the stimulus was not seen. If the “clear image” and “almost clear image” data are combined, GR claimed awareness of the stimulus on 54% of trials, on 83% of which she was correct. Thus, the use of a sensitive method (the 4-point scale) suggested much of GR’s apparent blindsight reflected degraded conscious vision. Ko and Lau (2012) argued blindsight patients have more conscious visual experience than usually assumed. Their key assumption was as follows: “Blindsight patients may use an unusually conservative criterion for detection, which results in them saying ‘no’ nearly all the time to the question of ‘do you see something?’” (Ko & Lau, 2012, p. 1402). This excessive caution may occur in part because damage to the prefrontal cortex impairs their ability to set the criterion for visual detection appropriately. Their excessive conservatism or caution may explain why the reported visual experience of blindsight patients is so discrepant from their forced-choice perceptual performance.
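Ko and Lau’s conservative-criterion idea can be illustrated with a toy signal-detection simulation (all values assumed): identical internal evidence yields radically different rates of “seen” reports depending solely on where the report criterion is set.

```python
# Toy simulation of Ko and Lau's (2012) conservative-criterion account.
# Assumed values throughout; an illustration, not their actual model.
import numpy as np

rng = np.random.default_rng(0)
evidence = rng.normal(1.0, 1.0, 100_000)   # internal response to stimuli

for name, criterion in [("neutral", 0.5), ("very conservative", 2.5)]:
    p_seen = np.mean(evidence > criterion)
    print(f"{name} criterion: 'seen' reported on {p_seen:.0%} of trials")
# Output: roughly 69% vs 7%. The evidence distribution (and hence the
# potential forced-choice accuracy) is identical; only the willingness
# to say "yes, I saw it" differs -- mimicking blindsight denials.
```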

Ko and Lau’s (2012) theoretical position is supported by Overgaard et al.’s (2008) finding (discussed on p. 84) that blindsight patients were very reluctant to admit to having seen stimuli presented to their blind field. They also cited research supporting their assumption that blindsight patients often have prefrontal damage. Mazzi et al. (2016) carried out a study resembling that of Overgaard et al. (2008) on another blindsight patient, SL, showing no activity in the primary visual cortex (V1). SL decided which of two features (e.g., red or green colour) was present in a stimulus. When she indicated whether she had seen the stimulus or was merely guessing, her guessing performance was significantly above chance suggestive of type-1 blindsight. However, when she indicated her awareness using the 4-point Perceptual Awareness Scale, her visual performance was at chance level when she reported no awareness of the stimulus. These findings suggest an absence of blindsight. The title of Mazzi et al.’s article provides the take-home message: “Different measures tell a different story” (p. 1). What can we conclude? Overgaard and Mogensen (2015, p. 37) argued that “rudimentarily analysed visual information is available in blindsight” but typically does not lead to conscious awareness. However, such information can produce conscious awareness if the patient uses much effort and top-down control. Two findings support this approach. First, blindsight patients generally do not regard their experiences as “visual” because they differ so much from normal visual perception. Second, there is much evidence (Overgaard  & Mogensen, 2015) that blindsight patients show enhanced visual performance (and sometimes subjective awareness) after training. This occurs because they make increasingly effective use of the rudimentary visual information available to them.

Blindsight and the brain

As indicated above, the main brain damage in blindsight patients is to V1 (the primary visual cortex). As we saw earlier in the chapter (p. 47), visual processing typically proceeds from V1 (BA17) to other brain areas (e.g., V2, V3, V4; see Figure 2.4). Of importance, stimuli presented to the “blind” field often produce some activation in these other brain areas. However, this activation is not associated with visual awareness in blindsight patients. On p. 48 we discussed research by Hurme et al. (2017) designed to clarify the role of V1 (the primary visual cortex) in the visual perception of healthy individuals. Transcranial magnetic stimulation (TMS) applied to the primary visual cortex to reduce its efficiency disrupted both unconscious and conscious vision. In a similar study, Hurme et al. (2019) found TMS applied to V1 prevented conscious and unconscious motion perception in healthy individuals. In view of the above findings, how is it that many blindsight patients provide evidence of unconscious visual and motion processing? Part of the answer lies within the lateral geniculate nucleus (LGN) of the thalamus, an intermediate relay station between the eye and V1 (see Figure 2.28).

Figure 2.28 The areas of most relevance to blindsight are the lateral geniculate nucleus (LGN) and middle temporal visual area (MT/V5). The structure close to the LGN is the pulvinar. [The diagram shows the dorsal and ventral streams linking V1, V2, V3, V4, MT/V5, TEO and TE, together with the pulvinar, the LGN and the superior colliculus.] From Tamietto and Morrone (2016).

Ajina et al. (2015) divided patients with V1 damage into those with or without blindsight. All those with blindsight had intact connections between the LGN and

MT/V5 (blue arrow in the figure), whereas those connections were impaired in patients without blindsight. This finding is important given the crucial importance of MT/V5 for motion perception. Celeghin et al. (2019) reported a meta-analysis (see Glossary) providing a fuller account of the brain areas associated with patients’ visual processing. They identified 14 such areas. Some of these areas (e.g., the LGN; the pulvinar) are critical for non-conscious motion perception, whereas others (e.g., superior temporal gyrus; amygdala) are involved in non-conscious emotion processing. Overall, the meta-analysis strongly suggested that blindsight typically consists of several non-conscious visual abilities rather than one. Of interest, prefrontal areas (e.g., dorsolateral prefrontal cortex) often associated with conscious visual perception (see Chapter 16) were not activated during visual processing by blindsight patients. These findings support the view that visual processing in these patients is typically unaccompanied by conscious experience. Finally, Celeghin et al. (2019) discussed evidence that there is substantial reorganisation of brain connectivity in many blindsight patients following damage to V1 (primary visual cortex). For example, consider the blindsight patient, GY, whose left V1 was destroyed. He has nerve fibre connections between the undamaged right lateral geniculate nucleus and the contralesional (on the opposite side to the lesion) visual motion area MT/V5 (Bridge et al., 2008) – connections not present in healthy individuals. Such reorganisation helps to explain the visual abilities displayed by blindsight patients.

Evaluation

Much has been learned about the nature of blindsight. First, two main types of blindsight have been identified. Second, evidence for the existence

of blindsight often depends on the precise measure of visual awareness used. Third, brain connections important in blindsight (e.g., between the lateral geniculate nucleus and MT/V5) have been discovered. Fourth, the visual abilities of many blindsight patients probably depend on the reorganisation of connections within the brain following damage to the primary visual cortex. Fifth, the assumption that visual processing is rudimentary in blindsight patients explains many findings. Sixth, research on blindsight has shed light on the many visual pathways that bypass V1 but whose functioning can be overshadowed by pathways involving V1 (Celeghin et al., 2019). What are the limitations of research in this area? First, there are considerable differences among blindsight patients, with several apparently possessing some conscious visual awareness in their allegedly blind field. Second, many blindsight patients have more conscious visual experience in their “blind” field than appears from yes/no judgements about stimulus awareness. This probably happens because they are excessively cautious about claiming to have seen a stimulus (Mazzi et al., 2016; Overgaard et al., 2008). Third, the extent to which blindsight patients have degraded vision remains controversial. Fourth, the existence of reorganisation within the brain in blindsight patients (e.g., Bridge et al., 2008) may limit the applicability of findings from such patients to healthy individuals.

Subliminal perception

In research on subliminal perception in visually intact individuals, a performance measure of perception (e.g., enhanced speed or accuracy of responding) is typically compared with an awareness measure. We can distinguish between subjective and objective measures of awareness: subjective measures involve self-reports concerning observers’ awareness, whereas objective measures involve forced-choice responses (e.g., did the stimulus belong to category A or B?) (Hesselmann, 2013). As Shanks (2017, p. 752) argued, “Unconscious processing [subliminal perception] is inferred when above-chance performance is combined with null awareness.” For example, Naccache et al. (2002) had observers decide rapidly whether a visible target digit was smaller or larger than 5. Unknown to them, an invisible masked digit on the same side of 5 as the target (congruent) or the other side (incongruent) was presented immediately before the target. There were two main findings. First, responses to the target digits were faster on congruent than incongruent trials (performance measure). Second, no participants reported seeing any masked digits (subjective awareness measure) and their performance was at chance level when guessing whether masked digits were below or above 5 (objective awareness measure). These findings suggested the existence of subliminal perception.
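The dual-measure logic can be sketched in a few lines: a performance measure (here, a hypothetical congruence-priming effect on response times) is tested alongside the awareness measures. The response times below are invented and scipy is assumed.

```python
# Hypothetical performance measure: congruence priming of response times.
import numpy as np
from scipy.stats import ttest_rel

# Invented mean RTs (ms) for six participants.
congruent   = np.array([512, 498, 530, 505, 521, 489])
incongruent = np.array([534, 511, 549, 522, 538, 503])

test = ttest_rel(incongruent, congruent)
print(f"priming effect = {np.mean(incongruent - congruent):.1f} ms, "
      f"p = {test.pvalue:.3f}")
# Subliminal perception is inferred only if an effect like this
# co-occurs with null subjective AND objective awareness measures.
```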

Findings

Persaud and McLeod (2008) tested the notion that only information perceived with awareness can control our actions. They presented the letter “b” or “h” for 10 ms (short interval) or 15 ms (long interval). In the key condition, participants were instructed to respond with the letter not presented.

For example, if they were aware “b” had been presented, they would say “h”. The rationale was that only participants consciously aware of the letter could inhibit saying it. Persaud and McLeod (2008) found participants responded correctly with the non-presented letter on 83% of long-interval trials, indicating reasonable conscious awareness. In contrast, participants responded correctly on only 43% of short-interval trials (significantly below chance), suggesting some stimulus processing but an absence of conscious awareness. An important issue is whether perceptual awareness is all-or-none (i.e., present or absent) or graded (i.e., varying in extent). Evidence suggesting it is graded was reported by Sandberg et al. (2010). One of four shapes was presented very briefly followed by masking. Observers made a behavioural response (deciding which shape had been presented) followed by one of three subjective measures: (1) clarity of perceptual experience (the Perceptual Awareness Scale); (2) confidence in their decision; and (3) wagering variable amounts of money on having made the correct decision. What did Sandberg et al. (2010) find? First, above-chance task performance sometimes occurred without reported awareness with all three subjective measures. Second, the Perceptual Awareness Scale predicted performance better than the other measures, probably because it was the most sensitive measure of conscious experience. The partial awareness hypothesis (Kouider et al., 2010) potentially explains graded perceptual experience. According to this hypothesis, perceptual awareness can be limited to low-level features (e.g., colour) while excluding high-level features (e.g., face identity). Supportive evidence was reported by Gelbard-Sagiv et al. (2016) with faces coloured blue or green. They used continuous flash suppression (CFS): a stimulus presented to one eye cannot be seen consciously when rapidly changing patterns are presented to the other eye. Observers often had conscious awareness of the colour of faces they could not identify. Koivisto and Grassini (2016) presented stimuli to one of four locations. Observers then made a forced-choice response concerning the stimulus location and rated their subjective visual awareness of the stimulus on a 3-point version of the Perceptual Awareness Scale (discussed above). Of central importance was the no-awareness category (i.e., “I did not see any stimulus”). The finding that observers were correct on 38% of trials associated with no awareness (chance performance = 25%) was apparent evidence for subliminal perception. However, there is an alternative explanation. According to Koivisto and Grassini (2016, p. 241), the above finding occurred mainly when “observers were very weakly aware of the stimulus, but behaved conservatively and claimed not having seen it”. This conservatism is known as response bias. Two findings supported this explanation. First, nearly all the observers showed response bias on no-awareness trials (see Figure 2.29). Second, Koivisto and Grassini (2016) used event-related potentials. The N200 (a negative wave 200 ms after stimulus presentation) is typically substantially larger for stimuli associated with awareness. Of key importance, the N200 was greater on no-awareness correct trials than

Figure 2.29 The relationship between response bias in reporting conscious awareness (C) and enhanced N200 on no-awareness correct trials compared to no-awareness incorrect trials (UC, in µV); the scatterplot showed a correlation of r = –0.53. From Koivisto and Grassini (2016). Reprinted with permission of Elsevier.
no-awareness incorrect trials for observers with high response bias but not for those with low response bias (see Figure 2.29). In sum, Koivisto and Grassini (2016) provided a coherent explanation for the finding that visual performance was well above chance on no-awareness trials. Observers often had weak conscious awareness on correct no-awareness trials (indicated by the N200 findings). Such weak conscious awareness occurred most frequently among those most biased against claiming to have seen the stimulus. Neuroimaging research has consistently shown that stimuli of which the observers are unaware nevertheless produce activation in several brain areas. In one study (Rees, 2007), activation was assessed in brain areas associated with face processing and with object processing while invisible pictures of faces or houses were presented. The identity of the picture (face vs house) could be predicted with almost 90% accuracy from patterns of brain activation. Thus, subliminal stimuli can be processed reasonably thoroughly by the visual system. Research focusing on differences in brain activation between conditions where there is (or is not) conscious perceptual awareness is discussed thoroughly in Chapter 16. Here we will mention two major findings. First, there is much less integrated or synchronised brain activation when there is no conscious perceptual awareness (e.g., Godwin et al., 2015; Melloni et al., 2007). Second, activation of areas within the prefrontal cortex (involved in integrating brain activity) is much greater for consciously perceived visual stimuli than those not consciously perceived (e.g., Gaillard et al., 2009; Godwin et al., 2015). What do these findings mean? They strongly suggest processing is predominantly limited to low-level features (e.g., colour; motion) when stimuli are not consciously perceived, which is consistent with the partial awareness hypothesis (Kouider et al., 2010).
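The decoding logic behind findings like Rees (2007) can be sketched with synthetic data: a classifier predicts stimulus category from multi-voxel activation patterns. scikit-learn is assumed here, and nothing in this sketch reproduces the original study’s analysis pipeline.

```python
# Synthetic multi-voxel pattern decoding (illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 50
labels = rng.integers(0, 2, n_trials)            # 0 = face, 1 = house
category_pattern = rng.normal(0, 1, n_voxels)    # assumed voxel weights
patterns = (np.outer(labels - 0.5, category_pattern)
            + rng.normal(0, 1.5, (n_trials, n_voxels)))  # noisy "fMRI" data

scores = cross_val_score(LogisticRegression(max_iter=1000),
                         patterns, labels, cv=5)
print(f"decoding accuracy: {scores.mean():.0%} (chance = 50%)")
```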

Evaluation

Evidence for unconscious or subliminal perception has been reported in numerous studies using numerous tasks. Some evidence is behavioural (e.g., Naccache et al., 2002; Persaud & McLeod, 2008) and some is based on patterns of brain activity (e.g., Melloni et al., 2007; Rees, 2007). The latter line of research suggests there can be considerable low-level processing of visual stimuli in the absence of conscious visual awareness. In spite of limitations of research in this area (see below), there is reasonably strong evidence for subliminal perception. What are the limitations of research on subliminal perception? First, measures of conscious awareness vary in sensitivity. As a consequence, it is relatively easy for researchers to apparently demonstrate the existence of subliminal perception by using an insensitive measure (Rothkirch & Hesselmann, 2017). Second, many researchers focus on observers whose verbal reports show a lack of awareness. That would be appropriate if such reports were totally reliable. However, such reports are somewhat unreliable, meaning that some of these observers would report awareness if they provided a second verbal report (Shanks, 2017). In addition, limitations of attention and memory may sometimes cause observers to omit some of their conscious experience from verbal reports (Lamme, 2010). Third, many claimed demonstrations of subliminal perception are flawed because of the typical failure to consider and/or control response bias (Peters et al., 2016). In essence, observers with response bias may claim to have no conscious awareness of visual stimuli when they actually have partial awareness (Koivisto & Grassini, 2016). Fourth, Breitmeyer (2015) identified 24 different methods used to make visual stimuli inaccessible to visual awareness. Neuroimaging and other techniques have been used to estimate the amount of unconscious processing associated with each method. Some methods (e.g., object-substitution masking: a visual stimulus is replaced by dots surrounding it) are associated with much more unconscious processing than others (e.g., binocular rivalry, see Glossary). Of key relevance here, the likelihood of obtaining evidence for subliminal perception depends substantially on the method used to suppress visual awareness.

CHAPTER SUMMARY


• Vision and the brain. In the retina, there are cones (specialised for colour vision) and rods (specialised for motion detection). The retina-geniculate-striate pathway between the eye and cortex is divided into partially separate P and M pathways. The dorsal stream (associated with the M pathway) terminates in the parietal cortex and the ventral stream (associated with the P pathway) terminates in the inferotemporal cortex. There are numerous interactions between the two pathways and the two streams.

According to Zeki’s functional specialisation theory, different cortical areas are specialised for different visual functions (e.g., form; colour; motion). This is supported by findings from patients with selective visual deficits (e.g., achromatopsia; akinetopsia). However, much visual processing depends on large brain networks rather than specific areas and Zeki de-emphasised the importance of top-down (recurrent) processing. It remains unclear how we integrate the outputs of different visual processes (the binding problem). However, selective attention, synchronised neural activity and combining bottom-up (feedforward) processing and top-down (recurrent) processing all play a role. •

• Two visual systems: perception-action model. Milner and Goodale identified a vision-for-perception system based on the ventral stream and a vision-for-action system based on the dorsal stream. There is limited (and inconsistent) support for the predicted double dissociation between patients with optic ataxia (damage to the dorsal stream) and visual form agnosia (damage to the ventral stream). Illusory effects found when perceptual judgements are made (ventral stream) are often much reduced when grasping or pointing responses are used (dorsal stream). However, such findings are often hard to interpret, and visually guided action often relies more on the ventral stream than acknowledged theoretically. More generally, the two visual systems interact with each other much more than previously assumed and there are probably more than two visual pathways.



• Colour vision. Colour vision helps us detect objects and make fine discriminations among them. According to dual-process theory, there are three types of cone receptors and three types of opponent processes (green-red; blue-yellow; white-black). This theory explains negative afterimages and colour deficiencies but is oversimplified. Colour constancy occurs when a surface’s perceived colour remains the same when the illuminant changes. Colour constancy is influenced by our ability to assess the illuminant accurately; local colour contrast; familiarity of object colour; chromatic adaptation; and cone-excitation ratios. Most theories are more applicable to colour vision with simple artificial stimuli than complex objects in the natural world.



• Depth perception. There are numerous monocular cues to depth (e.g., linear perspective; texture; familiar size) plus oculomotor and binocular cues. Cues are sometimes combined additively in depth perception. However, more weight is generally given to reliable cues than unreliable ones, with weightings changing if a cue’s reliability alters. However, one cue often dominates all others when different cues conflict strongly. It is often assumed that observers generally combine cues near-optimally, but it is hard to

define “optimality”. The assumption that observers process several independent cues prior to integrating all the information is probably wrong in natural environments providing rich information about overall environmental structure. Size perception is sometimes strongly influenced by perceived distance as predicted by the size-distance invariance hypothesis. However, the impact of familiar size on depth perception cannot be explained by that hypothesis. More generally, perceived size and perceived distance often depend on different factors.

• Perception without awareness: subliminal perception. Patients with extensive damage to V1 sometimes suffer from blindsight. This is a condition involving some ability to respond to visual stimuli in the absence of normal conscious visual awareness (especially motion detection). There is no conscious awareness in type-1 blindsight but some residual awareness in type-2 blindsight. Blindsight patients are sometimes excessively cautious when reporting their conscious experience. The visual abilities of some blindsight patients probably depend on reorganisation of brain connections following brain damage. There is much behavioural and neuroimaging evidence for subliminal perception in visually intact individuals. However, there are problems of interpretation caused by insensitive (and unreliable) measures of self-reported awareness. Some observers may show apparent subliminal perception because they have a response bias leading them to claim no conscious awareness of visual stimuli of which they actually have limited awareness.

FURTHER READING

Brenner, E. & Smeets, J.B.J. (2018). Depth perception. In J.T. Serences (ed.), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 2: Sensation, Perception, and Attention (4th edn; pp. 385–414). New York: Wiley. The authors provide a comprehensive account of theory and research on depth perception.

de Haan, E.H.F., Jackson, S.R. & Schenk, T. (2018). Where are we now with “what” and “how”? Cortex, 98, 1–7. Edward de Haan and his colleagues provide an evaluation of the perception-action model.

Goldstein, E.B. & Brockmole, J. (2017). Sensation and Perception (10th edn). Boston: Cengage. There is coverage of key areas within visual perception in this introductory textbook.

Naccache, L. (2016). Chapter 18: Visual consciousness: A “re-updated” neurological tour. In The Neurology of Consciousness (2nd edn; pp. 281–295). Lionel Naccache provides a theoretical framework within which to understand blindsight and other phenomena associated with visual consciousness.

Shanks, D.R. (2017). Regressive research: The pitfalls of post hoc data selection in the study of unconscious mental processes. Psychonomic Bulletin & Review, 24, 752–775. David Shanks discusses some issues relating to research claiming to provide evidence for subliminal perception.

Tong, F. (2018). Foundations of vision. In J.T. Serences (ed.), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 2: Sensation, Perception, and Attention (4th edn; pp. 1–62). New York: Wiley. Frank Tong provides a comprehensive account of the visual system and its workings.

Witzel, C. & Gegenfurtner, K.R. (2018). Colour perception: Objects, constancy, and categories. Annual Review of Vision Science, 4, 475–499. Christoph Witzel and Karl Gegenfurtner discuss our current knowledge of colour perception.

Chapter 3

Object and face recognition

INTRODUCTION

Tens of thousands of times every day we identify or recognise objects in the world around us. At this precise moment, you are looking at this book. If you raise your eyes, perhaps you can see a wall and windows. Object recognition typically happens so effortlessly it is hard to believe it is actually a complex achievement. Evidence of its complexity comes from numerous unsuccessful attempts to program computers to “perceive” the environment. However, computer programs that are reasonably effective at recognising complicated two-dimensional patterns have been developed. Why is visual perception so complex? First, objects often overlap and so we must decide where one object ends and the next one starts. Second, numerous objects (e.g., chairs; trees) vary enormously in their visual properties (e.g., colour; size; shape) and so it is hard to assign such diverse stimuli to the same category. Third, we recognise objects almost regardless of orientation (e.g., we can easily identify a plate that appears elliptical). We can go beyond simply identifying objects. For example, we can generally describe what an object would look like from different angles, and we also know its uses and functions. All in all, there is much more to object recognition than might be supposed (than meets the eye?). What is discussed in this chapter? The overarching theme is to unravel the mysteries associated with recognising three-dimensional objects. However, we initially discuss how two-dimensional patterns are recognised. Then the focus shifts to how we decide which parts of the visual world belong together and thus form separate objects. This is a crucial early stage in object recognition. After that, general theories of object recognition are evaluated against the available neuroimaging and behavioural evidence. Face recognition (vitally important in our everyday lives) differs in important ways from object recognition. Accordingly, we discuss face recognition in a separate section. Finally, we consider whether the processes involved in visual imagery resemble those involved in visual perception.

Other issues relating to object recognition (e.g., depth perception; size constancy) were discussed in Chapter 2.

PATTERN RECOGNITION

KEY TERM Pattern recognition The ability to identify or categorise two-dimensional patterns (e.g., letters; fingerprints).

We spend much of our time (e.g., when reading) engaged in pattern recognition – the identification or categorisation of two-dimensional patterns. Much research has considered how alphanumeric patterns (alphabetical and numerical symbols) are recognised. A key issue is the flexibility of the human perceptual system (e.g., we can recognise the letter “A” rapidly and across wide variations in orientation, typeface, size and writing style). Patterns can be regarded as consisting of a set of specific features or attributes (Jain & Duin, 2004). For example, the key features of the letter “A” are two straight lines and a connecting cross-bar. An advantage of this feature-based approach is that visual stimuli varying greatly in size, orientation and minor details can be identified as instances of the same pattern. Many feature theories assume pattern recognition involves processing specific features followed by more global or general processing to integrate feature information. However, Navon (1977) argued global processing often precedes more specific processing. He presented observers with stimuli such as the one shown in Figure 3.1. On some trials, they decided whether the large letter was an “H” or an “S”; on others, they decided whether the small letters were Hs or Ss. Navon (1977) found performance speed with the small letters was greatly slowed when the large letter differed from the small letters. However, decision speed with the large letters was uninfluenced by the nature of the small letters. Navon concluded we often see the forest (global structure) before the trees (features).

Figure 3.1 The kind of stimulus used by Navon (1977) to demonstrate the importance of global features in perception.

There are limitations with Navon’s (1977) research and conclusions. First, Dalrymple et al. (2009) found performance was faster at the level of the small letters than the large letter when the small letters were relatively large and spread out. Thus, attentional processes influence performance. Second, Navon failed to distinguish adequately between encoding (neuronal responses triggered by visual stimuli) and decoding (conscious perception of those stimuli) (Ding et al., 2017). Encoding typically progresses from lower-level representations of simple features to higher-level representations of more complex features (Felleman & Van Essen, 1991). In contrast, Ding et al. (2017, p. E9115) found, “The brain prioritises decoding of higher-level features because they are . . . more invariant and categorical, and thus easier to . . . maintain in noisy working memory.” Thus, Navon’s (1977) conclusions may be more applicable to visual decoding (conscious perception) than to the preceding internal neuronal responses.
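Navon-type stimuli are easy to generate programmatically, which makes the global/local distinction concrete. The sketch below uses minimal hand-made 5×5 letter templates (an illustration only, not Navon’s actual stimuli).

```python
def navon(global_letter: str, local_letter: str) -> str:
    """Build a Navon figure: a large letter composed of small letters.
    Hand-made 5x5 templates for the two letters Navon used (H and S)."""
    templates = {
        "H": ["X...X", "X...X", "XXXXX", "X...X", "X...X"],
        "S": ["XXXXX", "X....", "XXXXX", "....X", "XXXXX"],
    }
    rows = templates[global_letter.upper()]
    return "\n".join(
        "".join(local_letter if cell == "X" else " " for cell in row)
        for row in rows
    )

print(navon("H", "S"))   # a global H made of local Ss (cf. Figure 3.1)
```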

Feature detectors

If presentation of a visual stimulus leads to detailed processing of its basic features, we should be able to identify cortical cells involved in such processing. Hubel and Wiesel (1962) studied cells in parts of the occipital cortex involved in visual processing. Some cells responded in two different ways to a spot of light depending on which part of the cell was affected:

(1) an “on” response with an increased rate of firing when the light was on;
(2) an “off” response with the light causing a decreased rate of firing.

Hubel and Wiesel (e.g., 1979) discovered two types of neuron in the primary visual cortex: simple cells and complex cells. Simple cells have “on” and “off” rectangular regions. These cells respond most to dark bars in a light field, light bars in a dark field, or straight edges between areas of light and dark. Any given cell responds strongly only to stimuli of a particular orientation and so its responses could be relevant to feature detection. Complex cells resemble simple cells in responding maximally to straight-line stimuli in a particular orientation. However, complex cells have large receptive fields and respond more to moving contours. Each complex cell is driven by several simple cells having the same orientation preference and closely overlapping receptive fields (Alonso & Martinez, 1998). There are also end-stopped cells, whose responsiveness depends on stimulus length and orientation. In sum, Hubel and Wiesel envisaged “a hierarchically organised visual system in which more complex visual features are built (bottom-up) from more simple ones” (Ward, 2015, p. 111).

Hubel and Wiesel’s account is limited in several ways:

(1) The cells they identified provide ambiguous information because they respond comparably to different stimuli (e.g., a horizontal line moving rapidly and a nearly horizontal line moving slowly). Observers must combine information from numerous neurons to remove ambiguities.
(2) Neurons differ in their responsiveness to different spatial frequencies, and several phenomena in visual perception depend on this differential responsiveness (discussed on pp. 104–105).
(3) As Schulz et al. (2015, p. 1022) pointed out, “The responses of cortical neurons [in the primary visual cortex] to repeated presentations of a stimulus are highly variable.” This variability complicates pattern recognition.
(4) Pattern recognition and object recognition depend on top-down processes triggered by expectations and context (e.g., Goolkasian & Woodberry, 2010; discussed on pp. 111–116) as well as on the bottom-up processes emphasised by Hubel and Wiesel.
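The orientation selectivity of simple cells is conventionally idealised as a Gabor filter: an oriented sinusoid (alternating “on” and “off” subregions) under a Gaussian envelope. The sketch below, with assumed parameter values, shows the resulting orientation tuning; it is an illustration, not a model from the chapter.

```python
import numpy as np

def gabor(size=21, theta=0.0, wavelength=6.0, sigma=3.0):
    """Idealised simple-cell receptive field (assumed parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_rot / wavelength)

# A vertical bar drives the vertically tuned filter strongly and the
# horizontally tuned filter hardly at all -- orientation selectivity.
image = np.zeros((21, 21))
image[:, 9:12] = 1.0                                # vertical bar
for theta in (0.0, np.pi / 2):
    response = np.sum(gabor(theta=theta) * image)
    print(f"theta = {theta:.2f} rad: response = {response:.2f}")
```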

PERCEPTUAL ORGANISATION

Our visual environment is typically complex and confusing, with many objects overlapping others, thus making it hard to achieve perceptual segregation of visual objects. How this is done was first studied systematically

IN THE REAL WORLD: HOW CAN WE DISCOURAGE SPAMMERS?

Virtually everyone has received a substantial amount of spam (unwanted emails). Spammers use bots (robots running automated tasks over the internet) to send emails to thousands of individuals for various money-making purposes (e.g., fake sweepstake entries). A CAPTCHA (Completely Automated Turing test to tell Computers and Humans Apart) is commonly used to discourage spammers. The intention is to ensure a website user is human by providing a test humans can solve but automated computer-based systems cannot. The CAPTCHA in Figure 3.2 is typical in consisting of distorted characters connected together horizontally. In principle, the study of CAPTCHAs can shed light on the strengths of human pattern recognition.

KEY TERM CAPTCHA A Completely Automated Turing Test to tell Computers and Humans Apart: a test involving distorted characters connected together, often used to establish that the user of an internet website is human rather than an automated system.

Figure 3.2 The CAPTCHA used by Yahoo. From Gao et al. (2012).

Computer programs to solve CAPTCHAs generally involve a segmentation phase to locate the characters followed by a recognition phase where each character is identified. Many computer programs can recognise individual characters even when very distorted but their performance is much worse at segmenting connected characters. Overall, the performance of most computer programs at solving CAPTCHAs was poor until fairly recently. Nachar et al. (2015) devised a computer program focusing on edge corners (an edge corner is the intersection of two straight edges). Such corners are relatively unaffected by the distortions and overlaps of characters found in CAPTCHAs. Nachar et al.’s approach proved successful, allowing them to solve 57% of CAPTCHAs resembling the one shown in Figure 3.2. There are two take-home messages. First, the difficulties encountered in devising computer programs to solve CAPTCHAs indicate humans have excellent pattern-recognition abilities. Second, edge corners provide an especially valuable source of information in pattern recognition. Of relevance, successful camouflage in many species depends heavily on markings that break up an animal’s edges, making it less visible (Webster, 2015).
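The segmentation difficulty described in the box can be demonstrated in a few lines with connected-component labelling: separate characters are trivially isolated, but a single connecting stroke merges them into one component. The toy binary image below is hypothetical and scipy is assumed.

```python
import numpy as np
from scipy.ndimage import label

# Toy binary "CAPTCHA" image: 1 = ink, 0 = background.
img = np.zeros((5, 12), dtype=int)
img[1:4, 1:3] = 1             # first character
img[1:4, 6:8] = 1             # second character (not touching the first)
_, n = label(img)
print(n)                      # -> 2: segmentation by connectivity succeeds

img[2, 3:6] = 1               # connect the characters with a stroke
_, n = label(img)
print(n)                      # -> 1: the characters can no longer be
                              # separated by connectivity alone, which is
                              # what makes connected CAPTCHAs hard for bots
```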

IN THE REAL WORLD: FINGERPRINTING

An important form of real-world pattern recognition involves experts matching a criminal’s fingerprints (latent print) against stored fingerprint records. Automatic fingerprint identification systems (AFISs) scan huge databases. This typically produces a small number of possible matches to the fingerprint obtained from the crime scene, ranked by similarity to the criminal’s fingerprint. Experts then decide which database fingerprint (if any) matches the criminal’s. We might imagine experts are much better at fingerprint matching than novices because their analytic (slow, deliberate) processing is superior. However, Thompson and Tangen (2014) found experts greatly outperformed novices when pairs of fingerprints were presented for only 2 seconds, forcing them to rely heavily on non-analytic (fast and relatively “automatic”) processing. However, when fingerprint pairs were presented for 60 seconds, experts showed a greater performance

improvement than novices (19% vs 7%, respectively). Thus, experts have superior analytic and non-analytic processing.

According to signal-detection theory, experts may surpass novices in their ability to discriminate between matching and non-matching prints. Alternatively, they may simply have a more lenient response bias than novices. If so, they would tend to respond “match” to every pair of prints. Good discrimination is associated with many “hits” (responding “match” on match trials) plus a low false-alarm rate (not responding “match” on non-match trials). In contrast, a lenient response criterion is associated with many false alarms. Thompson et al. (2014) found novices made false alarms on 57% of trials on which two prints were similar but did not match, whereas experts did so on only 1.65% of trials. Thus, experts have a much more conservative response criterion as well as much better discrimination between matching and non-matching prints.

It is often assumed expert fingerprint identification is very accurate. However, experts listing the minutiae (features) on fingerprints on two occasions showed total agreement between their assessments only 16% of the time (Dror et al., 2012). Nevertheless, experts are much less likely than non-experts to decide incorrectly that two fingerprints from the same person are from different individuals (Champod, 2015).

Fingerprint identification is often complex. As an example, try to decide whether the fingerprints in Figure 3.3 come from the same person. Four fingerprinting experts said the fingerprint on the right was from the same person as the one on the left (Ouhane Daoud, the bomber involved in the terrorist attack in Madrid on 11 March 2004). In fact, the one on the right came from Brandon Mayfield, an American lawyer who was falsely arrested.

Figure 3.3 The FBI’s mistaken identification of the Madrid bomber. The fingerprint from the crime scene is on the left. The fingerprint of the innocent suspect (positively identified by fingerprint experts) is on the right. From Dror et al. (2006). Reprinted with permission from Elsevier.

Experts’ mistakes are often due to the incompleteness of the fingerprints found at crime scenes. However, top-down processes also contribute. Experts’ errors often involve forensic confirmation bias: “an individual’s pre-existing beliefs, expectations, motives, and situational context influence the collection, perception, and interpretation of evidence” (Kassin et al., 2013, p. 45). Dror et al. (2006) found evidence of forensic confirmation bias. Experts were asked to judge whether two fingerprints matched having been told, incorrectly, that they were the ones mistakenly matched by the FBI as the Madrid bomber. In fact, these experts had judged these fingerprints to be a clear and definite match several years earlier. The misleading information provided led 60% of them to judge the prints to be definite non-matches! Thus, top-down processes triggered by contextual information can distort fingerprint identification.

Langenburg et al. (2009) studied the effects of context (e.g., alleged conclusions of internationally respected experts) on fingerprint identification. Experts and non-experts were both influenced by contextual information (and so showed confirmation bias). However, non-experts were influenced more.

The above studies on confirmation bias manipulated context very directly and explicitly. Searston et al. (2016) found a more subtle context effect based on familiarity. Novice participants were presented initially with a series of cases and fingerprint pairs and given feedback as to whether the fingerprints matched or not. Then they were presented with various cases very similar to those seen previously

and decided whether the fingerprint pairs matched. The participants exhibited response bias during the second part of the experiment: their decisions (i.e., match or no-match) tended to correspond to the correct decisions associated with similar (but not identical) cases encountered earlier. In sum, experts typically outperform novices at fingerprint matching because they have superior discrimination ability and a more conservative response criterion. However, even experts are influenced by irrelevant or misleading contextual information and often show evidence of confirmation bias. Worryingly, among forensic experts (including fingerprinting experts), only 52% regarded bias as a matter for concern and even fewer (26%) believed their own judgements were influenced by bias.
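The signal-detection distinction between discrimination and response bias is easy to quantify with the standard d′ (sensitivity) and c (criterion) formulas. In this sketch the false-alarm rates echo the Thompson et al. (2014) figures quoted above, but the hit rates are invented, so the outputs are purely illustrative.

```python
from scipy.stats import norm

def sdt_indices(hit_rate: float, fa_rate: float):
    """d' = discrimination; c = criterion (positive = conservative,
    i.e., reluctant to respond "match")."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

# False-alarm rates from the text (57% vs 1.65%); hit rates invented.
for group, hits, fas in [("novices", 0.90, 0.57), ("experts", 0.90, 0.0165)]:
    d_prime, criterion = sdt_indices(hits, fas)
    print(f"{group}: d' = {d_prime:.2f}, c = {criterion:.2f}")
# Experts come out with both higher d' (better discrimination) and a
# more positive c (more conservative criterion), matching the text.
```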

by the gestaltists, German psychologists (including Koffka, Köhler and Wertheimer) who emigrated to the United States between the two World Wars. Their fundamental principle was the law of Prägnanz – we typically perceive the simplest possible organisation of the visual field. Most of the gestaltists’ other laws can be subsumed under the law of Prägnanz. Figure 3.4(a) illustrates the law of proximity (visual elements close in space tend to be grouped together). Figure 3.4(b) shows the law of similarity (similar elements tend to be grouped together). We see two crossing lines in Figure 3.4(c) because, according to the law of good continuation, we group together those elements requiring the fewest changes or interruptions in straight or smoothly curving lines. Finally, Figure 3.4(d) illustrates the law of closure: the missing parts of a figure are filled in to complete the figure (here a circle). We might dismiss these principles as “mere textbook curiosities” (Wagemans et al., 2012a, p. 1180). However, the various grouping principles “pervade virtually all perceptual experiences because they determine the objects and parts that people perceive in their environment” (Wagemans et al., 2012a, p. 1180). The gestaltists emphasised figure-ground segmentation in perception. The figure is perceived as having a distinct form or shape whereas the ground lacks form. In addition, the figure is perceived as being in

KEY TERMS Law of Prägnanz The notion that the simplest possible organisation of the visual environment is perceived; proposed by the gestaltists. Figure-ground segmentation The perceptual organisation of the visual field into a figure (object of central interest) and a ground (less important background).

Figure 3.4 Examples of the Gestalt laws of perceptual organisation: (a) the law of proximity; (b) the law of similarity; (c) the law of good continuation; and (d) the law of closure.

Figure 3.5 An ambiguous drawing that can be seen as either two faces or as a goblet.

front of the ground and the contour separating the figure from ground “belongs” to the figure. Check these claims with the faces-goblet illusion (see Figure 3.5). When the goblet is perceived as the figure, it seems to be in front of a dark background. Faces are in front of a light background when forming the figure. What determines which region is identified as the figure and which as the ground? Regions that are convex (curving outwards), small, surrounded and symmetrical are most likely to be perceived as figures (Wagemans et al., 2012a). For example, Fowlkes et al. (2007) found with images of natural scenes that regions identified by observers as figures were generally smaller and more convex than ground regions. Finally, the gestaltists argued perceptual grouping and organisation are innate or intrinsic to the brain. As a result, they de-emphasised the importance of past experience.

Findings

The gestaltists’ approach was limited because they mostly used artificial figures, making it important to see whether their findings apply to more realistic stimuli. Geisler et al. (2001) used pictures to study the contours of flowers, rivers, trees and so on. They discovered object contours could be calculated accurately using two principles that were different from those emphasised by the gestaltists:

(1) adjacent segments of any contour typically have very similar orientations;
(2) segments of any contour that are further apart generally have somewhat different orientations.

Geisler et al. (2001) asked observers to decide which of two complex patterns presented together contained a winding contour. Task performance was well predicted by the two key principles described above. Elder and Goldberg (2002) analysed the statistics of natural contours and obtained findings largely consistent with Gestalt laws. Proximity was a very powerful cue when deciding which contours belonged to which objects. There was also a small contribution from similarity and good continuation. Numerous cues influence figure-ground segmentation and the perception of object boundaries with natural scenes. Mély et al. (2016) found colour and luminance (see Glossary) strongly influenced the perception of object boundaries. There was more accurate perception of object boundaries when several cues were combined than that found for any single cue in isolation.
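Grouping by proximity can be approximated computationally by clustering elements on inter-element distance. The dot positions below are hypothetical and scipy is assumed; this illustrates the principle rather than modelling human grouping.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical element positions forming two spatial clusters.
points = np.array([[0, 0], [0, 1], [1, 0],     # cluster A
                   [8, 8], [8, 9], [9, 8]])    # cluster B

# Single-linkage clustering cut at a distance threshold: elements closer
# than the threshold end up in the same perceptual "group".
groups = fcluster(linkage(points, method="single"), t=3.0,
                  criterion="distance")
print(groups)    # e.g. [1 1 1 2 2 2]: nearby elements grouped together
```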

In sum, there is some support for Gestalt laws in natural scene perception. However, figure-ground segmentation is more complex in natural scenes than in most artificial figures, and so the Gestalt approach is oversimplified. The gestaltists failed to discover several principles of perceptual organisation. For example, Palmer and Rock (1994) proposed the principle of uniform connectedness. According to this principle, any connected region having uniform visual properties (e.g., colour; texture; lightness) tends to be organised as a single perceptual unit. Palmer and Rock found grouping by uniform connectedness dominated proximity and similarity when there was a conflict. Pinna et al. (2016) argued that the gestaltists de-emphasised the role of dissimilarity in perceptual organisation. Consider Figure 3.6. The perception of the empty circles as a rotated square or a diamond is strongly influenced by the location of the dissimilar element (i.e., the black circle). This illustrates the principle of accentuation: “Elements group in the same oriented direction of the dissimilar element placed . . . outside a whole set of continuous/homogeneous components” (Pinna et al., 2016, p. 21). Much processing involved in perceptual organisation occurs very rapidly. Williford and von der Heydt (2016) discovered signals from neurons in V2 (see Chapter 2) relating to figure-ground organisation emerged within 70 ms of stimulus presentation for complex natural scenes as well as for simple figures. This extremely rapid processing is consistent with the gestaltists’ assumption that perceptual organisation is due to innate factors but may also reflect massive experience in object recognition. The role of learning was discussed by Bhatt and Quinn (2011). Infants as young as 3 or 4 months show grouping by continuation, proximity and connectedness, which is apparently consistent with the Gestalt position. However, other grouping principles (e.g., closure) were used only later in infancy, and infants typically made increased use of grouping principles over time. Thus, learning is important.


KEY TERM Uniform connectedness The notion that adjacent regions in the visual environment having uniform visual properties (e.g., colour) are perceived as a single perceptual unit.


Figure 3.6 The dissimilar element (black circle) accentuates the tendency to perceive the array of empty circles as (A) a rotated square or (B) a diamond. From Pinna et al., 2016.


According to the gestaltists, perceptual grouping occurs rapidly and should be uninfluenced by attentional processes. The evidence is mixed. Rashal et al. (2017) conducted several experiments. Attention was not required with grouping by proximity or similarity in colour. However, attention was required with grouping by similarity in shape. In general, attention was more likely to be required when the processes involved in perceptual grouping were relatively complex. Overall, the processes involved in perceptual grouping are much more complicated and variable than the gestaltists had assumed.

The gestaltists also assumed figure-ground segmentation is innate and so not reliant on past experience or learning. Barense et al. (2012) reported contrary evidence. Amnesic patients (having severe memory problems) and healthy controls were presented with various stimuli, some containing parts of well-known objects (see Figure 3.7). In other stimuli, the object parts were rearranged. The task was to decide which region of each stimulus was the figure. The healthy controls identified the regions containing familiar objects as figures more often than those containing rearranged parts. In contrast, the amnesic patients showed no difference between the two types of stimuli because they experienced difficulty in identifying the objects presented. Thus, figure-ground segmentation can depend on past experience and memory (i.e., object familiarity).

Figure 3.7 The top row shows intact familiar shapes (from left to right: a guitar, a standing woman, a table lamp). The bottom row shows the same objects but with the parts rearranged. The task was to decide which region in each stimulus was the figure. From Barense et al. (2012). Reprinted with permission of Oxford University Press.

Several recent theories explain perceptual grouping and figure-ground segmentation. For example, consider Froyen et al.’s (2015) Bayesian hierarchical grouping model, according to which observers initially form “beliefs” concerning the objects to be expected in the current context. In addition, their visual system assumes the visual image consists of a mixture of objects. The information available in the image is then used to change the subjective probabilities of different grouping hypotheses to make optimal use of that information. Of key importance, observers use their learned knowledge of patterns and objects (e.g., visual elements close together generally belong to the same object).

The above approach exemplifies theories based on Bayesian inference (see Glossary). Their central assumption is that the initial subjective probabilities associated with various hypotheses as to the organisation of objects within a visual image change on the basis of the information it provides. This approach is much more realistic than the gestaltists’ relatively cut-and-dried approach.
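The logic of such Bayesian grouping accounts can be conveyed with a toy calculation. In the sketch below (our own illustration with invented numbers, not anything taken from Froyen et al., 2015), two grouping hypotheses for a pair of image elements start with equal prior probabilities and are updated by a single observed cue: a small spatial gap between the elements.

    # Toy Bayesian update over two grouping hypotheses (illustrative numbers only).
    priors = {"same object": 0.5, "different objects": 0.5}

    # Learned knowledge: elements close together usually belong to the same object,
    # so a small observed gap is far more likely under "same object".
    likelihood_of_small_gap = {"same object": 0.8, "different objects": 0.2}

    unnormalised = {h: priors[h] * likelihood_of_small_gap[h] for h in priors}
    total = sum(unnormalised.values())
    posteriors = {h: p / total for h, p in unnormalised.items()}

    print(posteriors)  # {'same object': 0.8, 'different objects': 0.2}

Full Bayesian models iterate this kind of update over many cues and hierarchically organised hypotheses, but the core computation is the same re-weighting of prior beliefs by the evidence.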


Evaluation

What are the strengths of the Gestalt approach? First, the gestaltists focused on key issues (e.g., figure-ground segmentation). Second, nearly all their grouping laws (and the notion of figure-ground segmentation) have stood the test of time and are applicable to natural scenes as well as artificial figures. Third, the notion that observers perceive the simplest possible organisation of the visual environment has proved very fruitful. Many recent theories are based on the assumption that striving for simplicity is central to visual perception (Jäkel et al., 2016).

What are the approach’s limitations? First, the gestaltists de-emphasised the importance of past experience and learning. As Wagemans et al. (2012b, p. 1229) pointed out, the gestaltists “focused almost exclusively on processes intrinsic to the perceiving organism . . . The environment itself did not interest [them]”. Second, the gestaltists produced descriptions of important perceptual phenomena but not adequate explanations. Recently, however, such explanations have been provided. Third, nearly all the evidence the gestaltists provided was based on two-dimensional drawings. The greater complexity of real-world scenes (e.g., important parts of objects hidden or occluded) means additional explanatory assumptions are required. Fourth, the gestaltists did not discover all the principles of perceptual organisation. Among such undiscovered principles are uniform connectedness, the principle of accentuation and generalised common fate (e.g., when elements of a visual scene become brighter or darker together, they are grouped together). More generally, the gestaltists did not appreciate the sheer complexity of the processes involved in perceptual grouping. Fifth, the gestaltists focused mostly on drawings involving only one Gestalt law. With natural scenes, several laws often operate simultaneously and interact in complex ways not predicted by the gestaltists (Jäkel et al., 2016). Sixth, the gestaltists’ approach was too inflexible. They did not realise perceptual grouping and figure-ground segregation depend on complex interactions between basic (and possibly innate) processes and past experience (Rashal et al., 2017).

APPROACHES TO OBJECT RECOGNITION

Object recognition (identifying objects in the visual field) is enormously important if we are to interact effectively with the environment. We start with basic aspects of the human visual system followed by major theories of object recognition.

Perception-action model

Milner and Goodale’s (1995, 2008) perception-action model (discussed in Chapter 2) is relevant to understanding object perception. It is based on a distinction between ventral (or “what”) and dorsal (or “how”) streams (see Figure 2.9), with the latter providing visual guidance for action (e.g., grasping). They argued object recognition and perception depend primarily on the ventral stream.

This stream is hierarchically organised. Visual processing basically proceeds from the retina through several areas including the lateral geniculate nucleus, V1, V2 and V4, culminating in the inferotemporal cortex. The importance of the ventral stream is indicated by research showing object recognition can be reasonably intact after damage to the dorsal stream (Goodale & Milner, 2018). However, object recognition involves numerous interactions between the ventral and dorsal streams (Freud et al., 2017b).

Spatial frequency

Visual perception develops over time even though it seems instantaneous (Hegdé, 2008). The visual processing involved in object recognition typically proceeds in a coarse-to-fine way, with initial coarse or general processing followed by fine or detailed processing. As a result, we can perceive visual scenes at a very general level and/or at a fine-grained level.

How does coarse-to-fine processing occur? Numerous cells in the primary visual cortex respond to high spatial frequencies and capture fine detail in the visual image. Numerous others respond to low spatial frequencies and capture coarse information in the visual image. Low spatial frequency information (often relating to motion and/or spatial location) is transmitted rapidly to higher-order brain areas via the fast magnocellular system using the dorsal visual stream (discussed in Chapter 2). Awasthi et al. (2016) used red light to produce magnocellular suppression. As predicted, this interfered with the low spatial frequency components of face perception. In contrast, high spatial frequency information (often relating to colour, shape and other aspects of object recognition) is transmitted relatively slowly via the parvocellular system using the ventral visual stream (see Chapter 2). This speed difference explains why coarse processing typically precedes fine processing, although conscious perception is typically based on integrated low and high spatial frequency information.

We can observe the effects of varying spatial frequency by comparing images consisting only of low or high spatial frequencies (see Figure 3.8). You will probably agree it is considerably easier to achieve object recognition with the high spatial frequency image.
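The low/high split shown in Figure 3.8 can be approximated computationally: blurring an image with a Gaussian keeps only its low spatial frequencies, and subtracting the blurred version from the original leaves the high frequencies. The code below is a minimal sketch of this decomposition (assuming NumPy and SciPy are available; the sigma value is an arbitrary illustrative choice).

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def split_spatial_frequencies(image, sigma=5.0):
        """Split a greyscale image into low- and high-spatial-frequency components.

        Gaussian blurring acts as a low-pass filter (coarse structure);
        the residual contains the high frequencies (fine detail).
        """
        low = gaussian_filter(image.astype(float), sigma=sigma)
        high = image - low
        return low, high

    # Example with a random "image"; in practice load a greyscale photograph instead.
    image = np.random.rand(128, 128)
    low, high = split_spatial_frequencies(image)
    assert np.allclose(low + high, image)  # the two components sum to the original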

Findings

Figure 3.8 High and low spatial frequency versions of a place (a building). From Awasthi et al. (2016).


Musel et al. (2012) presented participants with very brief (150 ms) scenes proceeding from coarse (low spatial frequency) to fine (high spatial frequency) or vice versa (sample videos can be viewed at DOI.10.1371/journal.pone.003893). Performance (deciding whether each scene was an outdoor or indoor one) was faster with the coarse-to-fine sequence, a finding subsequently replicated by Kauffmann et al. (2015). These findings suggest the visual processing of natural scenes is predominantly coarse to fine. This sequence may be more effective because low spatial frequency information is used to generate plausible interpretations of the visual input.

There is considerable evidence that global (general) processing often precedes local (specific) processing (discussed on p. 95). Much research has established an association between processing low spatial frequencies and global perception and between processing high spatial frequencies and local perception. However, use of low and high spatial frequency information in visual processing is often very flexible and is influenced by task demands. Flevaris and Robertson (2016, p. 192) reviewed research showing, “Attention to global and local aspects of a display biases the flexible selection of relatively lower and relatively higher SFs [spatial frequencies] during image processing.”

Finally, we can explain the notoriously elusive smile of Leonardo da Vinci’s Mona Lisa with reference to spatial frequencies. Livingstone (2000) produced images of that painting with different spatial frequencies. Mona Lisa’s smile is much more obvious in the two low spatial frequency images (see Figure 3.9). Livingstone pointed out that our central or foveal vision is dominated by higher spatial frequencies compared with our peripheral vision. As a result, “You can’t catch her smile by looking at her mouth. She smiles until you look at her mouth” (p. 1299).

Figure 3.9 Image of Mona Lisa revealing very low spatial frequencies (left), low spatial frequencies (centre) and high spatial frequencies (right). From Livingstone (2000). By kind permission of Margaret Livingstone.

Historical background: Marr’s computational approach

David Marr (1982) proposed a very influential theory. He argued object recognition involves various processing stages and is much more complex than had previously been thought. More specifically, Marr claimed observers construct various representations (descriptions) providing increasingly detailed information about the visual environment:

● Primal sketch: this provides a two-dimensional description of the main light intensity changes in the visual input, including information about edges and contours.


● 2½-D sketch: this incorporates a description of the depth and orientation of visible surfaces using information from shading, texture, motion and binocular disparity. It resembles the primal sketch in being viewer-centred or viewpoint-dependent (i.e., it is influenced by the angle from which the observer sees objects or the environment).
● 3-D model representation: this describes objects’ shapes and their relative positions three-dimensionally; it is independent of the observer’s viewpoint and so is viewpoint-invariant.
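The primal sketch can be made concrete with a toy computation: mark the places in an image where light intensity changes sharply. The code below is our own much-simplified illustration (finite-difference gradients plus an arbitrary threshold), not Marr’s actual algorithm.

    import numpy as np

    def toy_primal_sketch(image, threshold=0.2):
        """Crude 'primal sketch': mark pixels where light intensity changes sharply."""
        gy, gx = np.gradient(image.astype(float))  # intensity change along y and x
        magnitude = np.hypot(gx, gy)               # strength of the local change
        return magnitude > threshold               # boolean edge/contour map

    # A simple image: dark left half, bright right half.
    image = np.zeros((8, 8))
    image[:, 4:] = 1.0
    edges = toy_primal_sketch(image)
    print(edges.astype(int))  # the marked pixels line up along the vertical boundary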

Why has Marr’s theoretical approach been so influential? First, he successfully combined ideas from neurophysiology, anatomy and computer vision (Mather, 2015). Second, he was among the first to realise the enormous complexity of object recognition. Third, his distinction between viewpoint-dependent and viewpoint-invariant representations triggered much subsequent research (discussed on pp. 109–111).

What are the limitations of Marr’s approach? First, he focused excessively on bottom-up processes. Marr (1982, p. 101) admitted, “Top-down processing is sometimes used and necessary.” However, he de-emphasised the major role expectations and knowledge play in object recognition (discussed in detail on pp. 111–116). Second, Marr assumed that “Vision tells the truth about what is out there” (Mather, 2015, p. 44). In fact, there are numerous exceptions. For example, people observed from a tall building (e.g., the Eiffel Tower) seem very small. Another example is the vertical-horizontal illusion: observers typically overestimate the length of a vertical line when it is compared against a horizontal line of the same length (e.g., Gavilán et al., 2017). Third, many processes proposed by Marr are incredibly complex computationally. As Mather (2015, p. 44) pointed out, “The computations required to produce view-independent 3-D object models are now thought by many researchers to be too complex.”

Biederman’s recognition-by-components theory

Biederman’s (1987) recognition-by-components theory developed Marr’s theoretical approach. His central assumption was that objects consist of basic shapes or components known as “geons” (geometric ions); examples include blocks, cylinders, spheres, arcs and wedges. Biederman claimed there are approximately 36 different geons, which sounds suspiciously low to provide descriptions of all objects. However, geons can be combined in almost endless ways. For example, a cup is an arc connected to the side of a cylinder. A pail involves the same two geons but with the arc connected to the top of the cylinder.

Figure 3.10 shows the key features of recognition-by-components theory. We have already considered the stage where the components or geons of an object are determined. When this information is available, it is matched with stored object representations or structural models consisting of information about the nature of the relevant geons, their orientations, sizes and so on. Whichever stored representation fits best with the geon-based information obtained from the visual object determines which object is identified by observers.
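This matching stage can be caricatured in code: represent each stored structural model as a set of geon-relation pairs and pick the best-fitting model for a parsed input. The representation and scoring below are hypothetical simplifications for illustration only (the theory itself says nothing about Python sets), using Biederman’s own cup/pail example.

    # Stored structural models: each object is a set of (geon, relation) pairs.
    stored_models = {
        "cup":  {("cylinder", "base"), ("arc", "side-connected")},
        "pail": {("cylinder", "base"), ("arc", "top-connected")},
    }

    def recognise(parsed_geons):
        """Return the stored model sharing the most geon-relation pairs with the input."""
        scores = {name: len(model & parsed_geons) for name, model in stored_models.items()}
        return max(scores, key=scores.get)

    # Geons extracted from the image: a cylinder with an arc attached to its side.
    seen = {("cylinder", "base"), ("arc", "side-connected")}
    print(recognise(seen))  # -> 'cup'

Note how the same two geons with a different relation would instead match “pail”, which is exactly the point of structural descriptions.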




As indicated in Figure 3.10, the first step in object recognition is edge extraction, in which various aspects of the visual stimulus (e.g., luminance; texture; colour) are processed, leading to a description of the object resembling a line drawing. After that, decisions are made as to how the object should be segmented to establish its geons.

Which edge information should observers focus on? According to Biederman (1987), non-accidental image properties are crucial. These are aspects of the visual image that are invariant across different viewing angles. Examples include whether an edge is straight or curved and whether a contour is concave (hollow) or convex (bulging), with the former of particular importance. Biederman assumed the geons of a visual object are constructed from various non-accidental or invariant properties.

This part of the theory leads to the key prediction that object recognition is typically viewpoint-invariant (i.e., objects can be recognised equally easily from nearly all viewing angles). The argument is that object recognition depends crucially on the identification of geons, which can be identified from numerous viewpoints. Thus, object recognition is difficult only when one or more geons are hidden from view.

How do we recognise objects in suboptimal viewing conditions (e.g., an intervening object obscures part of the target object)? First, non-accidental properties can still be detected even when only parts of edges are visible. Second, if the concavities of a contour are visible, there are mechanisms for restoring the missing parts of the contour. Third, we can recognise many objects when some geons are missing because there is much redundant information under optimal viewing conditions.


Figure 3.10 An outline of Biederman’s recognition-by-components theory. Adapted from Biederman (1987).

Findings

Non-accidental properties play a vital role in object recognition (Parker & Serre, 2015). For example, it is easier to distinguish between two objects differing in non-accidental properties. In addition, neuroimaging studies reveal greater neural responses to changes in non-accidental properties than to other visual changes. Rolls and Mills (2018) developed a model of object recognition showing how non-accidental properties of objects can promote viewpoint-invariant object recognition.

There is general agreement that an object’s contour or outline is important in object recognition. For example, camouflage in many animal species is achieved by markings breaking up and distorting contour information (Webster, 2015). There is also general agreement that concavities and convexities are especially informative regions of an object’s contour. However, the evidence relating to Biederman’s (1987) assumption that concavity is more important than convexity in object recognition is mixed (Schmidtmann et al., 2015). In their own study, Schmidtmann et al. (2015) focused specifically on shape recognition using unfamiliar shapes. Shape recognition depended more on information about convexities than concavities (although concavity information had some value). They argued convexity information is likely to be more important because convexities reveal an object’s outer boundary.

According to the theory, object recognition depends on edge rather than surface information (e.g., colour). However, Sanocki et al. (1998) argued that edge-extraction processes are less likely to produce accurate object recognition when objects are presented in the context of other objects rather than on their own. This is because it can be hard to decide which edges belong to which objects when several objects are presented together. Sanocki et al. briefly presented observers with line drawings or full-colour photographs of objects. As predicted, object recognition was much worse with the line drawings than the full-colour photographs when objects were presented in context.

A key theoretical prediction is that object recognition is typically viewpoint-invariant. Biederman and Gerhardstein (1993) supported this prediction when familiar objects presented at different angles were named rapidly. However, numerous other studies have failed to obtain evidence for viewpoint-invariance. This is especially the case with unfamiliar objects, which differ from familiar objects in not having previously been viewed from multiple viewpoints (discussed in the next section on pp. 109–111).

Evaluation

Biederman’s (1987) recognition-by-components theory has been very influential. It indicates how we can identify objects despite substantial differences among the members of most categories in shape, size and orientation. The assumption that non-accidental properties of stimuli and geons play a role in object recognition has received much support.

What are the theory’s limitations? First, it focuses predominantly on bottom-up processes triggered directly by the stimulus input. As a result, it de-emphasises the impact on object recognition of top-down processes based on expectation (Trapp & Bar, 2015; discussed further on pp. 111–116). Second, the theory accounts only for fairly unsubtle perceptual discriminations. It cannot explain how we decide whether an animal is, for example, a particular breed of dog or cat. Third, the notion that objects consist of invariant geons is too inflexible. As Hayward and Tarr (2005, p. 67) pointed out, “You can take almost any object, put a working light-bulb on the top, and call it a lamp.”

Does viewpoint influence object recognition?

Form a visual image of a bicycle. Your image probably involved a side view with both wheels clearly visible. We can use this example to discuss a theoretical controversy. Consider an experiment where some participants see a photograph of a bicycle in the typical (or canonical) view, as in your visual image, whereas others see a photograph of the same bicycle viewed end-on or from above. Would those given the typical view identify the object as a bicycle fastest?

We will address the above question shortly. Before that, we must discuss two key terms mentioned earlier. If object recognition is equally rapid and easy regardless of viewing angle, it is viewpoint-invariant. In contrast, if object recognition is faster and easier when objects are seen from certain angles, it is viewer-centred or viewpoint-dependent. Another important distinction is between categorisation (e.g., is the object a dog?) and identification (e.g., is the object a poodle?), which requires within-category discriminations.

Findings

Milivojevic (2012) reviewed behavioural research in this area. Object recognition is typically uninfluenced by an object’s orientation when categorisation is required (i.e., it is viewpoint-invariant). In contrast, object recognition is significantly slower if an object’s orientation differs from its canonical or typical viewpoint when identification is required (i.e., it is viewer-centred). Hamm and McMullen (1998) reported supporting findings. Changes in viewpoint had no effect on speed of object recognition when categorisation was required (e.g., deciding an object was a car). However, there were clear effects of changing viewpoint with identification (e.g., deciding whether an object was a taxi).

Small (or non-significant) effects of object orientation on categorisation time do not necessarily indicate orientation has not affected internal processing. Milivojevic et al. (2011) found stimulus orientation had only small effects on speed and accuracy of categorisation. However, early components of the event-related potentials (ERPs; see Glossary) were larger when stimuli were not in the upright position. Thus, stimulus orientation had only modest effects on task performance but perceptual processing was less demanding with upright stimuli.

Neuroimaging research has enhanced our understanding of object recognition (Milivojevic, 2012). With categorisation tasks, brain activation is mostly very similar regardless of object orientation. However, orientation influences brain activity early in processing, suggesting initial processing is viewpoint-dependent. With identification tasks, there is typically greater activation of areas within the inferior temporal cortex when objects are not in their typical or canonical orientation (Milivojevic, 2012). This finding is unsurprising since the inferotemporal cortex is heavily involved in object recognition (Gauthier & Tarr, 2016). Identification may require additional processing (e.g., more detailed processing of object features) for objects presented in unusual orientations.

Learning influences the extent to which object recognition is viewpoint-dependent or viewpoint-invariant. Zimmermann and Eimer (2013) presented unfamiliar faces on 640 trials. Face recognition was viewpoint-dependent initially but became more viewpoint-invariant thereafter. Learning caused more information about each face to be stored in long-term memory and this facilitated rapid access to visual face memory regardless of facial orientation. Etchells et al. (2017) also studied the effects of learning on face recognition. During learning, observers were repeatedly shown one or two views of unfamiliar faces. Subsequently they were shown a novel view of these faces. There was evidence of viewpoint-invariant face recognition when learning had been based on two different views but not when it had been based on only a single view. Related research was reported by Weibert et al. (2016). They found evidence of a viewpoint-invariant response in face-selective regions of the medial temporal lobe with familiar (but not unfamiliar) faces. Thus, viewpoint-invariant responses during object recognition are more frequent for faces for which observers have stored considerable relevant information.

Evidence of viewpoint-dependent or viewpoint-invariant responses within the brain often depends on the precise brain areas studied. Erez et al. (2016) found viewpoint-dependent responses in several visual areas (e.g., fusiform face area) but viewpoint-invariant responses in the perirhinal cortex. There is more evidence for viewpoint-invariant brain responses late rather than early in visual processing. Why is that? As Erez et al. (p. 2271) argued, “Representations of low-level features are transformed into more complex and invariant representations as information flows through successive stages of [processing].”

Most research is limited because object recognition is typically assessed in only one context, which may prompt either viewpoint-invariant or viewpoint-dependent recognition performance. Tarr and Hayward (2017) argued this approach can misleadingly suggest observers store only viewpoint-invariant or viewpoint-dependent information. Accordingly, they used various contexts. Observers originally learned the identities of novel objects that could be discriminated by viewpoint-invariant information. As predicted, they exhibited viewpoint-invariant object recognition when tested. When the testing context was changed to make it hard to continue to use that approach, observers shifted to exhibiting viewpoint-dependent behaviour.

The central conclusion from the above findings is that: “Object representations are neither viewpoint-dependent nor viewpoint-invariant, but rather encode multiple kinds of information . . . deployed in a flexible manner appropriate to context and task” (Tarr & Hayward, 2017, p. 108). Thus, visual object representations contain richer and more variegated information than typically assumed on the basis of limited testing conditions.


Conclusions

As Gauthier and Tarr (2016, p. 179) concluded: “Depending on the experimental conditions and which parts of the brain we look at, one can obtain data supporting both the structural-description (i.e., the viewpoint-invariant) and the view-based [viewpoint-dependent] approaches.” There has been progress in identifying factors (e.g., is categorisation or identification required? is the object familiar or unfamiliar?) influencing whether object recognition is viewpoint-invariant or viewpoint-dependent.

Gauthier and Tarr (2016, p. 379) argued researchers should address the following question: “What is the nature of the features that comprise high-level visual representations and lead to image-dependence or image-invariance?” Thus, we should focus more on why object recognition is viewpoint-invariant or viewpoint-dependent. As yet, “The exact and fine-grained features of object representations are still unknown and are not easily resolved” (Gauthier & Tarr, 2016, p. 379).

OBJECT RECOGNITION: TOP-DOWN PROCESSES

Historically, most theorists (e.g., Marr, 1982; Biederman, 1987) studying object recognition emphasised bottom-up processes. Apparent support can be found in the hierarchical nature of visual processing. As Yardley et al. (2012, p. 4) pointed out:

Traditionally, visual object recognition has been taken as mediated by a hierarchical, bottom-up stream that processes an image by systematically analysing its individual elements and relaying this information to the next areas until the overall form and identity are determined.

The above account, assuming a feedforward hierarchy of processing stages from visual cortex through to inferotemporal cortex, is oversimplified. There are as many backward-projecting neurons (associated with top-down processing) as forward-projecting ones throughout most of the visual system (Gilbert & Li, 2013). Up to 90% of the synapses from incoming neurons to primary visual cortex (involved in early visual processing) originate in the cortex and thus reflect top-down processes. Recurrent processing (a form of top-down processing) from higher to lower brain areas is often necessary for conscious visual perception (van Gaal & Lamme, 2012; see Chapter 16).

Top-down processes should have their greatest impact on object recognition when bottom-up processes are relatively uninformative (e.g., when observers are presented with degraded or briefly presented stimuli). Support for this prediction is discussed on p. 112.

Findings

Evidence for the involvement of top-down processes in visual perception was reported by Goolkasian and Woodberry (2010). They presented observers with ambiguous figures immediately preceded by primes relevant to one interpretation (see Figure 3.11). The primes systematically biased the interpretation of the ambiguous figures via top-down processes.



Figure 3.11 Ambiguous figures (e.g., Eskimo/Indian, Liar/Face) were preceded by primes (e.g., Winter Scene, Tomahawk) relevant to one interpretation of the following figure. From Goolkasian and Woodberry (2010). Reprinted with permission from the Psychonomic Society 2010.

Viggiano et al. (2008) obtained strong evidence that top-down processes within the prefrontal cortex influence object recognition. Observers viewed blurred or non-blurred photographs of living and non-living objects. On some trials, repetitive transcranial magnetic stimulation (rTMS; see Glossary) was applied to the dorsolateral prefrontal cortex to disrupt top-down processing. rTMS slowed object recognition time only with blurred photographs. Thus, top-down processes were directly involved in object recognition when the sensory information available to bottom-up processes was limited.

Controversy

Firestone and Scholl (2016) argued in a review, “There is . . . no evidence for top-down effects of cognition on visual perception.” They claimed that top-down processes often influence response bias, attention or memory rather than perception itself.


First, consider ambiguous or reversible figures (e.g., the faces-goblet illusion shown in Figure 3.5). Observers alternate between the two possible interpretations (e.g., faces vs goblet). The dominant one at any moment depends on their direction of attention but not necessarily on top-down processes (Long & Toppino, 2004).

Second, Auckland et al. (2007) presented observers briefly with a target object (e.g., playing cards) surrounded by four context objects. When the context objects were semantically related to the target (e.g., dice; chess pieces; plastic chips; dominoes), the target was recognised more often than when they were semantically unrelated. This finding depended in part on response bias (i.e., guesses based on context) rather than perceptual information about the target. (More evidence of response bias is discussed in the box below.)

Firestone and Scholl’s (2016) article has provoked much controversy. Lupyan (2016, p. 40) attacked their tendency to attribute apparent top-down effects on perception to attention, memory and so on: “This ‘It’s not perception, it’s just X’ reasoning assumes that attention, memory, and so forth can be cleanly split from perception proper.” In fact, all these processes interact dynamically and so attention, perception and memory are not clearly separate.

KEY TERM Shooter bias The tendency for unarmed black individuals to be more likely than unarmed white individuals to be shot.

IN THE REAL WORLD: SHOOTER BIAS

Shooter bias is shown by “More shooting errors for unarmed black than white suspects” (Cox & Devine, 2016, p. 237). Black Americans are more than twice as likely as white Americans to be unarmed when killed by the police (Ross, 2015). For example, on 22 November 2014, a police officer in Cleveland, Ohio, shot dead a 12-year-old black male (Tamir Rice) playing with a replica pistol.

Shooter bias may reflect top-down influences on visual perception. Payne (2006) presented a white or black face followed by the very brief presentation of a gun or tool. When participants made a rapid response, they indicated falsely they had seen a gun more often when the face was black. Shooter bias reflects top-down effects based on inaccurate racial stereotypes associating black individuals with threat (e.g., Azevedo et al., 2017). This bias might be due to direct top-down effects on perception: objects are more likely to be misperceived as guns if held by black individuals. Alternatively, shooter bias may reflect response bias (the expectation someone has a gun is greater if that person is black rather than white): there is no effect on perception but shooters require less perceptual evidence to shoot a black individual.

Azevedo et al. (2017) found a briefly presented weapon (a gun) was more accurately perceived when preceded by a black face than a white one. However, the opposite was the case when a tool was presented. These findings were due to response bias rather than perception. Moore-Berg et al. (2017) asked non-black participants to decide rapidly whether or not to shoot an armed or unarmed white or black person of high or low socio-economic status. There was shooter bias: participants were biased towards shooting if the individual was black, of low socio-economic status, or both. This shooter bias mostly reflected a response bias against shooting a white person of high socio-economic status (probably because of a low level of perceived danger).


Further findings

Howe and Carter (2016) identified two perception-like phenomena driven by top-down processes. First, there are visual hallucinations, which are found in schizophrenic patients. Hallucinations are experienced as actual perceptions even though the relevant object is not present and so they cannot depend on bottom-up processes. Second, there is visual imagery (discussed on pp. 130–137). Visual imagery involves several processes involved in visual perception. Like hallucinations, visual imagery occurs in the absence of bottom-up processes because the relevant object is absent.

Lupyan (2017) discussed numerous top-down effects on visual perception in studies avoiding the problems identified by Firestone and Scholl (2016). Look at Figure 3.12, which apparently shows an ordinary brick wall. If that is what you see, have another look. This time see whether you can spot the object mentioned at the end of the Conclusions section. Once you have spotted the object, it becomes impossible not to see it afterwards. Here we have powerful effects of top-down processing on perception based on knowledge of what is in the photograph.

Figure 3.12 A brick wall that can be seen as something else. From Plait (2016).

Conclusions

As Firestone and Scholl (2016) argued, it is hard to demonstrate top-down processes directly influence perception rather than attention, memory or response bias. However, many studies have shown such a direct influence. As Yardley et al. (2012, p. 1) pointed out, “Perception relies on existing knowledge as much as it does on incoming information.” Note, however, the influence of top-down processes is generally greater when visual stimuli are degraded. By the way, the hard-to-spot object in the photograph is a cigar!

Theories emphasising top-down processes

Bar et al. (2006) found greater activation of the orbitofrontal cortex (part of the prefrontal cortex) when object recognition was hard than when it was easy. This activation occurred 50 ms before activation in recognition-related regions of the temporal cortex, and so seemed important for object recognition. In Bar et al.’s model, object recognition depends on top-down processes involving the orbitofrontal cortex and bottom-up processes involving the ventral visual stream (see Figure 3.13 and Chapter 2).




Trapp and Bar (2015) developed this model, claiming that visual input rapidly elicits various hypotheses concerning what has been presented. Subsequent top-down processes associated with the orbitofrontal cortex select relevant hypotheses and suppress irrelevant ones. More specifically, the orbitofrontal cortex uses contextual information to generate hypotheses and resolve competition among hypotheses. Palmer (1975) showed the importance of context. He presented a picture of a scene (e.g., a kitchen) followed by the very brief presentation of the picture of an object. The object was recognised more often when relevant to the context (e.g., a loaf) than when irrelevant (e.g., a drum).

Interactive-iterative framework

Baruch et al. (2018) argued that previous theorists had not appreciated the full complexities of interactions between bottom-up and top-down processes in object recognition. They rectified this situation with their interactive-iterative framework (see Figure 3.14). According to this framework, observers typically form hypotheses concerning object identity based on their goals, knowledge and the environmental context. Of importance, these hypotheses are often formed before the object is presented.

Observers discriminate among competing hypotheses by attending to a distinguishing feature of the object. For example, if your tentative hypothesis was elephant, you might allocate attention to the expected location of its trunk. If that failed to provide the necessary information because that area was partially hidden (see Figure 3.15), you might then attend to other features (e.g., size and shape of the leg; skin texture).

In sum, Baruch et al. (2018) emphasised two related top-down processes strongly influencing object recognition. First, observers form hypotheses about the possible identity of an object prior to (or in interaction with) the visual input. Second, observers direct their attention to object parts likely to be maximally informative concerning its identity.
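The hypothesise-attend-update cycle at the heart of this framework can be summarised in schematic code. The sketch below is our own reading of Baruch et al.’s (2018) loop, not their implementation; the candidate objects, feature names and values are all invented for illustration.

    # Hypothetical feature values for two candidate objects (invented for illustration).
    candidates = {
        "elephant": {"trunk": "present", "skin": "wrinkled", "leg": "thick"},
        "horse":    {"trunk": "absent",  "skin": "smooth",   "leg": "slim"},
    }

    def recognise(scene, candidates, features=("trunk", "skin", "leg")):
        """Iteratively attend to distinguishing features until one hypothesis survives."""
        remaining = dict(candidates)
        for feature in features:              # most diagnostic feature first
            observed = scene.get(feature)     # None = that region is occluded
            if observed is None:
                continue                      # redirect attention to another feature
            remaining = {name: feats for name, feats in remaining.items()
                         if feats[feature] == observed}
            if len(remaining) == 1:
                return next(iter(remaining))  # a single hypothesis remains
        return None                           # evidence insufficient

    # The trunk region is occluded, so attention is redirected to skin texture.
    scene = {"trunk": None, "skin": "wrinkled", "leg": "thick"}
    print(recognise(scene, candidates))  # -> 'elephant'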


Figure 3.13 In this modified version of Bar et al.’s (2006) theory, it is assumed that object recognition involves two different routes: (1) a top-down route in which information proceeds rapidly to the orbitofrontal cortex, which is involved in generating predictions about the object’s identity; (2) a bottom-up route using the slower ventral visual stream. From Yardley et al. (2012). Reprinted with permission from Springer.

Figure 3.14 Interactive-iterative framework for object recognition, with top-down processes shown in dark green and bottom-up processes in brown. From Baruch et al. (2018). Reprinted with permission of Elsevier.


Findings

According to the interactive-iterative framework, expectations can exert top-down influences on processing even before a visual stimulus is presented. Kok et al. (2017) obtained support for this prediction. Observers expecting a given stimulus produced a neural signal resembling that generated by the actual presentation of the stimulus shortly before it was presented.

Baruch et al. (2018) tested various predictions from their theoretical framework. In one experiment, participants decided which of two types of artificial fish (tass or grout) had been presented. The two fish types differed with respect to distinguishing features associated with the tail and the mouth, with the tail being easier to discriminate. As predicted, participants generally attended more to the tail than the mouth region from stimulus onset. When much of the tail region was hidden from view, participants redirected their attention to the mouth region.

Figure 3.15 Recognising an elephant when a key feature (its trunk) is partially hidden. From Baruch et al. (2018). Reprinted with permission of Elsevier.

Summary

Numerous theorists have argued that object recognition depends on top-down processes as well as bottom-up ones. Baruch et al.’s (2018) interactive-iterative framework extends such ideas by identifying how these two types of processes interact. Of central importance, top-down processes influence the allocation of attention, and the allocation of attention influences subsequent bottom-up processing.

FACE RECOGNITION

There are two main reasons for devoting a section to face recognition. First, recognising faces is of enormous importance to us, since we generally identify individuals from their faces. Form a visual image of someone important in your life – it probably contains detailed information about their face. Second, face recognition differs importantly from other forms of object recognition. As a result, we need theories specifically devoted to face recognition rather than simply relying on theories of object recognition.

KEY TERM Holistic processing Processing that involves integrating information from an entire object (especially faces).

Face vs object recognition

How does face recognition differ from object recognition? There is more holistic processing in face recognition. Holistic processing involves “integration across the area of the face, or processing of the relationships between features as well as, or instead of, the features themselves” (Watson & Robbins, 2014, p. 1). Holistic processing is faster because facial features are processed in parallel rather than individually. Caharel et al. (2014) found faces can be categorised as familiar or unfamiliar within approximately 200 ms. Holistic processing is also more reliable than feature processing because individual facial features (e.g., mouth shape) are subject to change.

Relevant evidence comes from the face inversion effect: faces are much harder to identify when presented inverted or upside-down rather than upright (Bruyer, 2011). This effect probably reflects difficulties in processing inverted faces holistically. There are surprisingly large effects of face inversion within the brain – Rosenthal et al. (2017, p. 4823) found face inversion “induces a dramatic functional reorganisation across related brain networks”. In contrast, adverse effects of inversion are often much smaller with non-face objects. For example, Klargaard et al. (2018) found a larger inversion effect for faces than for cars.

However, it can be argued we possess expertise in face recognition and so we should consider individuals possessing expertise with non-face objects. The findings are mixed. Rossion and Curran (2010) found car experts had a much smaller inversion effect for cars than faces. However, those with the greatest expertise showed a greater inversion effect for cars. In contrast, Weiss et al. (2016) found horse experts had no inversion effect for horses.

More evidence suggesting faces are special comes from the part-whole effect – it is easier to recognise a face part when presented within a whole face rather than in isolation. Farah (1994) studied this effect using drawings of faces and houses. Participants’ ability to recognise face parts was much better when whole faces were presented rather than only a single feature (i.e., the part-whole effect). In contrast, recognition performance for house features was very similar in whole and single-feature conditions.

Richler et al. (2011) explored the hypothesis that faces are processed holistically by using composite faces. Composite faces consist of a top half and a bottom half that may or may not be from the same face. The task was to decide whether the top halves of two successive composite faces were the same or different. Performance was worse when the bottom halves of the two composite faces were different. This composite face effect suggests people find it hard to ignore the bottom halves and thus that face processing is holistic.

Finally, accurate face recognition is so important to humans we might expect to find holistic processing of faces even in young children. As predicted, children aged between 3 and 5 show holistic processing (McKone et al., 2012).

In sum, face recognition (even in young children) involves holistic processing. However, it remains unclear whether the processing differences between faces and other objects occur because faces are special or because we have dramatically more expertise with faces than with most other object categories. Relevant evidence was reported by Ross et al. (2018). When participants were presented with car pictures, car experts formed more holistic representations within the brain than did car novices. The role played by expertise is discussed further shortly.


KEY TERM Face inversion effect The finding that faces are much harder to recognise when presented upside down; the effect of inversion is less marked (or absent) with other objects. Part-whole effect The finding that a face part is recognised more easily when presented in the context of a whole face rather than on its own.


KEY TERM Prosopagnosia A condition (also known as face blindness) in which there is a severe impairment in face recognition but much less impairment of object recognition; it is often the result of brain damage (acquired prosopagnosia) but can also be due to impaired development of facerecognition mechanisms (developmental prosopagnosia).


Prosopagnosia

Much research has involved brain-damaged patients with severely impaired face processing. Such patients suffer from prosopagnosia (pros-uh-pag-NO-see-uh), a term coming from the Greek words for “face” and “without knowledge”. Prosopagnosia is also known as “face blindness”. Prosopagnosia is a heterogeneous or diverse condition with the precise problems of face and object recognition varying across patients. It can be caused by brain damage (acquired prosopagnosia) or can occur in the absence of any obvious brain damage (developmental prosopagnosia). Acquired prosopagnosics differ in terms of their specific face-processing deficits and the brain areas involved (discussed later).

Studying prosopagnosics is of direct relevance to the issue of whether face recognition involves specific or specialised processes absent from object recognition. If prosopagnosics invariably have great impairments in object recognition, it would suggest face and object recognition involve similar processes. In contrast, if some prosopagnosics have intact object recognition, it would imply the processes underlying the two forms of recognition are different.

Farah (1991) reviewed research on patients with acquired prosopagnosia. All these patients also had more general problems with object recognition. However, some exceptions have been reported.

IN REAL LIFE: HEATHER SELLERS

We can understand the profound problems prosopagnosics suffer in everyday life by considering Heather Sellers (see YouTube: “You Don’t Look Like Anyone I Know”). She is an American woman with severe prosopagnosia. When she was a child, she became separated from her mother at a grocery store. When reunited with her mother, she did not initially recognise her.

Heather Sellers still has difficulty in recognising her own face. Heather: “A few times I have been in a crowded elevator with mirrors all around and a woman will move, and I will go to get out the way and then realise ‘oh that woman is me’.” Such experiences made her very anxious.

Surprisingly, Heather Sellers was 36 before she realised she had prosopagnosia. Why was this? As a child, she became very skilled at identifying people by their hair style, body type, clothing, voice and gait. In spite of these skills, she has occasionally failed to recognise her own husband! According to Heather Sellers, “Not being able to reliably know who people are – it feels terrible like failing all the time.”


Moscovitch et al. (1997) studied CK, a man with object agnosia (impaired object recognition). He performed comparably to healthy controls on several face-recognition tasks including photos, caricatures and cartoons.

Geskin and Behrmann (2018) reviewed the literature on patients with developmental prosopagnosia. Out of 238 cases, 80% had impaired object recognition but 20% did not. Thus, several patients had impaired face recognition but not object recognition. We would have a double dissociation (see Glossary) if we could find individuals with developmental object agnosia but intact face recognition. Germine et al. (2011) found a female (AW) who had preserved face recognition but impaired object recognition for many categories of objects.

Overall, far more individuals have impaired face recognition (prosopagnosia) but relatively intact object recognition than have impaired object recognition but intact face recognition. These findings suggest that, “Face recognition is an especially difficult instance of object recognition where both systems [i.e., face and object recognition] rely on a common mechanism” (Geskin & Behrmann, 2018, p. 18). Face recognition is hard in part because it involves distinguishing among broadly similar category members (e.g., two eyes; nose; mouth). In contrast, object recognition often only involves identifying the relevant category (e.g., cat; car). According to this viewpoint, prosopagnosics would perform poorly if required to make fine-grained perceptual judgments with objects.

An alternative interpretation emphasises expertise (Wang et al., 2016). Nearly everyone has more experience (and expertise) at recognising faces than the great majority of other objects. It is thus possible that brain damage in prosopagnosics affects areas associated with expertise generally rather than specifically faces (the expertise hypothesis is discussed on pp. 122–124).

Findings

In spite of their poor conscious or explicit recognition of faces, many prosopagnosics show evidence of covert recognition (face processing without conscious awareness). For example, Eimer et al. (2012) found developmental prosopagnosics were much worse than healthy controls at explicit recognition of famous faces (27% vs 82% correct, respectively). However, famous faces produced brain activity in half the developmental prosopagnosics, indicating the relevant memory traces were activated (covert recognition). These prosopagnosics have very poor explicit recognition performance because brain areas containing more detailed information about the famous individuals were not activated.

Busigny et al. (2010b) compared the first two interpretations discussed above by using object-recognition tasks requiring complex within-category distinctions for several categories: birds, boats, cars, chairs and faces. A male patient (GG) with acquired prosopagnosia was as accurate as controls with each non-face category (see Figure 3.16). However, he was substantially less accurate than controls with faces (67% vs 94%, respectively). Thus, GG apparently has a face-specific impairment rather than a general inability to recognise complex stimuli.


Figure 3.16 Accuracy and speed of object recognition for birds, boats, cars, chairs and faces by patient GG and healthy controls. From Busigny et al. (2010b). Reprinted with permission from Elsevier.

Busigny et al. (2010a) reviewed previous findings suggesting many patients with acquired prosopagnosia have essentially intact object recognition. However, this research was limited because the difficulty of the recognition decisions required of the patients was not controlled systematically. Busigny et al. manipulated the similarity between target items and distractors on an object-recognition task. Increasing similarity had comparable effects on PS (a patient with acquired prosopagnosia) and healthy controls. In contrast, PS performed very poorly on a face-recognition task which was very easy for healthy controls.

Why is face recognition so poor in prosopagnosics? Busigny et al. (2010b) tested the hypothesis that prosopagnosics have great difficulty with holistic processing. A prosopagnosic patient (GG) did not show the face inversion or composite face effects, suggesting he does not perceive individual faces holistically (an ability enhancing accurate face recognition). In contrast, GG’s object recognition was intact, perhaps because holistic processing was not required. Van Belle et al. (2011) also investigated the deficient holistic processing hypothesis. GG’s face-recognition performance was poor when holistic processing was possible. However, it was intact when it was not possible to use holistic processing (only one part of a face was visible at a time).

Finally, we consider the expertise hypothesis. According to this hypothesis, faces differ from most other categories of objects in that we have more expertise in identifying faces. As a result, apparent differences between faces and other objects in processes and brain mechanisms may mostly reflect differences in expertise. This hypothesis is discussed further on pp. 122–124. Barton and Corrow (2016) reported evidence consistent with this hypothesis in patients with acquired prosopagnosia who had expertise in car recognition and reading prior to their brain damage. These patients had impairments in car recognition and aspects of visual word reading, suggesting they had problems with objects for which they had possessed expertise (i.e., objects of expertise). Contrary evidence was reported by Weiss et al. (2016), who studied a patient (OH) with developmental prosopagnosia. In spite of severely impaired face-recognition ability, she displayed superior recognition skills for horses (she had spent 15 years working with them). Thus, visual expertise can be acquired independently of the mechanisms responsible for expertise in face recognition.


In sum, the finding that many prosopagnosics have face-specific impairments is consistent with the hypothesis that face recognition involves special processes. However, more general recognition impairments have also often been reported and are apparently inconsistent with that hypothesis. There is also some support for the expertise hypothesis but again the findings are mixed.

Fusiform face area

If faces are processed differently to other objects, we would expect to find brain regions specialised for face processing. The fusiform face area (FFA) in the ventral temporal cortex has (as its name strongly implies!) been identified as such a brain region.

The fusiform face area is indisputably involved in face processing. Downing et al. (2006) found the fusiform face area responded more strongly to faces than to any of 18 object categories (e.g., tools; fruits; vegetables). However, other brain regions, including the occipital face area (OFA) and the superior temporal sulcus (STS), are also face-selective (Grill-Spector et al., 2017; see Figure 3.17). Such findings indicate that face processing depends on one or more brain networks rather than simply on the fusiform face area.

Even though several brain areas are face-selective, the fusiform face area has been regarded as having special importance. For example, Axelrod and Yovel (2015) considered brain activity in several face-selective regions when observers were shown photos of Leonardo DiCaprio and Brad Pitt. The fusiform face area was the only region in which the pattern of brain activity differed significantly between these actors. However, Kanwisher et al. (1997) found only 80% of their participants had greater activation within the fusiform face area to faces than to other objects.

In sum, the fusiform face area plays a major role in face processing and recognition for most (but probably not all) individuals.

KEY TERM
Fusiform face area: An area that is associated with face processing; the term is somewhat misleading given that the area is also associated with processing other categories of objects.



Figure 3.17 Face-selective areas in the right hemisphere. OFA = occipital face area; FFA = fusiform face area; pSTS-FA and aSTS-FA = posterior and anterior superior temporal sulcus face areas; IFG-FA = inferior frontal gyrus face area; ATL-FA = anterior temporal lobe face area. From Duchaine and Yovel (2015).


However, face processing depends on a brain network including several areas in addition to the fusiform face area (see Figure 3.17). Note also that the fusiform face area is activated when we process numerous types of non-face objects. Finally, face-processing deficits in prosopagnosics are not limited to the fusiform face area. For example, developmental prosopagnosics had less selectivity for faces than healthy controls in 12 different face areas (including the fusiform face area) (Jiahui et al., 2018).

Expertise hypothesis

According to advocates of the expertise hypothesis (e.g., Wang et al., 2016; discussed on p. 119), major differences between face and object processing should not be taken at face value (sorry!). According to this hypothesis, the brain and processing mechanisms allegedly specific to faces are also involved in processing and recognising all object categories for which we possess expertise. Thus, we should perhaps relabel the fusiform face area as the "fusiform expertise area".

Why is expertise so important in determining face and object processing? One reason is that expertise leads to greater holistic or integrated processing. For example, chess experts can very rapidly use holistic processing based on their relevant stored knowledge to understand complex chess positions (see Chapter 12).

Three main predictions follow from the expertise hypothesis:

(1) Holistic or configural processing is not unique to faces but should be found for any objects of expertise.
(2) The fusiform face area should be highly activated when observers recognise the members of any category for which they possess expertise.
(3) If the processing of faces and of objects of expertise involves similar processes, then processing objects of expertise should interfere with face processing.

Findings

The first prediction is plausible. Wallis (2013) tested a model of object recognition to assess the effects of prolonged exposure to any given stimulus category. The model predicted that many phenomena associated with face processing (e.g., holistic processing; the inversion effect) would be found for any stimulus category for which observers had expertise. Repeated simultaneous presentation of the same features (e.g., nose; mouth; eyes) gradually increases holistic processing. Wallis concluded a single model can explain object and face recognition.

There is some support for the first prediction in research on detection of abnormalities in medical images (see Chapter 12). Kundel et al. (2007) found experts generally fixated on an abnormality in under 1 second, suggesting they used very fast, holistic processes. However, as we saw earlier, experts with non-face objects often have a small inversion effect (assumed to reflect holistic processing). McKone et al. (2007) found such experts rarely show the composite effect (also assumed to reflect holistic processing; discussed on p. 117).

We turn now to the second prediction. In a review, McKone et al. (2007) found a modest tendency for the fusiform face area to be more activated by objects of expertise than other objects. However, larger activation effects for objects of expertise were found outside the fusiform face area than inside it. Support for the second prediction was reported by McGugin et al. (2014): activation to car stimuli within the fusiform face area was greater in participants having greater car expertise. McGugin et al. (2018) argued that we can test the second prediction by comparing individuals varying in face-recognition ability (or expertise). As predicted, those with high face-recognition ability exhibited more face-selective activation within the fusiform face area than those having low ability. Bilalić (2016) found chess experts had more activation in the fusiform face area than non-experts when viewing chess positions but not single chess pieces. He concluded: "The more complex the stimuli, the more likely it is that the brain will require the help of the FFA in grasping its essence" (p. 1356). It is important not to oversimplify the issues here. Even if face processing and processing of other objects of expertise both involve the fusiform face area, the two forms of processing may use different neurons in different combinations (Grill-Spector et al., 2017).

We turn now to the third prediction. McKeeff et al. (2010) found that car experts were slower than novices when searching for face targets among cars but not among watches. Car and face expertise may have interfered with each other because they depend on similar processes. Alternatively, car experts may have been more likely than car novices to attend to distracting cars because they find such stimuli more interesting. McGugin et al. (2015) also tested the third prediction. Overall, car experts had greater activation than car novices in face-selective areas (e.g., the fusiform face area) when processing cars. Of key importance, that difference was greatly reduced when faces were also presented. Thus, interference was created when participants processed objects belonging to two different categories of expertise (i.e., cars and faces).

Evaluation

There is some support for the expertise hypothesis with respect to all three predictions. However, the extent of that support remains controversial. One reason is that it is hard to assess expertise level accurately or to control it. It is certainly possible that many (but not all) processing differences between faces and other objects are due to greater expertise with faces. This would imply that faces are less special than often assumed.

According to the expertise hypothesis, we are face experts. This may be true of familiar faces, but it is certainly not true of unfamiliar faces (Young & Burton, 2018). Evidence of the problems we experience in recognising unfamiliar faces is contained in the Box on passport control.


IN REAL LIFE: PASSPORT CONTROL

Look at the 40 faces displayed below (see Figure 3.18). How many different individuals are shown? Provide your answer before reading on.

Figure 3.18 An array of 40 face photographs to be sorted into piles for each of the individuals shown in the photographs. From Jenkins et al. (2011). Reproduced with permission from the Royal Society.

In a study by Jenkins et al. (2011) using a similar stimulus array, participants on average decided 7.5 different individuals were shown. However, the actual number for the array used by Jenkins et al. and the one shown in Figure 3.18 is only two! The two individuals (A and B) are arranged as shown below:

A B A A A B A B A B A A A A A B B B A B B B B A A A B B A A B A B A A B B B B B

Perhaps we are poor at matching unfamiliar faces because we rarely perform this task in everyday life. White et al. (2014) addressed this issue in a study on passport officers averaging 8 years of service. These passport officers indicated on each trial whether a photograph was that of a physically present person. Overall, 6% of valid photos were rejected and 14% of fraudulent photos were wrongly accepted. Thus, individuals with specialist training and experience are not exempt from problems in matching unfamiliar faces. The main problem is that there is considerable variability in how an individual looks in different photos (discussed further on p. 127).


In another experiment, White et al. (2014) compared the performance of passport officers and students on a matching task with unfamiliar faces. The two groups were comparable, with 71% correct performance on match trials and 89% on non-match trials. Thus, training and experience were irrelevant.

In White et al.'s (2014) research, 50% of the photos were invalid (non-matching). This is (hopefully!) a massively higher percentage of invalid photos than typically found at passport control. Papesh and Goldinger (2014) compared performance when actual mismatches occurred on 50% or 10% of trials. In the 50% condition, mismatches were missed on 24% of trials, whereas they were missed on 49% of trials in the 10% condition. Participants had a low expectation of mismatches in the 10% condition and so were very cautious about deciding two photos were of different individuals (i.e., they had a very cautious response criterion, indicating response bias). Papesh et al. (2018) replicated the above findings. They attempted to improve performance in the condition where mismatches occurred on only 10% of trials by introducing blocks where mismatches occurred on 90% of trials. However, this manipulation had little effect because participants were reluctant to abandon their very cautious response criterion.

How can we provide better security at passport control? Increased practice at matching unfamiliar faces is not the answer – White et al. (2014) found performance was unrelated to the number of years passport officers had served. A promising approach is to find individuals having an exceptional ability to recognise faces (super-recognisers). Robertson et al. (2016) asked participants to decide whether face pairs depicted the same person. Mean accuracy was 96% for previously identified police super-recognisers compared to only 81% for police trainees.

Why do some individuals have very superior face-recognition ability? Wilmer et al. (2010) found the face-recognition performance of monozygotic (identical) twins was much closer than that of dizygotic (fraternal) twins, indicating face-recognition ability is strongly influenced by genetic factors. Face-recognition ability correlated very modestly with other forms of recognition (e.g., abstract art images), suggesting it is very specific. In similar fashion, Turano et al. (2016) found good and poor face recognisers did not differ with respect to car-recognition ability.

KEY TERM
Super-recognisers: Individuals with an outstanding ability to recognise faces.
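The "very cautious response criterion" described above can be expressed in standard signal-detection terms. The short Python sketch below computes sensitivity (d') and criterion (c) under the usual equal-variance model. The hit rates come from the Papesh and Goldinger miss rates quoted above; the false-alarm rates are hypothetical, since they are not reported here, so treat the exact numbers as illustrative only.

```python
from statistics import NormalDist

def sdt_measures(hit_rate: float, false_alarm_rate: float):
    """Equal-variance signal-detection measures.

    d' indexes sensitivity; criterion c indexes response bias
    (positive c = conservative, i.e. reluctance to call a trial a mismatch).
    """
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# Hit rates follow Papesh and Goldinger (2014): mismatches were missed on
# 24% of trials when they made up 50% of trials, and on 49% when only 10%.
# The false-alarm rates below are hypothetical, for illustration only.
for label, hits, fas in [("50% mismatch base rate", 0.76, 0.10),
                         ("10% mismatch base rate", 0.51, 0.03)]:
    d, c = sdt_measures(hits, fas)
    print(f"{label}: d' = {d:.2f}, criterion c = {c:.2f}")
```

With these illustrative numbers, sensitivity barely changes across conditions whereas the criterion roughly triples, which is the pattern the response-bias interpretation predicts.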

Theoretical approaches

Bruce and Young's (1986) model has been the most influential theoretical approach to face processing and recognition and so we start with it. It is a serial stage model consisting of eight components (see Figure 3.19):

(1) Structural encoding: this produces various descriptions or representations of faces.
(2) Expression analysis: people's emotional states are inferred from their facial expression.
(3) Facial speech analysis: speech perception is assisted by lip reading (see Chapter 9).
(4) Direct visual processing: specific facial information is processed selectively.
(5) Face recognition units: these contain structural information about known faces; this structural information emphasises the less changeable aspects of the face and is fairly abstract.
(6) Person identity nodes: these provide information about individuals (e.g., occupation; interests).
(7) Name generation: a person's name is stored separately.




(8) Cognitive system: this contains additional information (e.g., most actors have attractive faces); it influences which components receive attention.

What predictions follow? First, there should be major differences in the processing of familiar and unfamiliar faces because various components (face recognition units; person identity nodes; name generation) are involved only when processing familiar faces. Thus, it is much easier to recognise familiar faces, especially when faces are seen from an unusual angle.

Second, separate processing routes are involved in working out facial identity (who is it?) and facial expression (what is he/she feeling?). The former processing route (including the occipital face area and the fusiform face area) focuses on relatively unchanging aspects of faces, whereas the latter (involving the superior temporal sulcus) deals with more changeable aspects. This separation between processes responsible for recognising identity and expression makes sense – if there were no separation, we would have great problems recognising familiar faces with unusual expressions (Young, 2018).

Third, when we see a familiar face, familiarity information from the face recognition unit should be accessed first. This is followed by information about that person (e.g., occupation) from the person identity node and then that person's name from the name generation component. As a result, we can find a face familiar while unable to recall anything else about that person, or we can recall personal information about a person while being unable to recall their name. However, a face should never lead to recall of the person's name in the absence of other information.

Fourth, the model assumes face processing involves several stages. This implies the nature of face-processing impairments in brain-damaged patients depends on which stages of processing are impaired. Davies-Thompson et al. (2014) developed the model to account for three forms of face impairment (see Figure 3.20).

Figure 3.19 The model of face recognition put forward by Bruce and Young (1986). Adapted from Bruce and Young (1986). Reprinted with permission of Elsevier.
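The serial ordering of components (5) to (7) can be made concrete in a few lines of code. The Python below is a toy illustration of the model's serial-access prediction, not an implementation by Bruce and Young: because each stage is reached only via the previous one, the "name without any other information" state can never occur, exactly as the third prediction requires.

```python
from dataclasses import dataclass

# Toy serial-access model of the familiar-face route in Bruce and Young
# (1986): face recognition unit (familiarity) -> person identity node
# (biographical information) -> name generation. Each stage can only be
# reached if the previous one succeeds.

@dataclass
class KnownFace:
    familiarity_ok: bool   # face recognition unit fires
    biography_ok: bool     # person identity node accessible
    name_ok: bool          # name generation accessible

def recognise(face: KnownFace) -> str:
    if not face.familiarity_ok:
        return "face not recognised as familiar"
    if not face.biography_ok:
        return "familiar, but no personal information available"
    if not face.name_ok:
        return "personal information retrieved, but name unavailable"
    return "fully identified (familiarity, biography and name)"

# The permitted partial states, plus full identification:
print(recognise(KnownFace(False, False, False)))
print(recognise(KnownFace(True, False, False)))
print(recognise(KnownFace(True, True, False)))
print(recognise(KnownFace(True, True, True)))
# The forbidden state (a name without biographical information) cannot
# arise, because name generation is reached only via the identity node.
```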

Findings

According to the model, it is easier to recognise familiar faces than unfamiliar ones for various reasons. Of special importance, we possess much more structural information about familiar faces.



Figure 3.20 Damage to regions of the inferior occipito-temporal cortex (including fusiform face area (FFA) and occipital face area (OFA)) is associated with apperceptive prosopagnosia (blue); damage to anterior inferior temporal cortex (aLT) is associated with associative prosopagnosia (red); and damage to the anterior temporal pole is associated with person-specific amnesia (green). Davies-Thompson et al. (2014) discuss evidence consistent with their model. From Davies-Thompson et al. (2014).

This structural information (associated with face recognition units) relates to relatively unchanging aspects of faces and gradually accumulates with increasing familiarity with any given face.

However, the differences in ease of recognition between familiar and unfamiliar faces are greater than envisaged by Bruce and Young (1986). Jenkins et al. (2011) found 40 face photographs showing only two different unfamiliar individuals were thought to show almost four times that number (discussed on p. 124). The two individuals were actually two Dutch celebrities almost unknown in Britain. When Jenkins et al. (2011) repeated their experiment with Dutch participants, nearly all performed the task perfectly because the faces were so familiar.

Why is unfamiliar face recognition so difficult? There is considerable within-person variability in facial images, which is why different photographs of the same unfamiliar individual often look as if they come from different individuals (Young & Burton, 2017, 2018). Jenkins and Burton (2011) argued we could improve identification of unfamiliar faces by averaging across several photographs of the same individual, thus greatly reducing image variability. Their findings supported this prediction. Burton et al. (2016) shed additional light on the complexities of recognising unfamiliar faces. In essence, how one person's face varies across images differs from how someone else's face varies. Thus, the characteristics that vary or remain constant across images differ from one individual to another.
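Jenkins and Burton's averaging idea is straightforward to express computationally. The Python sketch below shows pixel-wise averaging of several photographs of one person; the file names are hypothetical, and a real application would first register the images (e.g., align eyes and mouth) before averaging, a step omitted here for brevity.

```python
import numpy as np
from PIL import Image

# Sketch of the averaging idea in Jenkins and Burton (2011): combining
# several aligned photographs of the same person averages away
# image-specific variability (lighting, expression, camera) while
# preserving the stable structure of the face.

def average_face(paths, size=(150, 200)):
    stack = [np.asarray(Image.open(p).convert("L").resize(size), dtype=float)
             for p in paths]
    return np.mean(stack, axis=0)  # pixel-wise mean across photographs

# Hypothetical input files; any set of roughly aligned portraits will do.
avg = average_face(["photo1.jpg", "photo2.jpg", "photo3.jpg"])
Image.fromarray(avg.astype(np.uint8)).save("average.png")
```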


The second prediction is that different routes are involved in the processing of facial identity and facial expression. There is some support for this prediction. Fox et al. (2011) found patients with damage to the face-recognition network had impaired identity perception but not expression perception. In contrast, a patient with damage to the superior temporal sulcus had impaired expression perception but reasonably intact identity perception. Sliwinska and Pitcher (2018) confirmed the role played by the superior temporal sulcus: transcranial magnetic stimulation (TMS; see Glossary) applied to this area impaired recognition of facial expression.

However, the two routes are not entirely independent. Judgements of facial expression are strongly influenced by irrelevant identity information (Schweinberger & Soukup, 1998). Redfern and Benton (2017) asked participants to sort cards of faces into piles, one for each perceived identity. One pack contained expressive faces and the other neutral faces. With expressive faces, faces belonging to different individuals were more likely to be placed in the same pile. Thus, expressive facial information can influence (and impair) identity perception. Fitousi and Wenger (2013) asked participants to respond positively to a face that had a given identity and emotion (e.g., a happy face belonging to Keira Knightley). Facial identity and facial expression were not processed independently although they should have been according to the model.

Another issue is that the facial expression route is more complex than assumed theoretically. For example, damage to the amygdala produces greater deficits in recognising fear and anger than other emotions (Calder & Young, 2005). Young and Bruce (2011) admitted they had not expected deficits in emotion recognition in faces to vary across emotions.

The third prediction is that we always retrieve personal information (e.g., occupation) about a person before recalling their name. Young et al. (1985) asked people to record problems they experienced in face recognition. There were 1,008 such incidents but people never reported putting a name to a face while knowing nothing else about that person. In contrast, there were 190 occasions on which someone remembered a reasonable amount of information about a person but not their name (also as predicted by the model). Several other findings support the third prediction (Hanley, 2011). However, the notion that names are always recalled after personal information is too rigid. Calderwood and Burton (2006) asked fans of the television series Friends to recall the name or occupation of the main characters when shown their faces. Names were recalled faster than occupations (against the model's prediction).

Fourth, we relate face-processing impairments to Bruce and Young's (1986) serial stage model. We consider three such impairments (discussed by Davies-Thompson et al., 2014; see Figure 3.20) with reference to Figure 3.19:

(1) Patients with impaired early stages of face processing: such patients (categorised as having apperceptive prosopagnosia) have "an inability to form a sufficiently accurate representation of the face's structure from visual data" (Davies-Thompson et al., 2014, p. 161). As a result, faces are often not recognised as familiar.


(2) Patients with an impaired ability to access facial memories in face recognition units although early processing of facial structure is relatively intact: such patients have associative prosopagnosia, meaning they have greater problems with memory than perception.
(3) Patients with impaired access to biographical information stored in person identity nodes: such patients have person-specific amnesia and differ from those with associative prosopagnosia because they often cannot recognise other people by any cues (including spoken names or voices).

So far we have applied Bruce and Young's (1986) model to acquired prosopagnosia. However, we can also apply the model to developmental prosopagnosia (in which face-recognition mechanisms fail to develop normally). Parketny et al. (2015) presented previously unfamiliar faces to developmental prosopagnosics and recorded event-related potentials (ERPs) while they performed an easy face-recognition task. They focused on three ERP components:

(1) N170: this early component (about 170 ms) reflects processes involved in perceptual structural face processing.
(2) N250: this component (about 250 ms) reflects a match between a presented face and a stored face representation.
(3) P600: this component (about 600 ms) reflects attentional processes associated with face recognition.

What did Parketny et al. (2015) find? Recognition times were 150 ms slower in the developmental prosopagnosics than healthy controls. N170 was broadly similar in both groups with respect to timing and magnitude. N250 was 40 ms slower in the prosopagnosics than controls but of comparable magnitude. Finally, P600 was significantly smaller in the prosopagnosics than controls and was delayed by 80 ms. In sum, developmental prosopagnosics show relatively intact early face processing but are slower and less efficient later in processing. ERPs provide an effective way of identifying those aspects of face processing adversely affected in prosopagnosia.
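ERP components such as these are typically quantified by their peak latency and amplitude within a search window. The Python sketch below shows the general approach on simulated data; the windows and polarities are illustrative assumptions, not Parketny et al.'s analysis settings.

```python
import numpy as np

# Illustrative sketch: given an averaged ERP waveform sampled at 1000 Hz,
# find the peak latency and amplitude of each component within its
# conventional search window.

def component_peak(erp, times, window, polarity):
    """Return (latency_ms, amplitude) of the extreme value in a window."""
    mask = (times >= window[0]) & (times <= window[1])
    seg, seg_t = erp[mask], times[mask]
    idx = np.argmin(seg) if polarity == "negative" else np.argmax(seg)
    return seg_t[idx], seg[idx]

times = np.arange(-100, 801)                 # ms relative to face onset
erp = np.random.randn(times.size) * 0.1      # stand-in for real data

# Search windows and polarities are assumptions for demonstration.
for name, window, polarity in [("N170", (130, 210), "negative"),
                               ("N250", (210, 310), "negative"),
                               ("P600", (500, 700), "positive")]:
    lat, amp = component_peak(erp, times, window, polarity)
    print(f"{name}: peak at {lat} ms, amplitude {amp:.2f} microvolts")
```

Comparing such peak latencies and amplitudes between prosopagnosics and controls is what allows statements like "N250 was 40 ms slower but of comparable magnitude".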

Evaluation

Bruce and Young (1986) provided a comprehensive framework emphasising the wide range of information that can be extracted from faces. It was remarkably innovative in identifying the major processes and structures involved in face processing and recognition and incorporating them within a plausible serial stage approach. Finally, the model enhanced our understanding of why familiar faces are much easier to recognise than unfamiliar ones.

What are the model's limitations? First, the complexities involved in recognising unfamiliar faces (e.g., coping with the great variability in a given individual's facial images) were not fully acknowledged. As Young and Burton (2017, p. 213) pointed out, it was several years after 1986 before researchers appreciated that "humans' relatively poor performance at unfamiliar-face recognition is as much a problem of perception as of memory".


KEY TERMS
Aphantasia: The inability to form mental images of objects when those objects are not present.
Hallucinations: Perceptual experiences that appear real even though the individuals or objects perceived are not present.


Second, the model's account of the processing of facial expression is oversimplified. For example, the processing of facial expression is less independent of the processing of facial identity than assumed theoretically. According to the model, damage to the expression analysis component should produce an impaired ability to recognise all facial expressions. In fact, many brain-damaged patients have much greater impairment in facial recognition of some emotions than others (Young, 2018).

Third, the model was somewhat vague about the precise information stored in the face recognition units and the person identity nodes. Fourth, it was wrong to exclude gaze perception from the model because it provides useful information about what an observer is attending to (Young & Bruce, 2011). Fifth, Bruce and Young (1986) focused on general factors influencing face recognition. However, as discussed earlier, there are substantial individual differences in face-recognition ability, with a few individuals (super-recognisers) having outstanding ability. These individual differences depend mostly on genetic factors (Wilmer, 2017) not considered within the model.

VISUAL IMAGERY

Close your eyes and imagine the face of someone you know very well. What did you experience? Many people claim forming visual images is like "seeing with the mind's eye", suggesting there are important similarities between imagery and perception.

Mental imagery is typically regarded as involving conscious experience. However, we could also regard imagery as a form of mental representation (an internal cognitive symbol representing aspects of external reality) (e.g., Pylyshyn, 2002). We would not necessarily be consciously aware of images as mental representations. Galton (1883) supported the above viewpoint: he found many individuals reported no conscious imagery when imagining a definite object (e.g., their breakfast table). Zeman et al. (2015) studied several individuals lacking visual imagery and coined the term aphantasia to refer to this condition.

If visual imagery and perception are similar, why do we very rarely confuse them? One reason is that we are generally aware of deliberately constructing images (unlike with visual perception). Another reason is that images contain much less detail. For example, people rate their visual images of faces as similar to photographs lacking sharp edges and borders (Harvey, 1986).

However, many people sometimes confuse visual imagery and perception. Consider hallucinations, in which perception-like experiences occur in the absence of the appropriate environmental stimulus. Visual hallucinations occur in approximately 27% of schizophrenic patients but also in 7% of the general population. Waters et al. (2014) discussed research showing visual hallucinations in schizophrenics are often associated with activity in the primary visual cortex, suggesting hallucinations involve many processes associated with visual perception. One reason schizophrenics are susceptible to visual hallucinations is distortions in top-down processing (e.g., forming strong expectations of what they will see).


In Anton's syndrome ("blindness denial"), blind people are unaware that they are blind and sometimes confuse imagery with actual perception. Goldenberg et al. (1995) described a patient whose primary visual cortex had been nearly wholly destroyed. Nevertheless, she generated such vivid visual images that she mistook them for genuine visual perception. The brain damage in patients with Anton's syndrome typically includes large parts of the visual cortex (Gandhi et al., 2016).

There is also Charles Bonnet syndrome, defined as "consistent or periodic complex visual hallucinations that occur in visually impaired individuals" (Yacoub & Ferrucci, 2011, p. 421). However, patients are generally aware the hallucinations are not real and so they are actually pseudo-hallucinations. When patients hallucinate, they have increased activity in brain areas specialised for visual processing (e.g., hallucinations in colour are associated with activity in colour-processing areas) (ffytche et al., 1998). Painter et al. (2018) identified a major reason for this elevated activity: stimuli presented to intact regions of the retina cause extreme excitability (hyperexcitability) within early visual cortex, and visually impaired individuals with hallucinations show greater hyperexcitability than those without.

KEY TERMS
Anton's syndrome: A condition found in some blind people in which they misinterpret their visual imagery as visual perception.
Charles Bonnet syndrome: A condition in which individuals with eye disease form vivid and detailed visual hallucinations sometimes mistaken for visual perception.
Depictive representation: A representation (e.g., a visual image) resembling a picture in that objects within it are organised spatially.

Why is visual imagery useful?

What functions are served by visual imagery? According to Moulton and Kosslyn (2009, p. 1274), visual imagery "allows us to answer 'what if' questions by making explicit and accessible the likely consequences of being in a specific situation or performing a specific action". For example, professional golfers use mental imagery to predict what would happen if they hit a certain shot. Pearson and Kosslyn (2015) pointed out that many visual images contain rich information that is accessible when required. For example, what is the shape of a cat's ears? You may be able to answer the question by constructing a visual image. More generally, visual imagery supports numerous cognitive functions, including creative insight, attentional search, guiding deliberate action, short-term memory storage and long-term memory retrieval (Mitchell & Cusack, 2016).


Imagery theories

Kosslyn (e.g., 1994; Pearson & Kosslyn, 2015) proposed an influential theory based on the assumption that visual imagery resembles visual perception. It was originally called perceptual anticipation theory because image generation involves processes used to anticipate perceiving visual stimuli. According to the theory, visual images are depictive representations. What is a depictive representation? In such a depiction, "each part of the representation corresponds to a part of the represented object such that the distances among the parts in the representation correspond to the actual distances among the parts" (Pearson & Kosslyn, 2015, p. 10089).


Thus, for example, a visual image of a desk with a computer on top and a cat sleeping underneath would have the computer at the top and the cat at the bottom.

Where are depictive representations formed? Kosslyn argued they are created in early visual cortex (BA17 and BA18; see Figure 3.21) within a visual buffer. The visual buffer is a short-term store for visual information only and is of major importance in visual perception and imagery. There is also an "attention window" selecting some visual information in the visual buffer and passing it on to other brain areas for further processing. This attention window is flexible – it can be adjusted to include more, or less, visual information.

Processing in the visual buffer depends primarily on external stimulation during perception. However, such processing involves non-pictorial information stored in long-term memory during imagery. Shape information is stored in the inferior temporal lobe whereas spatial representations are stored in posterior parietal cortex (see Figure 3.21). In sum, visual perception mostly involves bottom-up processing whereas visual imagery depends on top-down processing.

Figure 3.21 The approximate locations of the visual buffer in BA17 and BA18, of long-term memories of shapes in the inferior temporal lobe, and of spatial representations in the posterior parietal cortex, according to Kosslyn and Thompson's (2003) anticipation theory.

Pylyshyn (e.g., 2002) argued visual imagery differs substantially from visual perception. According to his propositional theory, performance on mental imagery tasks does not involve depictive or pictorial representations. Instead, it involves tacit knowledge (knowledge inaccessible to conscious awareness). Tacit knowledge is "knowledge of what things would look like to subjects in situations like the ones in which they are to imagine themselves" (Pylyshyn, 2002, p. 161). Thus, performance on an imagery task relies on relevant stored knowledge rather than visual images. Within this theoretical framework, it is improbable that early visual cortex would be involved in an imagery task.
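The defining property of a depictive representation, preservation of relative inter-part distances, can be checked mechanically, and it is exactly the constraint a propositional representation need not satisfy. The toy Python sketch below uses invented coordinates for the desk example above; the numbers are assumptions, chosen only to make the check concrete.

```python
import math

# Toy illustration of Pearson and Kosslyn's (2015) definition: parts of
# the image occupy positions whose mutual distances mirror the distances
# among the parts of the real scene. Coordinates are invented.

scene = {"desk": (0, 100), "computer": (0, 160), "cat": (0, 40)}   # cm
image = {"desk": (0, 10),  "computer": (0, 16),  "cat": (0, 4)}    # image units

def pairwise_distances(layout):
    names = sorted(layout)
    return {(a, b): math.dist(layout[a], layout[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

scene_d, image_d = pairwise_distances(scene), pairwise_distances(image)
ratios = {pair: image_d[pair] / scene_d[pair] for pair in scene_d}
# A depictive representation preserves relative distances, so all
# image/scene distance ratios should be (approximately) equal:
print(ratios)
```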

KEY TERMS
Visual buffer: Within Kosslyn's theory, a short-term visual memory store involved in visual imagery and perception.
Binocular rivalry: When two different visual stimuli are presented one to each eye, only one stimulus is seen; the seen stimulus alternates over time.

Imagery resembles perception

If visual perception and imagery involve similar processes, they should influence each other. There should be facilitation if the contents of perception and imagery are the same but interference if they differ. Pearson et al. (2008) reported a facilitation effect with binocular rivalry – when a different stimulus is presented to each eye, only one is consciously perceived at any given moment. The act of imagining a specific pattern strongly influenced which stimulus was subsequently perceived, and this facilitation depended on the similarity between the imagined and presented stimuli. The findings were remarkably similar when the initial stimulus was perceived rather than imagined.

Baddeley and Andrade (2000) reported an interference effect. Participants rated the vividness of visual and auditory images while performing a second task involving visual/spatial processes.


Figure 3.22 Dwell time for the four quadrants of a picture (plus white space) during perception and imagery. From Laeng et al. (2014). Reprinted with permission of Elsevier.

Region          Perception   Imagery
Top left        4%           8%
Top right       77%          64%
White space     2%           5%
Bottom left     1%           4%
Bottom right    10%          12%

This task reduced the vividness of visual imagery more than that of auditory imagery because similar processes were involved in the visual/spatial and visual imagery tasks.

Laeng et al. (2014) asked participants to view pictures of animals and to follow each one by forming a visual image of that animal. There was a striking similarity in the eye fixations devoted to the various areas of each picture in both conditions (see Figure 3.22). Participants having the greatest similarity in dwell time between perception and imagery showed the best memory for the size of each animal.

According to Kosslyn's theoretical position, much processing associated with visual imagery occurs in early visual cortex (BA17 and BA18) plus several other areas. In a review, Kosslyn and Thompson (2003) found 50% of studies using visual-imagery tasks reported activation in early visual cortex. Significant findings were most likely when the task involved inspecting the fine details of images or focusing on an object's shape. In a meta-analysis (see Glossary), Winlove et al. (2018) found the early visual cortex (V1) was typically activated during visual imagery. Consistent with Kosslyn's theory, activation in the early visual cortex is greater among individuals reporting vivid visual imagery.

The neuroimaging evidence discussed above is limited – it is correlational, so the activation associated with visual imagery may not be directly relevant to the images that are formed. Naselaris et al. (2015) reported more convincing evidence. Participants formed images of five artworks. It was possible to some extent to identify the imagined artworks from hundreds of other artworks through careful analysis of activity in the early visual cortex. Some of this activity corresponded to the processing of low-level visual features (e.g., space; orientation).


Further neuroimaging support for the notion that imagery closely resembles perception was reported by Dijkstra et al. (2017a). They found "the overlap in neural representations between imagery and perception . . . extends beyond the visual cortex to include also parietal and premotor/frontal areas" (p. 1372). Of most importance, the greater the neural overlap between imagery and perception throughout the entire visual system, the more vivid was the imagery experience.

Imagery does not resemble perception

Look at Figure 3.23. Start with the object on the left and form a clear image of it. Then close your eyes, mentally rotate the image by 90° clockwise and decide what you see. Then repeat the exercise with the other objects. Finally, rotate the book through 90°. You probably found it very easy to identify the objects when perceiving them but impossible when only imagining rotating them. Slezak (1991, 1995) used stimuli closely resembling those in Figure 3.23 and found no observers reported seeing the objects. Thus, the information within images is much less detailed and flexible than visual information.

Figure 3.23 Slezak (1991, 1995) asked participants to memorise one of the above images. They then imagined rotating the image 90 degrees clockwise and reported what they saw. None of them reported seeing the figures that can be seen clearly if you rotate the page by 90 degrees clockwise. Left image from Slezak (1995), centre image from Slezak (1991), right image reprinted from Pylyshyn (2002), with permission from Elsevier and the author.

Lee et al. (2012) identified important differences between imagery and perception. Observers viewed or imagined common objects (e.g., a car; an umbrella) while activity in the early visual cortex and in areas associated with later visual processing (object-selective regions) was assessed. The researchers attempted to identify the objects being imagined or perceived on the basis of activation in those areas.

What did Lee et al. (2012) find? First, activation in all brain areas was considerably greater when participants perceived rather than imagined objects. Second, objects being perceived or imagined were identified with above-chance accuracy based on patterns of brain activation, except for imagined objects in the primary visual cortex (V1; see Figure 3.24). Third, the success rate in identifying perceived objects was greater based on brain activation in areas associated with early visual processing than in those associated with later processing. However, the opposite was the case with imagined objects (see Figure 3.24). Thus, object processing in the early visual cortex is very limited during imagery but is extremely important during perception. Imagery for objects depends mostly on top-down processes based on object knowledge rather than processing in the early visual cortex.
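Decoding studies of this kind typically train a pattern classifier on brain activity and then test whether it can tell which object a new activity pattern corresponds to. The Python sketch below is a minimal correlation-based (nearest-template) classifier run on simulated voxel patterns; it illustrates the general logic only and is not Lee et al.'s actual analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, objects = 100, ["car", "umbrella", "chair"]

# Simulated voxel patterns: one stored "template" per object, plus noisy
# test trials. In a real study the templates come from training runs.
templates = {obj: rng.normal(size=n_voxels) for obj in objects}

def decode(trial_pattern):
    """Assign the trial to the object whose template correlates most."""
    corrs = {obj: np.corrcoef(trial_pattern, t)[0, 1]
             for obj, t in templates.items()}
    return max(corrs, key=corrs.get)

correct, n_trials = 0, 300
for _ in range(n_trials):
    true_obj = objects[rng.integers(len(objects))]
    trial = templates[true_obj] + rng.normal(scale=2.0, size=n_voxels)
    correct += decode(trial) == true_obj

print(f"decoding accuracy: {correct / n_trials:.2%} (chance = 33%)")
```

Lowering the signal relative to the noise (as presumably happens for imagined objects in V1) drives such a classifier back towards chance, which is how "above-chance accuracy" serves as evidence that object information is present in a brain area.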


Figure 3.24 The extent to which perceived (left side of figure) or imagined (right side of figure) objects could be classified accurately on the basis of brain activity in the early visual cortex and object-selective cortex. ES = extrastriate retinotopic cortex; LO = lateral occipital cortex; pFs = posterior fusiform sulcus. From S.H. Lee et al. (2012). Reproduced with permission from Elsevier.


Figure 3.25 Connectivity during perception and imagery involving (a) bottom-up processing; and (b) top-down processing. Posterior estimates indicate connectivity strength (the further from 0 the stronger). The meanings of OCC, FG, IPS and IFG are given in the text. From Dijkstra et al. (2017b).


Most cognitive neuroscience research has focused on the brain areas activated during visual perception and imagery. It is also important to focus on connectivity between brain areas. Dijkstra et al. (2017b) considered connectivity among four brain areas of central importance in perception and imagery: early visual cortex (OCC), fusiform gyrus (FG; late visual cortex), IPS (intraparietal sulcus) and IFG (inferior frontal gyrus). The first two are mostly associated with bottom-up processing whereas the second two are mostly associated with top-down processing.

Dijkstra et al.'s (2017b) key findings are shown in Figure 3.25. First, perception was associated with reasonably strong bottom-up brain connectivity and weak top-down brain connectivity. Second, imagery was associated with non-significant bottom-up connectivity but very strong top-down connectivity.


Thus, top-down connectivity from frontal to early visual areas is a common mechanism during perception and imagery. However, there is much stronger top-down connectivity during imagery to compensate for the absence of bottom-up connectivity. Individuals having the greatest top-down connectivity during imagery reported the most vivid images.

Dijkstra et al. (2018) studied the time course for the development of visual representations in perception and in imagery using magnetoencephalography (MEG; see Glossary). With perception, they confirmed that visual representations develop through a series of processing stages (see Chapter 2). With imagery, in contrast, the entire visual representation appeared to be activated simultaneously, presumably because all the relevant information was retrieved together from memory.

Brain damage

If visual perception and visual imagery involve the same mechanisms, we might expect brain damage to have comparable effects on perception and imagery. That is often the case. However, there are numerous exceptions (Bartolomeo, 2002, 2008). Moro et al. (2008) studied two brain-damaged patients with intact visual perception but impaired visual imagery. They were both very poor at drawing objects from memory but could copy the same objects when shown a drawing. These patients (and others with impaired visual imagery but intact visual perception) have damage to the left temporal lobe. Visual images are probably generated from information about concepts (including objects) stored in the temporal lobes (Patterson et al., 2007). However, this generation process is less important for visual perception.

Bridge et al. (2012) studied a young man, SBR, who had virtually no primary visual cortex and nearly total blindness. However, he had vivid visual imagery and his pattern of cortical activation when engaged in visual imagery resembled that of healthy controls. Similar findings were reported with a 70-year-old woman, SH, who became blind at the age of 27. She had intact visual imagery predominantly involving areas outside the early visual cortex. Of relevance, she had greater connectivity between some visual networks in the brain than most individuals.

How can we interpret the above findings? Visual perception mostly involves bottom-up processes triggered by the stimulus whereas visual imagery primarily involves top-down processes based on object knowledge. Thus, it is unsurprising that brain areas involved in early visual processing are more important for perception than imagery, whereas brain areas associated with storage of information about visual objects are more important for imagery.

Evaluation

Much progress has been made in understanding the relationship between visual imagery and visual perception. Similar processes are involved in imagery and perception and both are associated with somewhat similar patterns of brain activity. In addition, the predicted facilitatory and interfering effects between imagery and perception tasks have been reported. These findings are more consistent with Kosslyn's theory than Pylyshyn's.


On the negative side, visual perception and visual imagery are less similar than assumed by Kosslyn. For example, there is the neuroimaging evidence reported by Lee et al. (2012) and the frequent dissociations between perception and imagery found in brain-damaged patients. Of most importance, visual perception involves strong bottom-up connectivity and weak top-down connectivity, whereas visual imagery involves very strong top-down connectivity but negligible bottom-up connectivity (Dijkstra et al., 2017b).

CHAPTER SUMMARY

• Pattern recognition. Pattern recognition involves processing of specific features and global processing. Feature processing generally (but not always) precedes global processing. Several types of cells (e.g., simple cells; complex cells; end-stopped cells) are involved in feature processing. There are complexities in pattern recognition due to interactions among cells and the influence of top-down processes. Evidence from computer programs to solve CAPTCHAs suggests humans are very good at processing edge corners. Fingerprint identification is sometimes very accurate; however, even experts show confirmation bias (distorted performance caused by contextual information). Fingerprint experts are much better than novices at discriminating between matches and non-matches and also adopt a more conservative response bias.



• Perceptual organisation. The gestaltists proposed several principles of perceptual grouping and emphasised the importance of figure-ground segmentation. They argued that perceptual grouping and figure-ground segregation depend on innate factors. They also argued we perceive the simplest possible organisation of the visual field. The gestaltists provided descriptions rather than explanations. Their approach underestimated the complex interactions of factors underlying perceptual organisation. The gestaltists de-emphasised the role of experience and learning in perceptual organisation. However, recent theories based on Bayesian inference (e.g., the Bayesian hierarchical grouping model) have emphasised learning processes and fully acknowledge the importance of learning.



• Approaches to object recognition. Visual processing typically involves a coarse-to-fine processing sequence: low spatial frequencies in visual input (associated with coarse processing) are conveyed to higher visual areas faster than high spatial frequencies (associated with fine processing). Biederman assumed in his recognition-by-components theory that objects consist of geons (basic shapes). An object's geons are determined by edge-extraction processes and the resultant geon-based description is viewpoint-invariant. Biederman's theory de-emphasises the role of top-down processes.


Object recognition is sometimes viewpoint-invariant (as predicted by Biederman) with easy categorical discriminations, but it is more typically viewer-centred when identification is required. Object representations often contain viewpoint-dependent and viewpoint-invariant information.

• Object recognition: top-down processes. Top-down processes are more important in object recognition when observers view degraded or briefly presented stimuli. Top-down processes sometimes influence attention, memory or response bias rather than perception itself. However, there are also direct effects of top-down processes on object recognition. According to the interactive-iterative framework (Baruch et al., 2018), top-down and bottom-up processes interact, with top-down processes (e.g., attention) influencing subsequent bottom-up processing.



• Face recognition. Face recognition involves more holistic processing than object recognition. Deficient holistic processing partly explains why prosopagnosic patients have much greater problems with face recognition than object recognition. Face processing involves a brain network including the fusiform face and occipital face areas. However, much of this network is also used in processing other objects (especially when recognising objects for which we have expertise). Bruce and Young's model assumes several serial processing stages. Research on prosopagnosics supports this assumption because the precise nature of their face-recognition impairments depends on which stage(s) are most affected. The model also assumes there are major differences in the processing of familiar and unfamiliar faces. This assumption has received substantial support. However, Bruce and Young did not fully appreciate that unfamiliar faces are hard to recognise because of the great variability of any given individual's facial images. The model assumes there are two independent processing routes (for facial expression and facial identity), but they are not entirely independent. The model ignores the role played by genetic factors in accounting for individual differences in face-recognition ability.



• Visual imagery. Visual imagery allows us to predict the visual consequences of performing certain actions. According to Kosslyn's perceptual anticipation theory, visual imagery closely resembles visual perception. In contrast, Pylyshyn, in his propositional theory, argued visual imagery involves making use of tacit knowledge and does not resemble visual perception. Visual imagery and perception influence each other as predicted by Kosslyn's theory.


Neuroimaging studies and studies on brain-damaged patients indicate similar areas are involved in imagery and perception. However, areas involved in top-down processing (e.g., left temporal lobe) are more important in imagery than perception, and areas involved in bottom-up processing (e.g., early visual cortex) are more important in perception. More generally, bottom-up brain connectivity is far more important in perception than imagery, whereas top-down brain connectivity is far more important in imagery than perception.

FURTHER READING

Baruch, O., Kimchi, R. & Goldsmith, M. (2018). Attention to distinguishing features in object recognition: An interactive-iterative framework. Cognition, 170, 228–244. Orit Baruch and colleagues provide a theoretical framework for understanding how bottom-up and top-down processes interact in object recognition.

Dijkstra, N., Zeidman, P., Ondobaka, S., van Gerven, M.A.J. & Friston, K. (2017b). Distinct top-down and bottom-up brain connectivity during visual perception and imagery. Scientific Reports, 7 (Article 5677). In this article, Nadine Dijkstra and her colleagues clarify the roles of top-down and bottom-up processes in visual perception and imagery.

Firestone, C. & Scholl, B.J. (2016). Cognition does not affect perception: Evaluating the evidence for "top-down" effects. Behavioral and Brain Sciences, 39, 1–77. The authors argue that top-down processes do not directly influence visual perception. Read the open peer commentary following the article, however, and you will see most experts disagree.

Gauthier, I. & Tarr, M.J. (2016). Visual object recognition: Do we (finally) know more now than we did? Annual Review of Vision Science, 2, 377–396. Isabel Gauthier and Michael Tarr provide a comprehensive overview of theory and research on object recognition.

Grill-Spector, K., Weiner, K.S., Kay, K. & Gomez, J. (2017). The functional neuroanatomy of human face perception. Annual Review of Vision Science, 3, 167–196. This article by Kalanit Grill-Spector and colleagues contains a comprehensive account of brain mechanisms underlying face perception.

Wagemans, J. (2018). Perceptual organisation. In J.T. Serences (ed.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 2: Sensation, Perception, and Attention (4th edn; pp. 803–822). New York: Wiley. Johan Wagemans reviews various theoretical and empirical approaches to understanding perceptual organisation.

Young, A.W. (2018). Faces, people and the brain: The 45th Sir Frederic Bartlett lecture. Quarterly Journal of Experimental Psychology, 71, 569–594. Andy Young provides a very interesting account of theory and research on face perception.


Chapter 4

Motion perception and action

INTRODUCTION

Most research on perception discussed in previous chapters involved presenting a visual stimulus and assessing aspects of its meaning. What was missing (but is an overarching theme of this chapter) is the time dimension. In the real world, we move around and/or people or objects in the environment move. The resulting changes in the visual information available to us are very useful in ensuring we perceive the environment accurately and also respond appropriately. This emphasis on change and movement necessarily leads to a consideration of the relationship between perception and action. In sum, our focus in this chapter is on how we process (and respond to) a constantly changing environment.

The first theme addressed in this chapter is the perception of movement. This includes our ability to move successfully within the visual environment and predict accurately when moving objects will reach us.

The second theme is concerned with more complex issues – how do we act appropriately on the environment and the objects within it? Of relevance are theories (e.g., the perception-action theory; the dual-process approach) distinguishing between processes and systems involved in vision-for-perception and those involved in vision-for-action (see Chapter 2). Here we consider theories providing more detailed accounts of vision-for-action and/or the workings of the dorsal pathways allegedly underlying vision-for-action.

The third theme focuses on the processes involved in making sense of moving objects (especially other people). It thus differs from the first theme, in which moving stimuli are considered mostly in terms of predicting when they will reach us. There is an emphasis on the perception of biological movement when the available visual information is impoverished. We also consider the role of the mirror neuron system in interpreting human movement.

Finally, we consider our ability (or failure!) to detect changes in objects within the visual environment over time.




Unsurprisingly, attention importantly determines which aspects of the environment are consciously detected. This issue provides a useful bridge between the areas of visual perception and attention (the subject of the next chapter).

DIRECT PERCEPTION

James Gibson (1950, 1966, 1979) put forward a radical approach to visual perception that was largely ignored at the time. Until approximately 40 years ago, it was assumed the main purpose of visual perception is to allow us to identify or recognise objects. This typically involves relating information extracted from the visual environment to our stored knowledge of objects (see Chapter 3). Gibson argued that this approach is limited – in evolutionary terms, vision developed so our ancestors could respond rapidly to the environment (e.g., hunting animals; escaping from danger).

Gibson (1979, p. 239) argued that perception involves "keeping in touch with the environment". This is sufficient for most purposes because the information provided by environmental stimuli is much richer than previously believed. We can relate Gibson's views to Milner and Goodale's (1995, 2008) vision-for-action system (see Chapter 2). According to both theoretical accounts, there is an intimate relationship between perception and action.

Gibson regarded his theoretical approach as ecological. He emphasised that perception facilitates interactions between the individual and their environment. Here is the essence of his direct theory of perception:

"When I assert that perception of the environment is direct, I mean that it is not mediated by retinal pictures, neural pictures, or mental pictures. Direct perception is the activity of getting information from the ambient array of light. I call this a process of information pickup that involves . . . looking around, getting around, and looking at things." (Gibson, 1979, p. 147)

We will briefly consider some of Gibson's theoretical assumptions:


● The pattern of light reaching the eye is an optic array. It contains all the visual information from the environment striking the eye.
● The optic array provides unambiguous or invariant information about the layout of objects. This information comes in many forms including optic flow patterns and affordances (see below) and texture gradients (discussed in Chapter 2).

Gibson produced training films during the Second World War describing how pilots handle taking off and landing. Of crucial importance is optic flow – the changes in the pattern of light reaching observers when they move or when parts of the visual environment move. When pilots approach a landing strip, the point towards which they are moving (the focus of expansion) appears motionless, with the rest of the visual environment apparently moving away from that point (see Figure 4.1). The further away any part of the landing strip is from that point, the greater is its apparent speed of movement. Wang et al. (2012) simulated the pattern of optic flow that would be experienced if individuals moved forwards in a stationary environment.

KEY TERMS
Optic array: The structural pattern of light falling on the retina.
Optic flow: The changes in the pattern of light reaching an observer when there is movement of the observer and/or aspects of the environment.
Focus of expansion: The point towards which someone in motion is moving; it does not appear to move.

Case study: Gibson's theory of direct perception – affordances


Figure 4.1
The optic-flow field as a pilot comes in to land, with the focus of expansion in the middle.
From Gibson (1950). Wadsworth, a part of Cengage Learning, Inc.; © 2014 American Psychological Association. Reproduced with permission.

Participants' attention was attracted towards the focus of expansion, thus showing its psychological importance. (More is said later about optic flow and the focus of expansion.) Gibson (1966, 1979) argued certain higher-order characteristics of the visual array (invariants) remain unaltered as observers move around their environment. Invariants are important because they remain the same over different viewing angles; the focus of expansion is one such invariant feature of the optic array.
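A minimal simulation may make this geometry concrete. The sketch below is our illustration (the function name and the coordinates are made up, and it assumes approach to a frontoparallel surface, a simplification not stated by Gibson): each image point streams radially away from the focus of expansion, faster the farther it lies from that point.

```python
import numpy as np

# Minimal sketch (our illustration with made-up image coordinates): for an
# observer translating towards a frontoparallel surface, each image point
# streams radially away from the focus of expansion (FOE). Its speed grows
# with distance from the FOE and shrinks as time-to-contact grows.

def optic_flow(points, foe, time_to_contact):
    """Image velocities of `points` (per second), radial from the FOE."""
    return (np.asarray(points) - np.asarray(foe)) / time_to_contact

points = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
foe = [0.0, 0.0]
print(optic_flow(points, foe, time_to_contact=2.0))
# [[0. 0.]   the FOE itself appears motionless
#  [1. 0.]   points farther from the FOE move faster,
#  [0. 2.]]  as in Gibson's landing-strip example
```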

Affordances

KEY TERMS
Invariants: Properties of the optic array that remain constant even though other aspects vary; part of Gibson's theory.
Affordances: The potential uses of an object, which Gibson claimed are perceived directly.

According to Gibson (1979), the potential uses of objects (their affordances) are directly perceivable. For example, a ladder "affords" ascent or descent. Gibson believed that "affordances are opportunities for action that exist in the environment and do not depend on the animal's mind . . . they do not cause behaviour but simply make it possible" (Withagen et al., 2012, p. 251). For Gibson (1979, p. 127), affordances are what the environment "offers the animal, what it provides or furnishes".

Evidence for the affordance of "climbability" of steps varying in height was reported by Di Stasi and Guardini (2007). The step height judged the most "climbable" was the one that would have involved the minimum expenditure of energy. Gibson argued an object's affordances are perceived directly or automatically. In support, Pappas and Mack (2008) found images of objects presented below the level of conscious awareness nevertheless produced motor priming. For example, the image of a hammer caused activation in brain areas involved in preparing to use a hammer. Wilf et al. (2013) focused on the affordance of graspability, with participants lifting their arms to perform a reach-like movement towards graspable and non-graspable objects (see Figure 4.2). Muscle activity started faster for graspable than non-graspable objects, suggesting the affordance of graspability triggers rapid activity in the motor system.






Figure 4.2
Graspable and non-graspable objects having similar asymmetrical features.
From Wilf et al. (2013). Reprinted with permission.

Gibson's approach to affordances is substantially oversimplified. For example, an apparently simple task such as cutting up a tomato involves selecting an appropriate tool, deciding how to grasp and manipulate the tool, and monitoring movement execution (Osiurak & Badets, 2016). In other words, "People reason about physical object properties to solve everyday life activities" (Osiurak & Badets, 2016, p. 540). This is sharply different from Gibson's emphasis on the ease and immediacy of tool use.

When individuals observe a tool, Gibson assumed this provided them with direct access to knowledge about how to manipulate it, and that this manipulation knowledge gave access to the tool's functions. This assumption exaggerates the importance of manipulation knowledge. For example, Garcea and Mahon (2012) found function judgements about tools were made faster than manipulation judgements, whereas Gibson's approach implies that manipulation judgements should have been faster.

Finally, Gibson argued stored knowledge is not required for individuals to make appropriate movements with respect to objects (e.g., tools). In fact, individuals often make extensive use of motor and function knowledge when dealing with objects (Osiurak & Badets, 2016). For example, making tea involves filling the kettle with water, boiling the water, finding some milk and so on. Foulsham (2015) discussed research showing there are only small individual differences in the pattern of eye fixations when people make tea. Such findings strongly imply people use stored information about the sequence of motor actions involved in tea-making.

Evaluation

What are the strengths of Gibson's ecological approach? First, "Gibson's realisation that natural scenes are the ecologically valid stimulus that should be used for the study of vision was of fundamental importance" (Bruce & Tadmor, 2015, p. 32). Second, and related to the first point, Gibson disagreed with the previous emphasis on static observers looking at static visual displays. Foulsham and Kingstone (2017) compared the eye fixations of participants walking around a university campus with those of other participants viewing static pictures of the same scene. The eye fixations were significantly different: those engaged in walking focused more on features (e.g., the path) important for locomotion, whereas those viewing static pictures focused centrally within each picture. Third, Gibson was far ahead of his time. There is support for two visual systems (Milner & Goodale, 1995, 2008; see Chapter 2): a vision-for-perception system and a vision-for-action system. Before Gibson, the major emphasis was on the former. In contrast, he argued our perceptual system allows us to respond rapidly and accurately to environmental stimuli without using memory, which is a feature of the latter system.

What are the limitations of Gibson's approach? First, Gibson attempted to specify the visual information used to guide action but ignored many of the processes involved (see Chapters 2 and 3). For example, Gibson assumed the perception of invariants occurred almost "automatically", but it actually requires several complex processes. Second, Gibson's argument that we do not need to assume the existence of internal representations (e.g., object knowledge) is flawed. The logic of Gibson's position is that: "There are invariants specifying a friend's face, a performance of Hamlet, or the sinking of the Titanic, and no knowledge of the friend, of the play, or of maritime history is required to perceive these things" (Bruce et al., 2003, p. 410). Evidence refuting Gibson's argument was reviewed by Foulsham (2015; discussed above). Third, and related to the second point, Gibson de-emphasised the role of top-down processes (based on our knowledge and expectations) in visual perception. Such processes are especially important when the visual input is impoverished (see Chapter 3). Fourth, Gibson's views on the effects of motion on perception were oversimplified. For example, when moving towards a goal, we use more information sources than Gibson assumed (discussed below).

VISUALLY GUIDED MOVEMENT

From an ecological perspective, it is important to understand how we move around the environment. For example, what information do we use when walking towards our current goal or target? We must ensure we are not hit by cars when crossing the road, and when driving we must avoid hitting other cars. Playing tennis well involves predicting exactly when and where the ball will strike our racquet. The ways visual perception facilitates our locomotion and ensures our safety are discussed below.

Heading and steering

KEY TERMS
Retinal flow field: The changing patterns of light on the retina produced by movement of the observer relative to the environment as well as by eye and head movements.
Efference copy: An internal copy of a motor command (e.g., to the eyes); it can be used to identify movement within the retinal image that is not due to object movement in the environment.

When we want to reach some goal (e.g., a gate at the end of a field), we use visual information to move directly towards it. Gibson (1950) emphasised the importance of optic flow (see Glossary; discussed on pp. 141–142). When we move forwards in a straight line, the point towards which we are moving (the focus of expansion) appears motionless. In contrast, the area around that point seems to expand. According to Gibson's (1950) global radial outflow hypothesis, if we are not moving directly towards our goal, we use optic flow to bring our heading (the focus of expansion) into alignment with the goal.

Gibson's approach works well in principle when applied to an individual trying to move straight from A to B. However, matters are more complex when we cannot move directly to our goal (e.g., driving around a bend in the road; avoiding obstacles). Another complexity is that observers often make head and eye movements. In sum, the retinal flow field (changes in the pattern of light on the retina) is influenced by rotation in the retinal image produced by following a curved path and/or eye and head movements.

The above complexities mean it is often hard to use information from retinal flow to determine our direction of heading. It has often been claimed that observers use an internal copy of the motor commands to move the eyes and head (the efference copy) to compensate for the effects of eye and head movements on the retinal image. However, Feldman (2016) argued this approach is insufficient on its own because it de-emphasises the brain's active involvement in relating perception and action.
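The compensation idea can be caricatured as a simple subtraction. The sketch below is our schematic illustration (made-up numbers; real compensation in the brain is considerably more involved, as Feldman's critique implies):

```python
import numpy as np

# Schematic sketch of the efference-copy idea (our illustration, with
# made-up vectors): motion registered on the retina mixes self-produced
# eye/head rotation with genuine movement in the scene. Subtracting the
# motion predicted from the motor command (the efference copy) leaves the
# component that is informative about heading.

def compensate(retinal_flow, predicted_self_motion):
    """Remove the flow component attributable to one's own eye/head movement."""
    return np.asarray(retinal_flow) - np.asarray(predicted_self_motion)

retinal = np.array([[1.5, 0.0], [2.5, 0.0]])    # measured image motion
predicted = np.array([[1.0, 0.0], [1.0, 0.0]])  # derived from the efference copy
print(compensate(retinal, predicted))           # residual environmental motion
```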

Findings: heading

Gibson emphasised the role of optic flow in allowing individuals to move directly towards their goal. Relevant information includes the focus of expansion (see Glossary) and the direction of radial motion (e.g., expansion within optic flow). Strong et al. (2017) obtained evidence indicating the importance of both factors and also established they depend on separate brain areas. More specifically, they used transcranial magnetic stimulation (TMS; see Glossary) to disrupt key brain areas. TMS applied to area V3A impaired perception of the focus of expansion but not direction of radial motion, with the opposite pattern being obtained when TMS was applied to the motion area V5/MT+ (see Chapter 2).

As indicated above, eye and/or head movements make it harder to use optic flow effectively for heading. Bremmer et al. (2010) considered this issue in macaque monkeys presented with distorted visual flow fields simulating the combined effects of self-motion and an eye movement. Their key finding was that numerous cells in the medial superior temporal area successfully compensated for this distortion.

According to Gibson, a walker tries to make the focus of expansion coincide with the body moving straight ahead. If a walker wore prisms producing a 9° error in their perceived visual direction, the focus of expansion should be misaligned compared to their expectation. As a result, there should be a correction process, a prediction confirmed by Herlihey and Rushton (2012). Also as predicted, walkers denied access to information about retinal motion failed to show any correction.

Factors additional to the optic-flow information emphasised by Gibson are also used when making heading judgements. This is unsurprising given the typical richness of the available environmental information. van den Berg and Brenner (1994) noted we only require one eye to use optic-flow information. However, they discovered heading judgements were more accurate when observers used both eyes. Binocular disparity (see Glossary) in the two-eye condition provided useful additional information about the relative depths of objects. Cormack et al. (2017) introduced the notion of a binoptic flow field to describe the 3-D information available to observers (but de-emphasised by Gibson).

Gibson assumed optic-flow patterns generated by self-motion are of fundamental importance when we head towards a goal. However, motion is not essential for accurate perception of heading. The judgements of heading direction made by observers viewing two static photographs of a real-world scene in rapid succession were reasonably accurate in the absence of optic-flow information (Hahn et al., 2003). These findings can be explained in terms of retinal displacement – objects closer to the direction of heading show less retinal displacement as we move closer to the target.

Snyder and Bischof (2010) argued that information about the direction of heading is provided by two systems. One system uses movement information (e.g., optic flow) rapidly and fairly automatically (as proposed by Gibson). The other system uses displacement information more slowly and requires greater processing resources. It follows that performing a second task at the same time as making judgements about direction of heading should have little effect on those judgements if movement information is available. In contrast, a second task should impair heading judgements when only displacement information is available. The evidence supported both predictions.

Heading: future path

Wilkie and Wann (2006) argued judgements of heading (the direction in which someone is moving) are of little relevance if they are moving along a curved path. With curved paths, path judgements (identifying future points along one's path) were much more accurate than heading judgements.

According to the above analysis, we might expect individuals (e.g., drivers) to fixate some point along their future path when it is curved. This is the future path strategy. In contrast, Land and Lee (1994) argued (with supportive evidence) that drivers approaching a bend focus on the tangent point – the point on the inside edge of the road at which its direction appears to reverse (see Figure 4.3). The tangent point has two potential advantages. First, it is easy to identify and track. Second, road curvature can easily be worked out by considering the angle between the direction of heading and the tangent point (a small-angle sketch of this relation is given below). Kandil et al. (2009) found most drivers negotiating 270° bends at a motorway junction fixated the tangent point much more often than the future path (75% vs 14%, respectively).

KEY TERM
Tangent point: From a driver's perspective, the point on a road at which the direction of its inside edge appears to reverse.

Figure 4.3
The visual features of a road viewed in perspective. The tangent point is marked by the filled circle on the inside edge of the road, and the desired future path is shown by the dotted line. According to the future-path theory, drivers should gaze along the line marked "active gaze".
From Wilkie et al. (2010). Reprinted with permission from Springer-Verlag.

Other research suggests the tangent point is less important. For example, Itkonen et al. (2015) instructed drivers to "drive as they normally would" or "look at the tangent point". Eye movements differed markedly in the two conditions – drivers were much more likely to fixate points along the future path in the former condition.

How can we interpret the above apparently inconsistent findings? Lappi et al. (2013) hypothesised drivers often fixate the tangent point when approaching and entering a bend but fixate the future path further into the bend. They argued the tangent point provides relatively precise information and so drivers use it when uncertainty about the precise nature of the curve or bend is maximal (i.e., when approaching and entering it). Lappi et al. (2013) obtained supporting evidence for this hypothesis: drivers' fixations while driving along a lengthy curve formed by the slip road to a motorway were predominantly on the path ahead rather than the tangent point after the first few seconds (see short clips of drivers' eye movements while performing this task at 10.1371/journal.pone.0068326).

The evidence discussed so far does not rule out optic flow as a factor influencing drivers' steering. Mole et al. (2016) manipulated optic-flow speed in a simulated driving situation. This produced steering errors (understeering or oversteering) when going around bends even when full information about road edges was available. Thus, optic flow influenced driving performance.
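To unpack the tangent point's second advantage: a standard small-angle derivation (our paraphrase of the geometry discussed by Land and Lee, 1994; treat the exact formulation as illustrative) relates the visual angle θ between the car's heading and the tangent point, and the car's lateral distance d from the inside road edge, to the curvature 1/r of the bend:

```latex
\frac{1}{r} \;\approx\; \frac{\theta^{2}}{2d}
```

On this analysis, a driver who holds θ steady while tracking the tangent point is in effect holding a constant steering curvature, which may help explain why the tangent point is such a convenient fixation target.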


IN THE REAL WORLD: ON-ROAD DRIVING

Much research on drivers' gaze patterns lacks ecological validity (see Glossary). Drivers are typically in a simulator and the environment through which they drive is somewhat oversimplified. Accordingly, Lappi et al. (2017) studied the gaze patterns of a 43-year-old male driving-school instructor driving on a rural road in Finland. His eye movements revealed a more complex picture than most previous research. What did Lappi et al. (2017) discover? Here are four major findings:

(1) The driver's gaze shifted very frequently from one feature of the visual environment to another and he made many head movements.
(2) The driver's gaze was predominantly on the far road (see Figure 4.4). This preview of the road ahead allowed him to make use of anticipatory control.
(3) In bends, the driver's gaze was mostly within the far road "triangle" formed by the tangent point (TP), the lane edge opposite the TP and the occlusion point (OP; the point where the road disappears from view). In general terms, the OP is used to anticipate the road ahead whereas the TP is used for more immediate compensatory steering control.
(4) The driver fixated specific targets (e.g., traffic signs; other road users) very rapidly, suggesting his peripheral vision was very efficient.

Figure 4.4
The far road "triangle" in (A) a left turn and (B) a right turn.
From Lappi et al. (2017).

In sum, drivers’ gaze patterns are more complex than implied by previous research. Drivers do not constantly fixate any given feature (e.g., tangent point) passively. Instead, they “sample visual information as needed, leading to input that is intermittent, and determined by the active observer . . . rather than imposed by the environment” (Lappi et al., 2017, p. 11). Drivers’ eye movements are determined in part by control mechanisms (e.g., path planning) (Lappi & Mole, 2018). These mechanisms are responsive to drivers’ goals. For example, professional racing drivers have the goal of driving as fast as possible whereas many ordinary drivers have the goal of driving safely.

Evaluation

Gibson's views concerning the importance of optic-flow information have deservedly been very influential. Such information is especially useful when individuals can move directly towards their goal rather than following a curved or indirect path. Indeed, the evidence suggests optic flow is often the dominant source of information determining judgements of heading direction. Drivers going around bends use optic-flow information. They also make some use of the tangent point. This is a relatively simple feature of the visual environment and its use by drivers is in the spirit of Gibson's perspective.

What are the limitations of Gibson's approach and other related approaches?

(1) Individuals moving directly towards a target use several kinds of information (e.g., binocular disparity; retinal displacement) ignored by Gibson.
(2) The tangent point is used infrequently when individuals move along a curved path: they more often fixate points lying along the future path.
(3) Drivers going around bends use a greater variety of information sources than implied by Gibson's approach. Of most importance, drivers' eye movements are strongly influenced by active, top-down processes (e.g., motor control) not included within Gibson's theorising. More specifically, drivers' eye movements depend on their current driving goals as well as the environmental conditions.
(4) Research and theorising have de-emphasised meta-cognition (beliefs about one's own performance). Mole and Lappi (2018) found drivers often made inaccurate meta-cognitive judgements of their own driving performance (e.g., they tended to exaggerate the importance of driving speed in determining performance). Such inaccurate judgements probably often lead to impaired driving performance.

Time to contact

In everyday life, we often need to predict the moment of contact between ourselves and some object. These situations include ones where we are moving towards an object (e.g., a wall) and those in which an object (e.g., a ball) is approaching us. We might work out the time to contact by dividing our estimate of the object's distance by our estimate of its speed. However, this would be complex and error-prone because information about speed and distance is not directly available.

Lee (1976, 2009) argued that there is a simpler way to work out the time to contact or collision. If we approach an object (or it approaches us) at constant velocity, we can use tau. Tau is defined as the size of an object's retinal image divided by its rate of expansion. The faster the rate of expansion, the less time there is to contact. When driving, the rate of decline of tau over time (tau-dot) indicates whether there is sufficient braking time to stop before contact or collision. Lee (1976) argued drivers brake to hold constant the rate of change of tau. This tau-dot hypothesis is consistent with Gibson's approach because it assumes tau-dot is an invariant available to observers from optic flow.

Lee's theoretical approach has been highly influential. However, his emphasis on tau has limited applicability in various ways (Tresilian, 1999). First, tau ignores acceleration in object velocity. Second, tau only provides information about the time to contact or collision with the eyes. Thus, drivers might find the front of their car smashed in if they relied solely on tau! Third, tau is accurate only when applied to spherically symmetrical objects: do not rely on it when catching a rugby ball!

Harrison et al. (2016) argued that people's behaviour is often influenced by factors other than their estimate of the time to contact. For example, consider someone deciding whether to cross a road when there is an approaching car. Their decision is often influenced by judgements of their physical mobility and their personality (e.g., cautious or impetuous) (see p. 151).
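Tau's appeal is that it delivers time to contact without separate estimates of distance or speed. Here is a standard derivation consistent with Lee's (1976) definition (the symbols S, Z and v are our notation): for an object of physical size S approaching at constant speed v from current distance Z(t), the retinal image subtends approximately θ(t) ≈ S/Z(t) (small-angle approximation), so

```latex
\tau(t) \;=\; \frac{\theta(t)}{\dot{\theta}(t)}
        \;=\; \frac{S/Z(t)}{S\,v/Z(t)^{2}}
        \;=\; \frac{Z(t)}{v}
```

In other words, tau equals the time remaining before contact, even though neither distance nor speed is estimated on its own. On the usual presentation of the tau-dot analysis, braking so that the rate of change of tau stays constant at a value of −0.5 or above suffices to stop at or before the obstacle; we state this as the standard textbook result rather than deriving it here.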

Findings

According to Lee's (1976) theory, observers can often judge time to contact accurately based on using tau relatively "automatically". If so, observers' time-to-contact judgements might not be impaired if they performed a cognitively demanding task while observing an object's movement. Baurès et al. (2018) obtained support for this prediction. Indeed, time-to-contact judgements were more accurate when observers performed a secondary task, perhaps because this made it less likely they would attend to potentially misleading information (e.g., expectations about an object's movements).

According to Lee (1976), judgements of the time to contact when catching a ball should depend crucially on the rate of expansion of the ball's retinal image. Savelsbergh et al. (1993) used a deflating ball having a significantly slower rate of expansion than an ordinary ball. The prediction was that peak grasp closure should occur later to the deflating ball. This prediction was confirmed. However, the actual slowing was much less than predicted (30 ms vs 230 ms). Participants minimised the distorting effects of manipulating the rate of expansion by using additional sources of information (e.g., depth cues).

Hosking and Crassini (2010) had participants judge time to contact for familiar objects (tennis ball and football) presented in their standard size or with their sizes reversed. They also used unfamiliar black spheres. Contrary to Lee's hypothesis, time-to-contact judgements were influenced by familiar size (especially when the object was a very large tennis ball), leading participants to overestimate time to contact (see Figure 4.5).

Figure 4.5
Errors in time-to-contact judgements for the smaller and the larger object as a function of whether they were presented in their standard size, the reverse size (off-size) or lacking texture (no-texture). Positive values indicate that responses were made too late and negative values that they were made too early.
From Hosking and Crassini (2010). With kind permission from Springer Science+Business Media.

Tau is available in monocular vision. However, observers often make use of information available in binocular vision, especially binocular disparity (see Chapter 2). Fath et al. (2018) discussed research showing binocular information sometimes provides more accurate judgements of time to contact than tau (e.g., when viewing small objects or rotating non-spherical objects).


In their own research, Fath et al. (2018) assessed accuracy of time-to-contact judgements when observers viewed fast- or slow-moving objects. They used three conditions varying in the amount of information available to observers:

(1) monocular flow information only (permitting assessment of tau);
(2) binocular disparity information only;
(3) all sources of information available.

Fath et al. predicted that binocular disparity information would be less likely to be used with fast-moving objects than slow-moving ones because it is relatively time-consuming to calculate changes in binocular disparity over time.

What did Fath et al. (2018) find? First, with fast objects, time-to-contact judgements were more accurate with monocular flow information only than with binocular disparity information only. Second, with slow objects, the opposite findings were obtained. Third, accuracy of time-to-contact judgements when all sources of information were available was comparable to accuracy in the better of the single-source conditions with fast and with slow objects.

DeLucia (2013) found observers mistakenly predicted a large approaching object would reach them sooner than a closer small approaching object: the size-arrival effect. This effect occurred because observers attached more importance to relative size than to tau (a worked numerical example appears at the end of this Findings section).

We turn now to research on drivers' braking decisions. Lee's (1976) notion that drivers brake to hold constant the rate of change of tau was tested by Yilmaz and Warren (1995). They told participants to stop at a stop sign in a simulated driving task. As predicted, there was generally a linear reduction in tau during braking. However, some participants showed large rather than gradual changes in tau shortly before stopping.

Tijtgat et al. (2008) found individual differences in stereo vision influenced drivers' braking behaviour to avoid a collision. Drivers with weak stereo vision started braking earlier than those with normal stereo vision and their peak deceleration also occurred earlier. Those with weak stereo vision found it harder to calculate distances, causing them to underestimate the time to contact. Thus, deciding when to brake does not depend only on tau or tau-dot.

Harrison et al. (2016) argued that Lee's (1976) theoretical approach is limited in two important ways when applied to drivers' braking behaviour. First, it ignores physical limitations in the real world. For example, tau-dot specifies to a driver the deceleration during braking required to avoid collision. However, this strategy will not work if the driver's braking system makes the required deceleration unachievable. Second, individuals differ in the emphasis they place on minimisation of costs (e.g., preferred safety margin). According to Harrison et al., these limitations suggest drivers' braking behaviour is influenced by their sensitivity to relevant affordances (possibilities for action), such as their knowledge of the dynamics of the braking system in their car.
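Returning to the size-arrival effect: hypothetical numbers (ours, not DeLucia's actual stimuli) make the conflict concrete. In the sketch below, the larger, farther object casts the bigger retinal image, yet its tau – and hence its true arrival time – is the longer of the two:

```python
# Hypothetical numbers (ours, not DeLucia's stimuli) illustrating the
# size-arrival effect. Tau = image size / expansion rate, which for a
# constant approach speed reduces to distance / speed.

def tau(size_m, distance_m, speed_mps):
    theta = size_m / distance_m                        # image size (rad, small angle)
    theta_dot = size_m * speed_mps / distance_m ** 2   # image expansion rate
    return theta / theta_dot                           # = distance_m / speed_mps

# A large, far object versus a small, near one, both closing at 10 m/s:
print(4.0 / 40.0, 0.5 / 10.0)                      # image sizes: 0.100 vs 0.050 rad
print(tau(4.0, 40.0, 10.0), tau(0.5, 10.0, 10.0))  # true arrival: 4.0 s vs 1.0 s
```

An observer relying on relative image size would judge the big object to be the imminent one, even though the small object actually arrives 3 seconds earlier.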

Evaluation

The notion that tau is used to make time-to-contact judgements is simple and elegant. There is much evidence that such judgements are often strongly influenced by tau. Even when competing factors affect time-to-contact judgements, tau often has the greatest influence on those judgements. Tau is also often used when drivers make decisions about when to brake.

What are the limitations of theory and research in this area? First, time-to-contact judgements are typically more influenced by tau or tau-dot in relatively uncluttered laboratory environments than in naturalistic conditions (Land, 2009). Second, tau is not the only factor determining time-to-contact judgements. As Land (2009, p. 853) pointed out, "The brain will accept all valid cues in the performance of an action, and weight them according to their current reliability." These cues can include object familiarity, binocular disparity and relative size. It clearly makes sense to use all the available information in this way. Third, the tau hypothesis ignores the emotional value of the approaching object. Time-to-contact judgements are shorter for threatening pictures than neutral ones (Brendel et al., 2012). This makes evolutionary sense – it could be fatal to overestimate how long a very threatening object (e.g., a lion) will take to reach you! Fourth, braking behaviour involves factors additional to tau and tau-dot. For example, there are individual differences in preferred safety margin. Rock et al. (2006) identified an alternative braking strategy in a real-world driving task in which drivers directly estimated the constant ideal deceleration required to stop at a given point.

VISUALLY GUIDED ACTION: CONTEMPORARY APPROACHES

The previous section focused mainly on how we use visual information when moving through the environment. Here we consider similar issues, but the emphasis shifts towards the processes involved in successful goal-directed action towards objects. For example, how do we reach for a cup of coffee? This issue was addressed by Milner and Goodale (1995, 2008) in their perception-action model (see Chapter 2). Contemporary approaches that have developed and extended the perception-action model are discussed below.

Role of planning: planning-control model

Interactive exercise: Planning control

Glover (2004) proposed a planning-control model of goal-directed action towards objects. According to this model, we initially use a planning system followed by a control system, but the two systems often overlap in time. Here are the main features of the two systems:

(1) Planning system
●● It is used mostly before the initiation of movement.
●● It selects an appropriate target (e.g., cup of coffee), decides how it should be grasped and works out the timing of the movement.
●● It is influenced by factors such as the individual's goals, the nature of the target object, the visual context and various cognitive processes.
●● It is relatively slow because it uses much information and is influenced by conscious processes.


(2) Control system
●● It is used during the carrying out of a movement.
●● It ensures movements are accurate, making adjustments if necessary based on visual feedback. Efference copy (see Glossary) is used to compare actual with desired movement. Proprioception is also involved.
●● It is influenced by the target object's spatial characteristics (e.g., size; shape; orientation) but not by the surrounding context.
●● It is fairly fast because it uses little information and is not susceptible to conscious influence.
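The division of labour can be caricatured in a few lines of code. This is our illustrative sketch, not Glover's formal model (all function names and numbers are invented): a slow plan computed before movement onset, then a fast online loop that corrects the unfolding movement using visual feedback.

```python
# Our illustrative caricature of the planning-control distinction (not
# Glover's formal model; names and numbers are invented): a slow plan
# computed before movement onset, then a fast online control loop.

def plan_reach(target_position, target_size):
    """Slow, knowledge-based stage: choose the end-point and an initial grip."""
    return {"endpoint": target_position, "grip_aperture": 1.2 * target_size}

def control_step(hand_position, endpoint, gain=0.3):
    """Fast online stage: nudge the hand towards the seen end-point."""
    visual_error = endpoint - hand_position
    return hand_position + gain * visual_error   # partial correction per step

hand = 0.0
motor_plan = plan_reach(target_position=10.0, target_size=5.0)
for _ in range(20):                              # online control during movement
    hand = control_step(hand, motor_plan["endpoint"])
print(round(hand, 2), motor_plan["grip_aperture"])  # hand ends up near 10.0
```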

KEY TERM
Proprioception: An individual's awareness of the position and orientation of parts of their body.

According to the planning-control model, most errors in human action stem from the planning system. In contrast, the control system typically ensures actions are accurate and achieve their goal. Many visual illusions occur because of the influence of visual context. Since information about visual context is used only by the planning system, responses to visual illusions should typically be inaccurate if they depend on the planning system but accurate if they depend on the control system.

There are similarities between the planning-control model and Milner and Goodale's perception-action model. However, Glover (2004) focused more on the processing changes occurring during action performance.

Findings

Glover et al. (2012) compared the brain areas involved in planning and control using a planning condition (prepare to reach and grasp an object but remain still) and a control condition (reach out immediately for the object). There was practically no overlap in the brain areas associated with planning and control. This finding supports the model's assumption that planning and control processes are separate.

According to the planning-control model, various factors (e.g., semantic properties of the visual scene) influence the planning process associated with goal-directed movements but not the subsequent control process. This prediction was tested by Namdar et al. (2014). Participants grasped an object in front of them using their thumb and index finger. The object had a task-irrelevant digit (1, 2, 8 or 9) on it. As predicted, numerically larger digits led to larger grip apertures during the first half of the movement trajectory but not the second half (involving the control process).

According to Glover (2004), action planning involves conscious processing followed by rapid non-conscious processing during action control. These theoretical assumptions can be tested by requiring participants to carry out a second task while performing an action towards an object. According to the model, this second task should disrupt planning but not control. However, Hesse et al. (2012) found a second task disrupted planning and control when participants made grasping movements towards objects. Thus, planning and control can both require attentional resources.

According to the model, visual illusions occur because misleading visual context influences the initial planning system rather than the later control system. Roberts et al. (2013) required participants to make rapid reaching movements to a Müller-Lyer figure. Vision was available only during the first 200 ms of movement or the last 200 ms. The findings were opposite to those predicted theoretically – performance was more accurate with early vision than late vision.

Elliott et al. (2017) explained the above findings with their multiple process model. According to this model, performance was good when early vision was available because of a control system known as impulse control. Impulse control "entails an early, and continuing, comparison of expected sensory consequences to perceived sensory consequences to regulate limb direction and velocity during the distance-covering phase of the movement" (p. 108).

Evaluation

Glover's (2004) planning-control model has proved successful in various ways. First, it successfully developed the common assumption that motor movements towards an object involve successive planning and control processes. Second, the assumption that cognitive processes are important in action planning is correct. Third, there is evidence (e.g., Glover et al., 2012) that separate brain areas are involved in planning and control.

What are the model's limitations? First, the planning system involves several very different processes: "goal determination; target identification and selection; analysis of object affordances [potential object uses]; timing; and computation of the metrical properties of the target such as its size, shape, orientation and position relative to the body" (Glover et al., 2012, p. 909). This diversity sheds doubt on the assumption there is a single planning system.

Second, the model argues control occurs late during object-directed movements and is influenced by visual feedback. However, there appears to be a second control process (called impulse control by Elliott et al., 2017) operating throughout the movement trajectory and not influenced by visual feedback.

Third, and related to the second point, the model presents an oversimplified picture of the processes involved in goal-directed action. More specifically, the processing involved in producing goal-directed movements is far more complex than implied by the notion of a planning process followed by a control process. For example, planning and control processes are often so intermixed that "the distinction between movement planning and movement control is blurred" (Gallivan et al., 2018, p. 519).

Fourth, complex decision-making processes are often involved when individuals plan goal-directed actions in the real world. For example, when planning, tennis players must often decide between a simple shot minimising energy expenditure and risk of injury and a more ambitious shot that might immediately win the current point (Gallivan et al., 2018).

Fifth, the model is designed to account for planning and control processes when only one object is present or of interest. In contrast, visual scenes in everyday life are often far more complex and contain several objects of potential relevance (see below).


Role of planning: changing action plans

We all have considerable experience of changing, modifying and abandoning action plans with respect to objects in the environment. How do we resolve competition among action plans? According to Song (2017, p. 1), "Critical is the existence of parallel motor planning processes, which allow efficient and timely changes."

What evidence indicates we often process information about several different potential actions simultaneously? Suppose participants are given the task of reaching rapidly towards a target in the presence of distractors (Song & Nakayama, 2008). On some trials, their reach is initially directed towards the target. On other trials, their initial reach is directed towards a distractor but is corrected in mid-flight, producing a strongly curved trajectory. Song and Nakayama's key finding was that corrective movements occurred very rapidly following the onset of the initial movement. This finding strongly implies the corrective movement had been planned prior to execution of the initial incorrect movement. Song (2017) discussed several other studies where similar findings were obtained. He concluded, "The sensori-motor system generates multiple competing plans in parallel before actions are initiated . . . this concurrent processing enables us to efficiently resolve competition and select one appropriate action rapidly" (p. 6).

Brain pathways

In their perception-action model, Milner and Goodale (1995, 2008) distinguished between a ventral stream or pathway and a dorsal stream or pathway (see Chapter 2). In approximate terms, the ventral stream is involved in object perception whereas the dorsal stream "is generally considered to mediate the visual guidance of action, primarily in real time" (Milner, 2017, p. 1297). Much recent research indicates this theoretical account is oversimplified (see Chapter 2). Of central importance is the accumulating evidence that there are actually two somewhat separate dorsal streams (Osiurak et al., 2017; Sakreida et al., 2016):

Interactive feature: Primal Pictures’ 3D atlas of the brain

(1) The dorso-dorsal stream: processing in this stream relates to the online control of action and is hand-centred; it has been described as the "grasp" system (Binkofski & Buxbaum, 2013).
(2) The ventro-dorsal stream: processing in this stream is offline and relies on memorised knowledge of objects and tools and is object-centred; it has been described as the "use" system (Binkofski & Buxbaum, 2013).

Sakreida et al. (2016) identified several other differences between these two streams (see Figure 4.6). In essence, object processing within the dorso-dorsal stream is variable because it is determined by the immediately accessible properties of an object (e.g., its size and shape). Such processing is fast and "automatic". In contrast, processing within the ventro-dorsal stream is stable because it is determined by memorised object knowledge. Such processing is slow and more cognitively demanding than processing within the dorso-dorsal stream.

Figure 4.6
The dorso-dorsal and ventro-dorsal streams showing their brain locations and forms of processing. The dorso-dorsal stream lies at the "variable" end of a continuum: fast and "automatic" online processing during actual object interaction; variation of object properties (e.g., size, shape, weight or orientation) during task performance; low working memory load. The ventro-dorsal stream lies at the "stable" end: slow and "non-automatic" "offline" processing of memorised object knowledge; constant object properties during active or observed object-related reaching, grasping or pointing; high working memory load. Related concepts include the structure-based actions/"grasp" system and function-based actions/"use" system (Buxbaum and Kalénine) and the grasping and reaching circuits (Jeannerod).
From Sakreida et al. (2016). Reprinted with permission of Elsevier.

Findings

KEY TERM
Limb apraxia: A condition caused by brain damage in which individuals have impaired ability to make skilled goal-directed movements towards objects even though they possess the physical ability to perform them.

Considerable neuroimaging evidence supports the proposed distinction between two dorsal streams. Martin et al. (2018, p. 3755) reviewed research indicating the dorso-dorsal stream "traverses from visual area V3a through V6 toward the superior parietal lobule, and . . . reaches the dorsal premotor cortex". In contrast, the ventro-dorsal stream "encompasses higher-order visual areas like MT/V5+, the inferior parietal lobule . . . as well as the ventral premotor cortex and inferior frontal gyrus" (p. 3755). Sakreida et al. (2016) conducted a meta-analytic review based on 71 neuroimaging studies and obtained similar findings.

Evidence from brain-damaged patients also supports the distinction between two dorsal streams. First, we consider patients with damage to the ventro-dorsal stream. Much research has focused on limb apraxia, a disorder where patients often fail to make precise goal-directed actions in spite of possessing the physical ability to perform those actions (Pellicano et al., 2017). More specifically, "Reaching and grasping actions in LA [limb apraxia] are normal when vision of the limb and target is available, but typically degrade when they must be performed 'off-line', as when subjects are blindfolded prior to movement execution" (Binkofski & Buxbaum, 2013, p. 5). This pattern of findings is expected if the dorso-dorsal stream is intact in patients with limb apraxia.

Second, we consider patients with damage to the dorso-dorsal stream. Much research here has focused on optic ataxia (see Glossary). As predicted, patients with optic ataxia have impaired online motor control and so exhibit inaccurate reaching towards (and grasping of) objects.

Evaluation

Neuroimaging research has provided convincing evidence for the existence of two dorsal processing streams. The distinction between dorso-dorsal and ventro-dorsal streams has also been supported by studies on brain-damaged patients. More specifically, there is some evidence for a double dissociation (see Glossary) between the impairments exhibited by patients with limb apraxia and optic ataxia.

What are the limitations of research in this area? First, the ventral stream (strongly involved in object recognition) is also important in visually guided action (Osiurak et al., 2017). However, precisely how this stream interacts with the dorso-dorsal and ventro-dorsal streams is unclear. Second, there is some overlap in the brain between the dorso-dorsal and ventro-dorsal streams and so it is important not to exaggerate their independence. Third, there is a lack of consensus concerning the precise functions of the two dorsal streams (see Osiurak et al., 2017, and Sakreida et al., 2016).

PERCEPTION OF HUMAN MOTION

We are very good at interpreting other people's movements. We can decide very rapidly whether someone is walking, running or limping. Our initial focus is on two key issues. First, how successfully can we interpret human motion with very limited visual information? Second, do the processes involved in perception of human motion differ from those involved in perception of motion in general? If the answer to this question is positive, we also need to consider why the perception of human motion is special.

As indicated already, our focus is mostly on the perception of human motion. However, there are many similarities between the perception of human and animal motion, and we will sometimes use the term "biological motion" to refer generally to the perception of animal motion. Finally, we discuss an important theoretical approach based on the notion that the same brain system or network is involved in perceiving and understanding human actions and in performing those same actions.

Perceiving human motion

Suppose you were presented with point-light displays, as was done initially by Johansson (1973). Actors were dressed entirely in black with lights attached to their joints (e.g., wrists; knees; ankles). They were filmed moving around a darkened room so only the lights were visible to observers watching the film (see Figure 4.7 and "Johansson Motion Perception Part 1" on YouTube). What do you think you would perceive in those circumstances?


Figure 4.7
Point-light sequences (a) with the walker visible and (b) with the walker not visible.
From Shiffrar and Thomas (2013). With permission of the authors.


Figure 4.8
Human detection and discrimination efficiency for human walkers presented in contour, point lights, silhouette and skeleton.
From Lu et al. (2017).

In fact, Johansson found observers perceived the moving person accurately with only six lights and a short segment of film. In subsequent research, Johansson et al. (1980) found observers perceived human motion with no apparent difficulty when viewing a point-light display for only one-fifth of a second! Ruffieux et al. (2016) studied a patient, BC, who was cortically blind but had a residual ability to process motion. When presented with two point-light displays (one of a human and one of an animal) at the same time, he generally correctly identified the human.

The above findings imply we are very efficient at processing impoverished point-light displays. However, Lu et al. (2017) reported some contrary evidence. Observers were given two tasks: (1) detecting the presence of a human walker; (2) discriminating whether a human walker was walking leftward or rightward. The walker was presented in point lights, contour, silhouette or as a skeleton. Detection performance was relatively good for the point-light display but discrimination performance was not (see Figure 4.8). Performance was high with the skeleton display because it provided detailed information about the connections between joints.
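To make the stimulus class concrete: a point-light display reduces an actor to the coordinates of a dozen or so joints over time. The sketch below is purely illustrative (the joint list and the sinusoidal limb motion are our crude invention; real displays use motion-captured joint trajectories, as in Johansson, 1973):

```python
import numpy as np

# Purely illustrative sketch of a point-light stimulus; not an actual
# stimulus-generation method from the studies discussed here.

def walker_frame(t, stride=0.3, rate=2.0):
    """Return (13, 2) coordinates of 13 'joints' at time t."""
    base = np.array([[0.0, 1.8],                   # head
                     [-0.2, 1.5], [0.2, 1.5],      # shoulders
                     [-0.3, 1.2], [0.3, 1.2],      # elbows
                     [-0.3, 0.9], [0.3, 0.9],      # wrists
                     [-0.1, 1.0], [0.1, 1.0],      # hips
                     [-0.1, 0.55], [0.1, 0.55],    # knees
                     [-0.1, 0.1], [0.1, 0.1]])     # ankles
    swing = stride * np.sin(rate * t)
    base[[3, 5, 9, 11], 0] += swing                # left limbs swing forwards
    base[[4, 6, 10, 12], 0] -= swing               # right limbs in anti-phase
    return base

# An observer sees nothing but these 13 moving dots, frame by frame, yet
# readily perceives a walking person.
print(walker_frame(0.5).shape)   # (13, 2)
```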

Top-down or bottom-up processes?

Johansson (1975) argued the ability to perceive biological motion is innate, describing the processes involved as "spontaneous" and "automatic". Support was reported by Simion et al. (2008) in a study on newborns (1–3 days old). These newborns preferred to look at a display showing biological motion over one that did not. Remarkably, Simion et al. used point-light displays of chickens, of which the newborns had no previous experience. These findings suggest the perception of biological motion involves basic bottom-up processes.

Evidence that learning plays a role was reported by Pinto (2006). Three-month-olds were equally sensitive to motion in point-light humans, cats and spiders. In contrast, 5-month-olds were more sensitive to displays of human motion. Thus, the infant visual system becomes increasingly specialised for perceiving human motion.


If the detection of biological motion were "automatic", it would be relatively unaffected by attention. However, in a review, Thompson and Parasuraman (2012) concluded attention is required, especially when the available visual information is ambiguous or competing information is present. Mayer et al. (2015) presented circular arrays of between two and eight video clips. In one condition, observers decided rapidly whether any clip showed human motion; in another condition, they decided whether any clips showed machine motion. There were two key findings. First, detection times increased with array size for both human and machine motion, suggesting attention is required to detect both types of motion. Second, the effects of array size on detection times were much greater for machine motion. Thus, searching is more efficient for human than machine motion, suggesting human motion perception may be special (see below).

Is human motion perception special?

Much evidence indicates we are better at detecting human motion than motion in other species (Shiffrar & Thomas, 2013). Cohen (2002) assessed observers' sensitivity to human, dog and seal motion using point-light displays. Performance was best with human motion and worst with seal motion. Of importance, the same pattern of performance was found in seal trainers and dog trainers. Thus, the key factor is not simply visual experience; instead, we are more sensitive to observed motions resembling our own repertoire of actions.

We can also consider whether human motion perception is special by considering the brain. There has been increasing recognition that many brain areas are involved in biological motion processing (see Figure 4.9). The pathway from the fusiform gyrus (FFG) to the superior temporal sulcus (STS) is of particular importance, as are top-down processes from the insula (INS), the STS and the inferior frontal gyrus (IFG).

Figure 4.9
Brain areas involved in biological motion processing (STS = superior temporal sulcus; IFG = inferior frontal gyrus; INS = insula; Crus I = left lateral cerebellar lobule; MTC = middle temporal cortex; OCC = early visual cortex; FFG = fusiform gyrus).
From Sokolov et al. (2018).

Much research indicates the central importance of the superior temporal sulcus. Grossman et al. (2005) applied repetitive transcranial magnetic stimulation (rTMS; see Glossary) to that area to disrupt processing. This caused a substantial reduction in observers' sensitivity to biological motion. Gilaie-Dotan et al. (2013) found grey matter volume in the superior temporal sulcus correlated positively with the detection of biological (but not non-biological) motion.

Evidence from brain-damaged patients indicates that perceiving biological motion involves different processes from those involved in perceiving object motion generally. Vaina et al. (1990) studied a patient, AF, with damage to the posterior visual pathways. He performed poorly on basic motion tasks but was reasonably good at detecting biological motion from point-light displays. In contrast, Saygin (2007) found in stroke patients with damage in the temporal and premotor frontal areas that their perception of biological motion was more impaired than non-biological motion.

Why is biological motion perception special?

We could explain the special nature of biological motion perception in three ways (Shiffrar & Thomas, 2013). First, biological motion is the only type of motion humans can produce as well as perceive. Second, most people spend more time perceiving and trying to understand other people's motion than any other form of visual motion. Third, other people's movements provide a rich source of social and emotional information.

We start with the first reason (discussed further on pp. 161–162). The relevance of motor skills to the perception of biological motion was shown by Kloeters et al. (2017). Patients with Parkinson's disease (which impairs movement execution) had significantly inferior perception of human movement with point-light displays compared to healthy controls. More dramatically, paraplegics with severe spinal injury were almost three times less sensitive than healthy controls to human movement in point-light displays.

We must not exaggerate the importance of motor involvement in biological motion perception, however. A man, DC, born without upper limbs identified manual actions shown in videos and photographs as well as healthy controls did (Vannuscorps et al., 2013). Motor skills may be most important in biological motion perception when the visual information presented is sparse or ambiguous (e.g., as with point-light displays).

Jacobs et al. (2004) obtained support for the second reason listed above. Observers' ability to identify walkers from point-light displays was much better when the walker had been observed for 20 hours a week rather than 5 hours. In our everyday lives, we often recognise individuals in motion by integrating information from biological motion with information from the face and the voice within the superior temporal sulcus (Yovel & O'Toole, 2016). Successful integration of these different information sources clearly depends on learning and experience.

We turn now to the third reason mentioned earlier. Charlie Chaplin showed convincingly that bodily movements can convey social and emotional information. Atkinson et al. (2004) found observers performed well at identifying emotions from point-light displays (especially fear, sadness and happiness). Part of the explanation for these findings is that angry individuals walk especially fast whereas fearful or sad ones walk very slowly (Barliya et al., 2013).

We can explore the role of social factors in biological motion detection by studying adults with autism spectrum disorder, who have severely impaired social interaction skills. The findings are somewhat inconsistent. However, adults with autism spectrum disorder generally have a reasonably intact ability to detect human motion in point-light displays but exhibit impaired emotion processing in such displays (see Bakroon & Lakshminarayanan, 2018, for a review).


Mirror neuron system

Research on monkeys in the 1990s transformed our understanding of biological motion. Gallese et al. (1996) assessed monkeys' brain activity while they performed a given action and while they observed another monkey perform the same action. They found 17% of neurons in area F5 of the premotor cortex were activated in both conditions. Such findings led theorists to propose a mirror neuron system consisting of neurons activated during both observation and performance of actions (see Keysers et al., 2018, for a review).

There have been numerous attempts to identify a mirror neuron system in humans. Our current understanding of the brain areas associated with the mirror neuron system is shown in Figure 4.10. Note that the mirror neuron system consists of an integrated network rather than separate brain areas (Keysers et al., 2018).

Figure 4.10
The main brain areas associated with the mirror neuron system (MNS) plus their interconnections (red). Areas involved in visual input (blue; pMTG = posterior mid-temporal gyrus; STS = superior temporal gyrus) and motor output (green; M1 = primary motor cortex) are also shown. AIP = anterior intraparietal areas; PF = area within the parietal lobe; PMv and PMd = ventral and dorsal premotor cortex; SI = primary somato-sensory cortices.
From Keysers et al. (2018). Reprinted with permission of Elsevier.

Most research is limited because it shows only that the same brain areas are involved in action perception and production. Perry et al. (2018) used more precise methods to reveal a more complex picture within areas assumed to form part of the mirror neuron system. Some small areas were activated during both observing actions and imitating them, thus providing evidence for a human mirror neuron system. However, other adjacent areas were activated only during observing or action imitation.

More convincing evidence for a human mirror neuron system was reported by de la Rosa et al. (2016). They focused on activation in parts of the inferior frontal gyrus (BA44/45) corresponding to area F5 in monkeys. Their key finding was that 52 voxels (see Glossary) within BA44/45 responded to both action perception and action production.

Before proceeding, we should note the term "mirror neuron system" is somewhat misleading because mirror neurons do not provide us with an exact motoric coding of observed actions. As Williams (2013, p. 2962) wittily remarked, "If only this was the case! I could become an Olympic ice-skater or a concert pianist!"

KEY TERM

Mirror neuron system
Neurons that respond to actions whether performed by oneself or someone else; it is claimed these neurons assist in imitating (and understanding) the actions of others.

KEY TERM

Apraxia
A condition caused by brain damage in which there is greatly reduced ability to perform purposeful or planned bodily movements in spite of the absence of muscular damage.

Findings

We have seen that neuroimaging studies indicate the mirror neuron system is activated during both action perception and action production. Such evidence is correlational, and so does not demonstrate that the mirror neuron system is necessary for action perception and understanding. More direct evidence comes from research on brain-damaged patients. Binder et al. (2017) studied left-hemisphere stroke patients with apraxia (impaired ability to perform planned actions) who had damage within the mirror neuron system (e.g., inferior frontal gyrus). These patients had comparable deficits in action imitation, action recognition and action comprehension. The co-existence of these deficits was precisely as predicted. Another predicted finding was that left-hemisphere stroke patients without apraxia had less brain damage in core regions of the mirror neuron system than those with apraxia.

Another approach to demonstrating the causal role of the mirror neuron system is to use experimental techniques such as transcranial direct current stimulation (tDCS; see Glossary). Avenanti et al. (2018) assessed observers' ability to predict which object would be grasped after seeing the start of a reaching movement. Task performance was enhanced when anodal tDCS was used to facilitate neural activity within the mirror neuron system, whereas it was impaired when cathodal tDCS was used to inhibit such neural activity.

Findings: functions of the mirror neuron system

What are the functions of the mirror neuron system? It has often been assumed mirror neurons play a role in working out why someone else is performing certain actions as well as deciding what those actions are. For example, Eagle et al. (2007, p. 131) claimed the mirror neuron system is involved in "the automatic, unconscious, and non-inferential simulation in the observer of the actions, emotions, and sensations carried out and expressed by the observed".

Rizzolatti and Sinigaglia (2016) argued that full understanding of another person's actions requires a multi-level process. The first level involves identifying the outcome of the observed action and the emotion being displayed by the other person. This is followed by the observer representing the other person's desires, beliefs and intentions. The mirror neuron system is primarily involved at the first level but may provide an input to subsequent processes.

Lingnau and Petris (2013) argued that understanding another person's actions often requires complex cognitive processes as well as simpler processes within the mirror neuron system. Observers saw point-light displays of human actions and some were asked to identify the goal of each action. Areas within the prefrontal cortex (associated with high-level cognitive processes) were more activated when goal identification was required. These findings can be explained within the context of Rizzolatti and Sinigaglia's (2016) approach discussed above.

Wurm et al. (2016) distinguished between two forms of motion perception and understanding. They used the example of observers understanding that someone is opening a box. If they have a general or abstract understanding of this action, their understanding should generalise to other boxes and other ways of opening a box. In contrast, if they only have a specific or concrete understanding of the action, their understanding will not generalise. Wurm et al. (2016) found specific or concrete action understanding could occur within the mirror neuron system. However, more general or abstract understanding involved high-level perceptual regions (e.g., the lateral parieto-temporal cortex) outside the mirror neuron system.

In sum, the mirror neuron system is of central importance with respect to some (but not all) aspects of action understanding. More specifically, additional (more "cognitive") brain areas are required if action understanding is complex (Lingnau & Petris, 2013) or involves generalising from past experience (Wurm et al., 2016). It is also likely that imitating someone else's actions often involves processes (e.g., person-perception processes) additional to those directly involving the mirror neuron system (Ramsey, 2018).

Overall evaluation

Several important research findings have been obtained. First, we have an impressive ability to perceive human or biological motion even with very limited visual input. Second, the brain areas involved in human motion perception differ somewhat from those involved in perceiving motion in general. Third, perception of human motion is special because it is the only type of motion we can both perceive and produce. Fourth, a mirror neuron system allows us to imitate and understand other people's movements. Fifth, the core brain network of the mirror neuron system has been identified. Its causal role has been established through studies on brain-damaged patients and research using techniques such as transcranial direct current stimulation.

What are the limitations of research in this area? First, much remains unclear about interactions of bottom-up and top-down processes in the perception of biological motion. Second, the mirror neuron system does not account for all aspects of action understanding. As Gallese and Sinigaglia (2014, p. 200) pointed out, action understanding "involves representing to which . . . goals the action is directed; identifying which beliefs, desires, and intentions specify reasons explaining why the action happened; and realising how those reasons are linked to the agent and to her action". Third, nearly all studies on the mirror neuron system have investigated its properties with respect only to hand actions. However, somewhat different mirror neuron networks are probably associated with hand-and-mouth actions (Ferrari et al., 2017a). Fourth, it follows from theoretical approaches to the mirror neuron system that an observer's ability to understand another person's actions should be greater if they both execute any given action in a similar fashion. This prediction has been confirmed (Macerollo et al., 2015). Such research indicates the importance of studying individual differences in motor actions, which have so far been relatively neglected.

CHANGE BLINDNESS

We have seen that a changing visual environment allows us to move in the appropriate direction and to make coherent sense of our surroundings. However, as we will see, our perceptual system does not always respond appropriately to changes in the visual environment.

Have a look around you (go on!). You probably have a strong impression of seeing a vivid and detailed picture of the visual scene. As a result, you are probably confident you could immediately detect any reasonably large change in the visual environment. In fact, that is often not the case.



KEY TERMS

Change blindness
Failure to detect various changes (e.g., in objects) in the visual environment.

Inattentional blindness
Failure to detect an unexpected object appearing in the visual environment.

Change blindness blindness
The tendency of observers to overestimate greatly the extent to which they can detect visual changes and so avoid change blindness.


Change blindness, which is "the failure to detect changes in visual scenes" (Ball et al., 2015, p. 2253), is the main phenomenon we will discuss. We also consider inattentional blindness, "the failure to consciously perceive otherwise salient events when they are not attended" (Ward & Scholl, 2015, p. 722).

Research on change blindness focuses on dynamic processes over time. It has produced striking and counterintuitive findings leading to new theoretical thinking about the processes underlying conscious visual awareness. Change blindness and inattentional blindness both depend on a mixture of perceptual and attentional processes. It is thus appropriate to discuss these phenomena at the end of our coverage of perception and immediately prior to the start of our coverage of attention.

You have undoubtedly experienced change blindness at the movies caused by unintended continuity mistakes when a scene has been reshot. For example, in the film Skyfall, James Bond is followed by a white car. Mysteriously, this car suddenly becomes black and then returns to being white! For more examples, type "Movie continuity mistakes" into YouTube.

We greatly exaggerate our ability to detect visual changes. Levin et al. (2002) asked observers to watch videos involving two people in a restaurant. In one video, the plates changed from red to white; in another, a scarf worn by one person disappeared. Levin et al. found 46% of observers claimed they would have noticed the change in the colour of the plates without being forewarned, and the figure was 78% for the disappearing scarf. In a previous study, 0% of observers had detected either change! Levin et al. introduced the term change blindness blindness to describe our wildly optimistic beliefs about our ability to detect visual changes.

In the real world, we are often aware of visual changes because we detect motion signals accompanying the change. Laboratory researchers have used various ways to prevent observers from detecting motion signals. One way is to make the change during a saccade (a rapid movement of the eyes). Another way is to have a short gap between the original and changed displays (the flicker paradigm).

Suppose you walked across a large square close to a unicycling clown wearing a vivid purple and yellow outfit, large shoes and a bright red nose (see Figure 4.11). Would you spot him? I imagine your answer is "Yes". However, Hyman et al. (2009) found only 51% of people walking on their own spotted the clown. Those failing to spot the clown exhibited inattentional blindness.

Figure 4.11 The unicycling clown who cycled close to students walking across a large square. From Hyman et al. (2009).


Change blindness vs inattentional blindness

Change blindness and inattentional blindness both involve a failure to detect some visual event occurring in plain sight. Unsurprisingly, failures of attention often play an important role in causing both forms of blindness. However, there are major differences between the two phenomena (Jensen et al., 2011).

First, consider the effects of instructing observers to look for unexpected objects or visual changes. Target detection in change blindness paradigms is often hard even with such instructions. In contrast, target detection in inattentional blindness paradigms becomes trivially easy. Second, change blindness involves the use of memory to compare pre-change and post-change stimuli, whereas inattentional blindness does not. Third, inattentional blindness mostly occurs when the observer's attention is engaged in a demanding task (e.g., chatting on a mobile phone), unlike change blindness.

In sum, more complex processing is typically required for successful performance in change blindness tasks. More specifically, observers must engage successfully in five separate processes for change detection to occur (Jensen et al., 2011):

(1) Attention must be paid to the change location.
(2) The pre-change visual stimulus at the change location must be encoded into memory.
(3) The post-change visual stimulus at the change location must be encoded into memory.
(4) The pre- and post-change representations must be compared.
(5) The discrepancy between the pre- and post-change representations must be recognised at the conscious level.
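To make the logic of these five processes concrete, here is a minimal sketch (our own illustration with invented names, not a model from Jensen et al., 2011) in which failure at any single stage yields change blindness:

```python
# Illustrative sketch: the five processes of Jensen et al.'s (2011)
# change-detection account as a chain, where failure at any stage
# produces change blindness. All names are hypothetical.

from dataclasses import dataclass

@dataclass
class Trial:
    change_location: str
    attended_locations: set
    pre_stimulus: str    # stimulus at the change location before the change
    post_stimulus: str   # stimulus at the change location after the change

def detects_change(trial: Trial, encode_pre=True, encode_post=True,
                   compare=True, reach_awareness=True) -> bool:
    # (1) Attention must be paid to the change location.
    if trial.change_location not in trial.attended_locations:
        return False
    # (2) The pre-change stimulus must be encoded into memory.
    if not encode_pre:
        return False
    # (3) The post-change stimulus must be encoded into memory.
    if not encode_post:
        return False
    # (4) The pre- and post-change representations must be compared.
    if not compare:
        return False
    # (5) Any discrepancy must be consciously recognised.
    return reach_awareness and trial.pre_stimulus != trial.post_stimulus

# Attention alone is not sufficient: here the change location is attended,
# but the comparison stage fails, so change blindness still occurs.
trial = Trial("kitchen table", {"kitchen table"}, "plate", "bowl")
print(detects_change(trial, compare=False))  # False
```

Note that attending to the change location is necessary but not sufficient on this account – exactly the pattern Hollingworth and Henderson (2002) report later in this section.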

IN THE REAL WORLD: IT'S MAGIC!

Magicians benefit from the phenomena of inattentional blindness and change blindness (Kuhn & Martinez, 2012). Most magic tricks involve misdirection, which is designed "to disguise the method and thus prevent the audience from detecting it" (Kuhn & Martinez, 2012, p. 2). Many people believe misdirection involves the magician manipulating the audience's attention away from some action crucial to the trick's success. That is often (but not always) the case.

Inattentional blindness

Kuhn and Findlay (2010) studied inattentional blindness using a disappearing lighter (see Figure 4.12 for details). There were three main findings. First, of the observers who detected the drop, 31% were fixating close to the magician's left hand when the lighter was dropped from that hand. However, 69% were fixating some distance away and so detected the drop in peripheral vision (see Figure 4.13). Second, the average distance between fixation and the drop was the same in those who detected the drop in peripheral vision and those who did not. Third, the time taken after the drop to fixate the left hand was much less in observers using peripheral vision to detect the drop than in those failing to detect it (650 ms vs 1,712 ms).

What do the above findings mean? The lighter drop can be detected by overt attention (attention directed to the fixation point) or covert attention (attention directed away from the fixation point). Covert attention was surprisingly effective because the human visual system can readily detect movement in peripheral vision (see Chapter 2).


Figure 4.12 The sequence of events in the disappearing lighter trick: (a) the magician picks up a lighter with his left hand and (b) lights it; (c) and (d) he pretends to take the flame with his right hand and (e) gradually moves it away from the hand holding the lighter; (f) he reveals his right hand is empty while the lighter is dropped into his lap; (g) the magician directs his gaze to his left hand and (h) reveals that his left hand is also empty and the lighter has disappeared. From Kuhn and Findlay (2010). Reprinted with permission of Taylor & Francis.

Most people underestimate the importance of peripheral vision to trick detection. Across several magic tricks (including the lighter trick and other tricks involving change blindness), Ortega et al. (2018) found under 30% of individuals thought they were likely to detect how a trick worked using peripheral vision. In fact, however, over 60% of the tricks where they detected the method involved peripheral vision! Thus, most people exaggerate the role of central vision in understanding magic tricks.

Change blindness

Figure 4.13 Participants’ fixation points at the time of dropping the lighter for those detecting the drop (triangles) and those missing the drop (circles). From Kuhn and Findlay (2010). Reprinted with permission of Taylor & Francis.


Smith et al. (2012) used a magic trick in which a coin was passed from one hand to the other and then dropped. Observers guessed whether the coin landed heads or tails. On one trial, the coin was switched (e.g., from a £1 coin to a 2p coin). All observers fixated the coin throughout the time it was visible but about 90% failed to detect the coin had changed! Thus, an object can be attended to while features irrelevant to the current task receive too little processing to prevent change blindness.


Kuhn et al. (2016) used a trick in which a magician made the colour of playing cards change. Explicit instructions to observers to keep their eyes on the cards influenced overt attention but failed to reduce change blindness.

Conclusions

The success of many magic tricks depends less on where observers are fixating (overt attention) than we might think. Observers can be deceived even when their overt attention is directed to the crucial location. In addition, they often avoid change blindness or inattentional blindness even when their overt attention is directed some distance away from the crucial location. Such findings are typically explained by assuming the focus of covert attention often differs from that of overt attention. More generally, peripheral vision is often of more importance to the detection of magic tricks than most people believe.

Change blindness underestimates visual processing

Ball and Busch (2015) distinguished between two types of change detection: (1) seeing the object that changed; and (2) sensing there has been a change without conscious awareness of which object has changed. Several coloured objects were presented in pre- and post-change displays. If the post-change display contained a colour not present in the pre-change display, observers often sensed change had occurred without being aware of what had changed.

When observers show change blindness, it does not necessarily mean there was no processing of the change. Ball et al. (2015) used object changes where the two objects were semantically related (e.g., rail car changed to rail) or unrelated (e.g., rail car changed to sausage). Use of event-related potentials (ERPs; see Glossary) revealed a larger negative wave when the objects were semantically unrelated, even when observers exhibited change blindness. Thus, there was much unconscious processing of the pre- and post-change objects.

What causes change blindness?

There is no single (or simple) answer to the question "What causes change blindness?". However, two major competing theories both provide partial answers.

First, there is the attentional approach (e.g., Rensink et al., 1997). According to this approach, change detection requires selective attention to be focused on the object that changes. Attention is typically directed to only a limited part of visual space, and changes in unattended objects are unlikely to be detected.

Second, there is a theoretical approach emphasising the importance of peripheral vision (Rosenholtz, 2017a,b; Sharan et al., 2016, unpublished). It is based on the assumption that visual processing occurs in parallel across the entire visual field (including peripheral vision). According to this approach, "Peripheral vision is a limiting factor underlying standard demonstrations of change blindness" (Sharan et al., 2016, p. 1).


Attentional approach

Change blindness often depends on attentional processes. We typically attend to regions of a visual scene likely to contain salient or important information. Spot the differences between the pictures in Figure 4.14. Observers took an average of 10.4 seconds with the first pair of pictures but only 2.6 seconds with the second pair (Rensink et al., 1997). The height of the railing is less important than the helicopter's position.

Hollingworth and Henderson (2002) recorded eye movements while observers viewed visual scenes (e.g., kitchen; living room). It was assumed the object fixated at any moment was the one being attended. There were two potential changes in each scene:

● Type change: an object was replaced by one from a different category (e.g., a plate replaced by a bowl).
● Token change: an object was replaced by an object from the same category (e.g., a plate replaced by a different plate).

Figure 4.14 (a) The object that is changed (the railing) undergoes a shift in location comparable to that of the object that is changed (the helicopter) in (b). However, the change is much easier to see in (b) because the changed object is more important. From Rensink et al. (1997). Copyright 1997 by SAGE. Reprinted by permission of SAGE Publications.


What did Hollingworth and Henderson (2002) find? First, there was much greater change detection when the changed object was fixated prior to the change than when it was not fixated (see Figure 4.15a). Second, there was change blindness for 60% of objects fixated prior to changing. Thus, attention to the to-be-changed object was necessary (but not sufficient) for change detection. Third, change detection was much greater for type changes than for token changes because type changes are more dramatic and obvious.

Figure 4.15 (a) Percentage of correct change detection as a function of form of change (type vs token) and time of fixation (before vs after change); also false alarm rate when there was no change. (b) Mean percentage correct change detection as a function of the number of fixations between target fixation and change of target and form of change (type vs token). Both from Hollingworth and Henderson (2002). Copyright 2002 American Psychological Association. Reproduced with permission.

KEY TERM

Visual crowding
The inability to recognise objects in peripheral vision due to the presence of neighbouring objects.

Evaluation

The attentional approach has various successes to its credit. First, change detection is greater when target stimuli are salient or important and so attract attention. Second, change detection is generally greater when the to-be-changed object has been fixated (attended to) prior to the change.

What are the limitations of the attentional approach? First, the notion that narrow-focused attention determines our visual experience is hard to reconcile with our strong belief that experience spans the entire field of view (Cohen et al., 2016). Second, "A selective attention account is hard to prove or disprove, as it relies on largely unknown attentional loci as well as poorly understood effects of attention" (Sharan et al., 2016). Third, change blindness is sometimes poorly predicted by the focus of overt attention (indexed by eye fixations) (e.g., Smith et al., 2012; Kuhn et al., 2016). Such findings are often explained by covert attention, but this is typically not measured directly. Fourth, the attentional approach implies incorrectly that very little useful information is extracted from visual areas outside the focus of attention (see below).

Peripheral vision approach

Visual acuity is greatest in the centre of the visual field (the fovea; see Glossary). However, peripheral vision (all vision outside the fovea) typically covers the great majority of the visual field (see Chapter 2). As Rosenholtz (2016, p. 438) pointed out, it is often assumed "Peripheral vision is impoverished and all but useless". This is a great exaggeration even though acuity and colour perception are much worse in the periphery than the fovea. In fact, peripheral vision is often most impaired by visual crowding: "identification of a peripheral object is impaired by nearby objects" (Pirkner & Kimchi, 2017, p. 1) (see Chapter 5).

According to Sharan et al. (2016, p. 3), "The hypothesis that change blindness may arise in part from limitations of peripheral vision is quite different from usual explanations of the phenomenon [which attribute it to] a mix of inattention and lack of details stored in memory."

Sharan et al. (2016) tested the above hypothesis. Initially, they categorised change-detection tasks as easy, medium and hard on the basis of how rapidly observers detected the change. Then they presented these tasks to different observers who fixated at various degrees of visual angle (eccentricities) from the area that changed. There were two key findings:

(1) Change-detection performance was surprisingly good even when the change occurred well into peripheral vision.
(2) Peripheral vision plays a major role in determining change-detection performance – hard-to-detect changes require closer fixations than easy-to-detect ones.

Figure 4.16 (a) Change-detection accuracy (y-axis) as a function of task difficulty (easy, medium, hard) and visual eccentricity in degrees (x-axis). (b) The eccentricity at which change-detection accuracy was 85% correct as a function of task difficulty. From Sharan et al. (2016).


Further evidence that much information is extracted from peripheral as well as foveal vision was reported by Clarke and Mack (2015). On each trial, two real-world scenes were presented with an interval of 1,500 ms between them. When this interval was unfilled, only 11% of changes were detected. However, when a cue indicating the location of a possible change was presented 0, 300 or 1,000 ms after the offset of the first scene, change-detection rates were much higher. They were greatest in the 0 ms condition (29%) and lowest in the 1,000 ms condition (18%). Thus, much information about the first scene (including information from peripheral vision) was stored briefly in iconic memory (see Glossary and Chapter 6).

If peripheral vision provides observers with general or gist information, they might detect global scene changes without detecting precisely what had changed. Howe and Webb (2014) obtained support for this prediction. Observers were presented with an array of 30 discs (15 red, 15 green). On 24% of trials when three discs changed colour, observers detected the array had changed but could not identify the discs involved.

Evaluation

The peripheral vision approach has proved successful in various ways. First, visual information is often extracted from across the entire visual field, as predicted by this approach (but not the attentional approach). This supports our strong belief that we perceive most of the immediate visual environment. Second, this approach capitalises on established knowledge concerning peripheral vision. Third, this approach has been applied successfully to explain visual-search performance (see Chapter 5).

What are the limitations of this approach? First, it de-emphasises attention's role in determining change blindness, and does not provide a detailed account of how attentional and perceptual processes are integrated. Second, Sharan et al. (2016) discovered change detection was sometimes difficult even though the change could be perceived easily in peripheral vision. This indicates that other factors (as yet unidentified) are also involved. Third, the approach does not consider failure to compare pre- and post-change representations as a reason for change blindness (see below).

Comparison of pre- and post-change representations

Change blindness can occur because observers fail to compare their pre- and post-change representations. Angelone et al. (2003) presented a video in which the identity of the central actor changed. On a subsequent line-up task to identify the pre-change actor, observers showing change blindness performed comparably to those showing change detection (53% vs 46%, respectively).

Varakin et al. (2007) extended the above research in a real-world study in which a coloured binder was switched for one of a different colour while observers' eyes were closed. Some observers exhibited change blindness even though they remembered the colours of the pre- and post-change binders and so had failed to compare the two colours. Other observers showing change blindness had poor memory for the pre- and post-change colours and so had failed to represent these two pieces of information in memory.

KEY TERM

Serial dependence
Systematic bias of current visual perception towards recent visual input.

Is change blindness a defect?

Is change blindness an unfortunate defect? Fischer and Whitney (2014) argued the answer is "No". The visual world is typically relatively stable over short time periods. As a result, it is worthwhile for us to sacrifice perceptual accuracy occasionally to ensure we have a continuous, stable perception of our visual environment.

Fischer and Whitney (2014) supported their argument by finding the perceived orientation of a grating was biased in the direction of a previously presented grating, an effect known as serial dependence. Manassi et al. (2018) found serial dependence for an object's location – when an object that had been presented previously was re-presented, it was perceived as being closer to its original location than was actually the case. Serial dependence probably involves several stages of visual perception and may also involve memory processes (Bliss et al., 2017).

In sum, the visual system's emphasis on perceptual stability inhibits our ability to detect changes within the visual scene.
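A minimal way to express what "biased in the direction of" the preceding stimulus means – our own toy formalisation, not an equation from the studies cited – is as a weighted pull of the current percept towards the previous stimulus:

\[
\hat{\theta}_t \;=\; \theta_t + w\,(\theta_{t-1} - \theta_t), \qquad 0 < w < 1,
\]

where \(\theta_t\) is the current orientation (or location), \(\theta_{t-1}\) the preceding one, \(\hat{\theta}_t\) the reported percept and \(w\) the strength of the attractive bias. In practice the bias appears to be tuned to the difference between successive stimuli (strongest at intermediate differences and absent for very large ones), so \(w\) is better treated as a function of \(|\theta_{t-1} - \theta_t|\) than as a constant.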

Inattentional blindness and its causes

The most famous study on inattentional blindness was reported by Simons and Chabris (1999). In one condition, observers watched a video where students dressed in white (the white team) passed a ball to each other and the observers counted the number of passes (see the video at www.simonslab.com/videos.html). At some point, a woman in a black gorilla suit walks into camera shot, looks at the camera, thumps her chest and then walks off (see Figure 4.17). Altogether she is on screen for 9 seconds. Very surprisingly, only 42% of observers noticed the gorilla! This is a striking example of inattentional blindness.

Figure 4.17 Frame showing a woman in a gorilla suit in the middle of a game of passing the ball. From Simons & Chabris (1999). Figure provided by Daniel Simons, www.dansimons.com/www.theinvisiblegorilla.com.

Why was performance so poor in the above experiment? Simons and Chabris (1999) obtained additional relevant evidence. In a second condition, observers counted the number of passes made by students dressed in black. Here 83% of observers detected the gorilla's presence. Thus, observers were more likely to attend to the gorilla when it resembled task-relevant stimuli (i.e., in colour).

It is generally assumed detection performance is good when observers count black team passes because of selective attention to black objects. Indeed, Rosenholtz et al. (2016) found that observers counting black team passes had eye fixations closer to the gorilla than those counting white team passes. However, Rosenholtz et al. also found that observers counting black team passes (but whose fixation patterns resembled those of observers counting white team passes) had unusually poor detection performance (54% compared to a typical 80%). Thus, detection performance may depend on the strengths and limitations of peripheral vision as well as failures of selective attention.

The presence of inattentional blindness can lead us to underestimate the amount of processing of the undetected stimulus. Schnuerch et al. (2016) found categorising attended stimuli was slower when the meaning of an undetected stimulus conflicted with that of the attended stimulus. Thus, the meaning of undetected stimuli was processed despite inattentional blindness. Other research using event-related potentials (reviewed by Pitts, 2018) has shown that undetected stimuli typically receive moderate processing.

How can we explain inattentional blindness? As we have seen, explanations often emphasise the role of selective attention or attentional set. Simons and Chabris' (1999) findings indicate the importance of similarity in stimulus features (e.g., colour) between task stimuli and the unexpected object. However, Most (2013) argued that similarity in semantic category is also important. Participants tracked numbers or letters. On the critical trial, an unexpected stimulus (the letter E or the number 3) was visible for 7 seconds. The letter and number were visually identical except that they were mirror images of each other.

What did Most (2013) find? There was much less inattentional blindness when the unexpected stimulus belonged to the same category as the tracked objects. Thus, inattentional blindness can depend on attentional sets based on semantic categories (e.g., letters; numbers).

Légal et al. (2017) investigated the role of demanding top-down attentional processes in producing inattentional blindness using Simons and Chabris' (1999) gorilla video. Some observers counted the passes made by the white team (standard task) whereas others had the more attentionally demanding task of counting the number of aerial passes as well as total passes. As predicted, there was much more evidence of inattentional blindness (i.e., failing to detect the gorilla) when the task was more demanding. Légal et al. (2017) reduced inattentional blindness in other conditions by presenting detection-relevant words (e.g., identify; notice) subliminally to observers prior to watching the video. This increased detection rates for the gorilla in the standard task condition from 50% to 83%. Overall, the findings indicate that manipulating attentional processes can have powerful effects on inattentional blindness.

Compelling evidence that inattentional blindness depends on top-down processes that strongly influence what we expect to see was reported by Persuh and Melara (2016). Observers fixated a central dot followed by the presentation of two coloured squares and decided whether the colours were the same. On the critical trial, the dot was replaced by Barack Obama's face (see Figure 4.18). Amazingly, 60% of observers failed to detect this unexpected stimulus presented in foveal vision: Barack Obama blindness. Of these observers, a below-chance 8% identified Barack Obama when deciding whether the unexpected stimulus was Angelina Jolie, a lion's head, an alarm clock or Barack Obama (see Figure 4.18).

Figure 4.18 The sequence of events on the initial baseline trials and the critical trial. From Persuh and Melara (2016).

Persuh and Melara's (2016) findings are dramatic because they indicate inattentional blindness can occur even when the novel stimulus is presented on its own with no competing stimuli. These findings suggest there are important differences in the processes underlying inattentional blindness and change blindness: the latter often depends on visual crowding (see Glossary), which is totally absent in Persuh and Melara's study.

Evaluation

Several factors influencing inattentional blindness have been identified. These factors include the similarity (in terms of stimulus features and semantic category) between task stimuli and the unexpected object; the attentional demands of the task; and observers' expectations concerning what they will see. If there were no task requiring attentional resources and creating expectations, there would undoubtedly be very little inattentional blindness (Jensen et al., 2011).

What are the limitations of research in this area? First, it is typically unclear whether inattentional blindness is due to perceptual failure or to memory failure (i.e., the unexpected object is perceived but rapidly forgotten). However, Ward and Scholl (2015) found observers showed inattentional blindness even when instructed to report immediately seeing anything unexpected. This finding strongly suggests that inattentional blindness reflects deficient perception rather than memory failure. Second, observers typically engage in some processing of undetected stimuli even when they fail to report the presence of such stimuli (Pitts, 2018). More research is required to clarify the extent of non-conscious processing of undetected stimuli. Third, it is likely that the various factors influencing inattentional blindness interact in complex ways. However, most research has considered only a single factor and so the nature of such interactions has not been established.


CHAPTER SUMMARY

• Introduction. The time dimension is very important in visual perception. The changes in visual perception produced as we move around the environment and/or environmental objects move promote accurate perception and facilitate appropriate actions.

• Direct perception. Gibson argued perception and action are closely intertwined and so research should not focus exclusively on static observers perceiving static visual displays. According to his direct theory, an observer's movement creates optic flow providing useful information about the direction of heading. Invariants, which are unchanged as people move around their environment, have particular importance. Gibson claimed the uses of objects (their affordances) are perceived directly. He underestimated the complexity of visual processing, minimising the role of object knowledge in visual perception, with the effects of motion on perception being more complex than he realised.

• Visually guided movement. The perception of heading depends in part on optic-flow information. However, there are complexities because the retinal flow field is determined by eye and head movements as well as by optic flow. Heading judgements are also influenced by binocular disparity and the retinal displacement of objects as we approach them. Accurate steering on curved paths (e.g., driving around a bend) sometimes involves focusing on the tangent point (e.g., the point on the inside edge of the road at which its direction seems to reverse). However, drivers sometimes fixate a point along the future path. More generally, drivers' gaze patterns are flexibly determined by control mechanisms that are responsive to their goals. Calculating time to contact with an object often involves calculating tau (the size of the retinal image divided by the object's rate of expansion). Drivers often use tau-dot (rate of decline of tau over time) to decide whether there is sufficient braking time to stop before contact. Observers often make use of additional sources of information (e.g., binocular disparity; familiar size; relative size) when working out time to contact. Drivers' braking decisions also depend on their preferred margin of safety and the effectiveness of the car's braking system. (The definitions of tau and tau-dot are written out after this summary.)

• Visually guided action: contemporary approaches. The planning-control model distinguishes between a slow planning system used mostly before the initiation of movement and a fast control system used during movement execution. As predicted, separate brain areas are involved in planning and control. However, the definition of "planning" is very broad, and the notion that planning always precedes control is oversimplified. Recent evidence indicates that visually guided action depends on three processing streams (dorso-dorsal; ventro-dorsal; and ventral, which is discussed more fully in Chapter 2), each making a separate contribution. This theoretical approach is supported by studies on brain-damaged patients and by neuroimaging research.

• Perception of human motion. Human motion is perceived even when only impoverished visual information is available. Perception of human and biological motion involves bottom-up and top-down processes, with the latter most likely to be used with degraded visual input. The perception of human motion is special because we can produce as well as perceive human actions and because we devote considerable time to making sense of it. It has often been assumed that our ability to imitate and understand human motion depends on a mirror neuron system (an extensive brain network). This system's causal involvement in action perception and understanding has been shown in research on brain-damaged patients and studies using techniques to alter its neural activity. The mirror neuron system is especially important in the understanding of relatively simple actions. However, additional high-level cognitive processes are often required if action understanding is complex or involves generalising from past experience.

• Change blindness. There is convincing evidence for change blindness and inattentional blindness. Change blindness depends on attentional processes: it occurs more often when the changed object does not receive attention. However, change blindness can occur for objects that are fixated, and it also depends on the limitations of peripheral vision. The visual system's emphasis on continuous, stable perception probably plays a part in making us susceptible to change blindness. Inattentional blindness depends very strongly on top-down processes (e.g., selective attention) and can be found even when only the novel stimulus is present in the visual field.
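Since the summary above defines tau and tau-dot only verbally, here are the definitions written out. The notation is ours, and the braking criterion comes from Lee's (1976) classic tau analysis rather than from anything stated explicitly in this chapter. If \(\theta(t)\) is the angular size of an object's retinal image, then

\[
\tau(t) \;=\; \frac{\theta(t)}{\dot{\theta}(t)},
\]

which, for an object approaching at constant speed, approximately equals the time remaining before contact. Tau-dot is simply its rate of change, \(\dot{\tau} = d\tau/dt\); on Lee's (1976) analysis, braking at a constant deceleration is sufficient to stop before contact provided \(\dot{\tau} \geq -0.5\), which is why tau-dot can tell a driver whether there is enough braking time.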

FURTHER READING

Binder, E., Dovern, A., Hesse, M.D., Ebke, M., Karbe, H., Salinger, J. et al. (2017). Lesion evidence for a human mirror neuron system. Cortex, 90, 125–137. Ellen Binder and colleagues discuss the nature of the mirror neuron system based on evidence from brain-damaged patients.

Keysers, C., Paracampo, R. & Gazzola, V. (2018). What neuromodulation and lesion studies tell us about the function of the mirror neuron system and embodied cognition. Current Opinion in Psychology, 24, 35–40. This article provides a succinct account of our current understanding of the mirror neuron system.

Lappi, O. & Mole, C. (2018). Visuo-motor control, eye movements, and steering: A unified approach for incorporating feedback, feedforward, and internal models. Psychological Bulletin, 144, 981–1001. Otto Lappi and Callum Mole provide a comprehensive theoretical account of driving behaviour that emphasises the importance of top-down control mechanisms in influencing drivers' eye fixations.

Osiurak, F., Rossetti, Y. & Badets, A. (2017). What is an affordance? 40 years later. Neuroscience and Biobehavioral Reviews, 77, 403–417. François Osiurak and colleagues discuss Gibson's notion of affordances in the context of contemporary research and theory.

Rosenholtz, R. (2017a). What modern vision science reveals about the awareness puzzle: Summary-statistic encoding plus decision limits underlie the richness of visual perception and its quirky failures. Vision Sciences Society Symposium on Summary Statistics and Awareness, preprint arXiv:1706.02764. Ruth Rosenholtz provides an excellent account of the role played by peripheral vision in change blindness and other phenomena.

Sakreida, K., Effnert, I., Thill, S., Menz, M.M., Jirak, D., Eickhoff, C.R. et al. (2016). Affordance processing in segregated parieto-frontal dorsal stream sub-pathways. Neuroscience and Biobehavioral Reviews, 69, 80–112. The pathways within the brain involved in goal-directed interactions with objects are discussed in the context of a meta-analytic review.


Chapter 5

Attention and performance

INTRODUCTION

Attention is invaluable in everyday life. We use attention to avoid being hit by cars when crossing the road, to search for missing objects and to perform two tasks together. The word "attention" has various meanings but typically refers to selectivity of processing, as emphasised by William James (1890, pp. 403–404):

Attention is . . . the taking into possession of the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalisation, concentration, of consciousness are of its essence.

KEY TERMS

Focused attention
A situation in which individuals try to attend to only one source of information while ignoring other stimuli; also known as selective attention.

Divided attention
A situation in which two tasks are performed at the same time; also known as multi-tasking.

William James distinguished between "active" and "passive" modes of attention. Attention is active when controlled in a top-down way by the individual's goals or expectations. In contrast, attention is passive when controlled in a bottom-up way by external stimuli (e.g., a loud noise). This distinction remains theoretically important (e.g., Corbetta & Shulman, 2002; see discussion, pp. 192–196).

Another important distinction is between focused and divided attention. Focused attention (or selective attention) is studied by presenting individuals with two or more stimulus inputs at the same time and instructing them to respond to only one. Research on focused or selective attention tells us how effectively we can select certain inputs and avoid being distracted by non-task inputs. It also allows us to study the selection process and the fate of unattended stimuli.

Divided attention is also studied by presenting at least two stimulus inputs at the same time. However, individuals are instructed they must attend (and respond) to all stimulus inputs. Divided attention is also known as multi-tasking (see Glossary). Studies of divided attention provide useful information about our processing limitations and the capacity of our attentional mechanisms.


There is a final important distinction (the last one, I promise you!) between external and internal attention. External attention is "the selection and modulation of sensory information" (Chun et al., 2011). In contrast, internal attention is "the selection, modulation, and maintenance of internally generated information, such as task rules, responses, long-term memory, or working memory" (Chun et al., 2011, p. 73). The connection to Baddeley's working memory model is especially important (e.g., Baddeley, 2012; see Chapter 6). The central executive component of working memory is involved in attentional control and is crucially involved in internal and external attention.

Much attentional research has two limitations. First, the emphasis is on attention to externally presented task stimuli rather than internally generated stimuli (e.g., worries; self-reflection). One reason is that it is easier to assess and to control external attention. Second, what participants attend to is determined by the experimenter's instructions. In contrast, what we attend to in the real world is mostly determined by our current goals and emotional states.

Two important topics related to attention are discussed elsewhere. Change blindness (see Glossary), which shows the close links between attention and perception, is considered in Chapter 4. Consciousness (including its relationship to attention) is discussed in Chapter 16.

KEY TERMS

Cocktail party problem
The difficulties involved in attending to one voice when two or more people are speaking at the same time.

Dichotic listening task
A different auditory message is presented to each ear and attention has to be directed to one message.

Shadowing
Repeating one auditory message word for word as it is presented while a second auditory message is also presented; it is used on the dichotic listening task.

FOCUSED AUDITORY ATTENTION

Many years ago, British scientist Colin Cherry (1953) became fascinated by the cocktail party problem – how can we follow just one conversation when several people are talking at once? As we will see, there is no simple answer.

McDermott (2009) identified two problems listeners face when attending to one voice among many. First, there is sound segregation: the listener must decide which sounds belong together. This is complex: machine-based speech recognition programs often perform poorly when attempting to achieve sound segregation with several sound sources present together (Shen et al., 2008). Second, after segregation has been achieved, the listener must direct attention to the sound source of interest and ignore the others.

McDermott (2009) pointed out that auditory segmentation is often harder than visual segmentation (deciding which visual features belong to which objects; see Chapter 3). There is considerable overlap of signals from different sound sources in the cochlea, whereas visual objects typically occupy different retinal regions.

There is another important issue – when listeners attend to one auditory input, how much processing is there of the unattended input(s)? As we will see, various answers have been proposed.

Cherry (1953) addressed the issues discussed so far (see Eysenck, 2015, for an evaluation of his research). He studied the cocktail party problem using a dichotic listening task in which a different auditory message was presented to each ear and the listener attended to only one. Listeners engaged in shadowing (repeating the attended message aloud as it was presented) to ensure their attention was directed to that message. However, the shadowing task has two potential disadvantages: (1) listeners do not normally engage in shadowing and so the task is artificial; and (2) it increases listeners' processing demands.

Listeners solved the cocktail party problem by using differences between the auditory inputs in physical features (e.g., sex of speaker; voice intensity; speaker location). When these physical differences were eliminated by presenting two messages in the same voice to both ears at once, listeners found it very hard to separate out the messages based on differences in meaning.

Cherry (1953) found very little information seemed to be extracted from the unattended message. Listeners seldom noticed when it was spoken backwards or in a foreign language. However, physical changes (e.g., a pure tone) were nearly always detected. The conclusion that unattended information receives minimal processing was supported by Moray (1959), who found listeners remembered very few words presented 35 times each.

Where is the bottleneck? Early vs late selection


Many psychologists have argued we have a processing bottleneck (discussed below). A bottleneck in the road (e.g., where it is especially narrow) can cause traffic congestion, and a bottleneck in the processing system seriously limits our ability to process two (or more) simultaneous inputs. However, a bottleneck would sometimes solve the cocktail party problem by permitting listeners to process only the desired voice. Where is the bottleneck?

Broadbent (1958) argued a filter (bottleneck) early in processing allows information from one input or message through it based on the message's physical characteristics. The other input remains briefly in a sensory buffer and is rejected unless attended to rapidly (see Figure 5.1). Thus, Broadbent argued there is early selection.

Treisman (1964) argued the bottleneck's location is more flexible than Broadbent suggested (see Figure 5.1). She claimed listeners start with processing based on physical cues, syllable pattern and specific words and then process grammatical structure and meaning. Later processes are omitted or attenuated if there is insufficient processing capacity to permit full stimulus analysis. Treisman (1964) also argued top-down processes (e.g., expectations) are important. Listeners performing the shadowing task sometimes say a word from the unattended input. Such breakthroughs mostly occur when the word on the unattended channel is highly probable in the context of the attended message.

Deutsch and Deutsch (1963) argued all stimuli are fully analysed, with the most important or relevant stimulus determining the response. Thus, they placed the bottleneck much later in processing than did Broadbent (see Figure 5.1).
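The three accounts can be summarised schematically. The toy sketch below is our own illustration (with simplified stage names and invented function names), not a formal statement of any of the theories; it simply shows where each theory locates the bottleneck for an unattended input:

```python
# Illustrative sketch: the three classic bottleneck theories differ
# mainly in where selection occurs relative to successive stages of
# analysis. Stage names are deliberately simplified.

STAGES = ["physical features", "syllables/words", "grammar/meaning"]

def stages_applied(attended: bool, theory: str) -> list:
    """Stages of analysis an input receives under each theory."""
    if attended:
        return STAGES                  # attended input is always fully analysed
    if theory == "Broadbent":          # early selection: unattended input is
        return STAGES[:1]              # filtered after physical analysis
    if theory == "Treisman":           # attenuation: later analyses of the
        return STAGES[:2]              # unattended input are weakened or omitted
    if theory == "Deutsch & Deutsch":  # late selection: everything is fully
        return STAGES                  # analysed; selection occurs at response
    raise ValueError(f"unknown theory: {theory}")

for theory in ("Broadbent", "Treisman", "Deutsch & Deutsch"):
    print(f"{theory}: unattended input -> {stages_applied(False, theory)}")
```

Note that Treisman's attenuation is cruder here than in her theory, where processing of the unattended input is weakened flexibly rather than simply cut off after a fixed stage.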

Findings: unattended input

Broadbent's approach predicts little or no processing of unattended auditory messages. In contrast, Treisman's approach suggests flexibility in the processing of unattended messages, whereas Deutsch and Deutsch's approach implies reasonably thorough processing of such messages. Relevant findings are discussed below.


Figure 5.1 A comparison of Broadbent's theory (top), Treisman's theory (middle), and Deutsch and Deutsch's theory (bottom).

Treisman and Riley (1969) asked listeners to shadow one of two auditory messages. They stopped shadowing and tapped when they detected a target in either message. Many more target words were detected on the shadowed message.

Aydelott et al. (2015) asked listeners to perform a task on attended target words. When unattended words related in meaning were presented shortly before the target words themselves, performance on the target words was enhanced when unattended words were presented as loudly as attended ones. Thus, the meaning of unattended words was processed.

There is often more processing of unattended words that have a special significance for the listener. For example, Li et al. (2011) obtained evidence that unattended weight-related words (e.g., fat; chunky) were processed more thoroughly by women dissatisfied with their weight. Conway et al. (2001) found listeners often detected their own name on the unattended message. This was especially the case if they had low working memory capacity (see Glossary), indicative of poor attentional control.

Coch et al. (2005) asked listeners to attend to one of two auditory inputs and to detect targets presented on either input. Event-related potentials (ERPs; see Glossary) provided a measure of processing activity. ERPs 100 ms after target presentation were greater when the target was presented on the attended rather than the unattended message. This suggests there was more processing of the attended than unattended targets.

Greater brain activation for attended than unattended auditory stimuli may reflect enhanced processing for attended stimuli and/or suppressed processing for unattended stimuli. Horton et al. (2013) addressed this issue. Listeners heard separate speech messages presented to each ear with instructions to attend to the left or right ear. There was greater brain activation associated with the attended message (especially around 90 ms after stimulus presentation). This difference depended on enhancement of the attended message combined with suppression of the unattended message.

Classic theories of selective auditory attention (those of Broadbent, Treisman, and Deutsch and Deutsch) de-emphasised the importance of suppression or inhibition of the unattended message shown by Horton et al. (2013). For example, Schwartz and David (2018) reported suppression of neuronal responses in the primary auditory cortex to distractor sounds. More generally, all the classic theories de-emphasise the flexibility of selective auditory attention and the role of top-down processes in selection (see below).

Findings: cocktail party problem

Humans are generally very good at separating out and understanding one voice from several speaking at the same time (i.e., solving the cocktail party problem). The extent of this achievement is indicated by the finding that automatic speech recognition systems are considerably inferior to human speech recognition (Spille & Meyer, 2014).

Mesgarani and Chang (2012) studied listeners with implanted multi-electrode arrays permitting the direct recording of activity within the auditory cortex. They heard two different messages (one in a male voice; one in a female voice) presented to the same ear with instructions to attend to only one. The responses within the auditory cortex revealed “The salient spectral [based on sound frequencies] and temporal features of the attended speaker, as if subjects were listening to that speaker alone” (Mesgarani & Chang, 2012, p. 233).

Listeners found it easy to distinguish between the two messages in the study by Mesgarani and Chang (2012) because they differed in physical characteristics (i.e., male vs female voice). Olguin et al. (2018) presented native English speakers with two messages in different female voices. The attended message was always in English whereas the unattended message was in English or an unknown language. Comprehension of the attended message was comparable in both conditions. However, there was stronger neural encoding of both messages in the former condition. As Olguin et al. concluded, “The results offer strong support to flexible accounts of selective [auditory] attention” (p. 1618).

In everyday life, we are often confronted by several different speech streams. Accordingly, Puvvada and Simon (2017) presented three speech streams and assessed brain activity as listeners attended to only one. Early in processing, “the auditory cortex maintains an acoustic representation of the auditory scene with no significant preference to attended over ignored sources” (p. 9195). Later in processing, “Higher-order auditory cortical areas represent an attended speech stream separately from, and with significantly higher fidelity [accuracy] than, unattended speech streams” (p. 9189). This latter finding results from top-down processes (e.g., attention).

How do we solve the cocktail party problem? The importance of top-down processes is suggested by the existence of extensive descending
pathways from the auditory cortex to brain areas involved in early auditory processing (Robinson & McAlpine, 2009). Various top-down factors based on listeners’ knowledge and/or expectations are involved. For example, listeners are more accurate at identifying what one speaker is saying in the context of several other voices if they have previously heard that speaker’s voice in isolation (McDermott, 2009).

Woods and McDermott (2018) investigated top-down processes in selective auditory attention in more detail. They argued, “Sounds produced by a given source often exhibit consistencies in structure that might be useful in separating sources” (p. E3313). They used the term “schemas” to refer to such structural consistencies. Listeners showed clear evidence of schema learning leading to rapid improvements in their listening performance. An important aspect of such learning is temporal coherence – a given source’s sound features are typically all present when it is active and absent when it is silent. Shamma et al. (2011) discussed research showing that if listeners can identify one distinctive feature of the target voice, they can then distinguish its other sound features via temporal coherence.

Evans et al. (2016) compared patterns of brain activity when attended speech was presented on its own or together with competing unattended speech. Brain areas associated with attentional and control processes (e.g., frontal and parietal regions) were more activated in the latter condition. Thus, top-down processes relating to attention and control are important in selective auditory processing. Finally, Golumbic et al. (2013) suggested individuals at actual cocktail parties can potentially use visual information to assist them in understanding what a given speaker is saying. Listeners heard two simultaneous messages (one in a male voice and the other in a female voice). Processing of the attended message was enhanced when they saw a video of the speaker talking.

In sum, listeners generally achieve the complex task of selecting one speech message from among several such messages. There has been progress in identifying the top-down processes involved. For example, if listeners can identify at least one consistently distinctive feature of the target voice, this makes it easier for them to attend only to that voice. Top-down processes often produce a “winner-takes-all” situation where the processing of one auditory input (the winner) suppresses the brain activity associated with all other inputs (Kurt et al., 2008).
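The temporal coherence principle can be made concrete with a small simulation. The sketch below (an illustrative toy example in Python, not a model taken from the studies cited; the function name, threshold and signals are our own assumptions) groups feature channels with an anchor feature belonging to the target voice by correlating their amplitude envelopes over time: channels that rise and fall with the anchor are assigned to the same source.

import numpy as np

def group_by_temporal_coherence(envelopes, anchor_idx, threshold=0.7):
    """Assign feature channels to the anchor's source when their
    amplitude envelopes correlate strongly over time.

    envelopes: 2-D array (n_channels x n_timepoints) of feature envelopes.
    anchor_idx: index of one feature known to belong to the target voice.
    threshold: illustrative correlation cut-off for 'coherent' channels.
    """
    anchor = envelopes[anchor_idx]
    grouped = []
    for i, env in enumerate(envelopes):
        r = np.corrcoef(anchor, env)[0, 1]  # Pearson correlation over time
        if r >= threshold:
            grouped.append(i)
    return grouped

# Two simulated sources taking turns (target active when masker silent).
rng = np.random.default_rng(0)
t = np.arange(200)
target_on = (np.sin(t / 10) > 0).astype(float)   # target's on/off pattern
masker_on = 1.0 - target_on                      # masker fills the gaps
noise = lambda: 0.05 * rng.standard_normal(t.size)
envelopes = np.stack([
    target_on + noise(),   # channel 0: target pitch
    target_on + noise(),   # channel 1: target timbre
    masker_on + noise(),   # channel 2: masker pitch
])
print(group_by_temporal_coherence(envelopes, anchor_idx=0))  # -> [0, 1]

On this sketch, once one distinctive feature (the anchor) has been identified, the target voice’s remaining features fall out of the correlation structure, which is the essence of the account discussed by Shamma et al. (2011).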

FOCUSED VISUAL ATTENTION

There has been much more research on visual attention than auditory attention. The main reason is that vision is our most important sense modality with more of the cortex devoted to it than any other sense. Here we consider four key issues. First, what is focused visual attention like? Second, what is selected in focused visual attention? Third, what happens to unattended visual stimuli? Fourth, what are the major systems involved in visual attention? In the next section (see pp. 196–200), we discuss what the study of visual disorders has taught us about visual attention.

KEY TERM
Split attention
Allocation of attention to two (or more) non-adjacent regions of visual space.

Spotlight, zoom lens or multiple spotlights?

Look around you and attend to any interesting objects. Was your visual attention like a spotlight? A spotlight illuminates a fairly small area, little can be seen outside its beam and it can be redirected to focus on any given object. Posner (1980) argued the same is true of visual attention.

Other psychologists (e.g., Eriksen & St. James, 1986) claim visual attention is more flexible than suggested by the spotlight analogy and argue visual attention resembles a zoom lens. We can increase or decrease the area of focal attention just as a zoom lens can be adjusted to alter the visual area it covers. This makes sense. For example, car drivers often need to narrow their attention after spotting a potential hazard.

A third theoretical approach is even more flexible. According to the multiple spotlights theory (Awh & Pashler, 2000), we sometimes exhibit split attention (attention directed to two or more non-adjacent regions in space). The notion of split attention is controversial. Jans et al. (2010) argued attention is often strongly linked to motor action and so attending to two separate objects might disrupt effective action. However, there is no strong evidence for such disruption.
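The zoom-lens analogy carries a simple quantitative intuition: if a fixed pool of attentional resource is spread over the attended region, widening the zoom dilutes the resource available at any one location. The sketch below encodes only that intuition; the constant-resource assumption and the numbers are illustrative, not taken from the studies discussed next.

def processing_gain_per_location(total_resource=1.0, n_locations=1):
    """Zoom-lens intuition: a fixed attentional resource is divided
    across the attended region, so widening the zoom reduces the
    resource (and thus processing quality) at any one location."""
    return total_resource / n_locations

for n in (1, 2, 4):
    print(f"attend {n} location(s): gain per location = "
          f"{processing_gain_per_location(n_locations=n):.2f}")
# attend 1 -> 1.00; attend 2 -> 0.50; attend 4 -> 0.25

This dilution is one way of seeing why, in the findings below, performance was best with the smallest attended region and worst with the largest one.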

Findings

Support for the zoom-lens model was reported by Müller et al. (2003). On each trial, observers saw four squares in a semi-circle and were cued to attend to one, two or all four. Four objects were then presented (one in each square) and observers decided whether a target (e.g., a white circle) was among them. Brain activation in early visual areas was most widespread when the attended region was large (i.e., attend to all four squares) and was most limited when it was small (i.e., attend to one square). As predicted by the zoom-lens theory, performance (reaction times and errors) was best with the smallest attended region and worst with the largest one.

Chen and Cave (2016, p. 1822) argued the optimal attentional zoom setting “includes all possible target locations and excludes possible distractor locations”. Most findings indicated people’s attentional zoom setting is close to optimal. However, Collegio et al. (2019) obtained contrary findings. Drawings of large objects (e.g., jukebox) and small objects (e.g., watch) were presented so their retinal size was the same. The observer’s area of focal attention was greater with large objects because observers made top-down inferences concerning their real-world sizes. As a result, the area of focal attention was larger than optimal for large objects.

Goodhew et al. (2016) pointed out that nearly all research has focused only on spatial perception (e.g., identification of a specific object). They focused on temporal perception (was a disc presented continuously or were there two presentations separated by a brief interval?). Spotlight size had no effect on temporal acuity, which is inconsistent with the theory. How can we explain these findings? Spatial resolution is poor in peripheral vision but temporal resolution is good. As a consequence, a small attentional spotlight is more beneficial for spatial than temporal acuity.

We turn now to split attention. Suppose you had to identify two digits that would probably be presented to two cued locations a little way apart

Figure 5.2 (a) Shaded areas indicate the cued locations; the near and far locations are not cued. (b) Probability of target detection at valid (left or right) and invalid (near or far) locations. Based on information in Awh and Pashler (2000).

(see Figure 5.2a). Suppose also that on some trials a digit was presented between the two cued locations. According to zoom-lens theory, the area of maximal attention should include the two cued locations and the space in between. As a result, the detection of digits presented in the middle should have been very good. In fact, Awh and Pashler (2000) found it was poor (see Figure 5.2b). Thus, attention can resemble multiple spotlights, as predicted by the split-attention approach.

Morawetz et al. (2007) presented letters and digits at five locations simultaneously (one in each quadrant of the visual field and one in the centre). In one condition, observers attended to the visual stimuli at the upper left and bottom right locations and ignored the other stimuli. There were two peaks of brain activation corresponding to the attended areas but less activation corresponding to the region in between. Overall, the pattern of activation strongly suggested split attention.

Niebergall et al. (2011) recorded the neuronal responses of monkeys attending to two moving stimuli while ignoring a distractor. In the key condition, there was a distractor between (and close to) the two attended stimuli. In this condition, neuronal responses to the distractor decreased compared to other conditions. Thus, split attention involves a mechanism reducing attention to (and processing of) distractors located between attended stimuli.

KEY TERM
Hemifield
One half of the visual field. Information from the left hemifield of each eye proceeds to the right hemisphere and information from the right hemifield proceeds to the left hemisphere.

In most research demonstrating split attention, the two non-adjacent stimuli being attended simultaneously were each presented to a different hemifield (one half of the visual field). Note that the right hemisphere receives visual signals from the left hemifield and the left hemisphere receives signals from the right hemifield. Walter et al. (2016) found performance was better when non-adjacent stimuli were presented to different hemifields rather than the same hemifield. Of most importance, the assessment of brain activity indicated effective filtering or inhibition of stimuli presented between the two attended stimuli only when presented to different hemifields.

In sum, we can use visual attention very flexibly. Visual selective attention can resemble a spotlight, a zoom lens or multiple spotlights, depending on the current situation and the observer’s goals. However, split attention may require that two stimuli are presented to different hemifields rather than the same one. A limitation with all these theories is that metaphors (e.g., attention is a zoom lens) are used to describe experimental findings but these metaphors fail to specify the underlying mechanisms (Di Lollo, 2018).

What is selected?

Why might selective attention resemble a spotlight or zoom lens? Perhaps we selectively attend to an area or region of space: space-based attention. Alternatively, we may attend to a given object or objects: object-based attention. Object-based attention is prevalent in everyday life because visual attention is mainly concerned with objects of interest to us (see Chapters 2 and 3). As expected, observers’ eye movements as they view natural scenes are directed almost exclusively to objects (Henderson & Hollingworth, 1999). However, even though we typically focus on objects of potential importance, our attentional system is so flexible we can attend to an area of space or a given object.

There is also feature-based attention. For example, suppose you are looking for a friend in a crowd. Since she nearly always wears red clothes, you might attend to the feature of colour rather than specific objects or locations. Leonard et al. (2015) asked observers to identify a red letter within a series of rapidly presented letters. Performance was impaired when a # symbol also coloured red was presented very shortly before the target. Thus, there was evidence for feature-based attention (e.g., colour; motion).

Findings

Visual attention is often object-based. For example, O’Craven et al. (1999) presented observers with two stimuli (a face and a house), transparently overlapping at the same location, with instructions to attend to one of them. Brain areas associated with face processing were more activated when the face was attended to than when the house was. Similarly, brain areas associated with house processing were activated when the house was the focus of attention.

Egly et al. (1994) devised a much-used method for comparing object-based and space-based attention (see Figure 5.3). The task was to select a target stimulus as rapidly as possible. A cue presented before the target
was valid (same location as the target) or invalid (different location from the target). Of key importance, invalid cues were in the same object as the target (within-object cues) or in a different object (between-object cues). The key finding was that target detection was faster on invalid trials when the cue was in the same object rather than a different one. Thus, attention was at least partly object-based.

Figure 5.3 Stimuli adapted from Egly et al. (1994). Participants saw two rectangles and a cue indicated the most likely location of a subsequent target. The target appeared at the cued location (V), at the uncued end of the cued rectangle (IS) or at the uncued, equidistant end of the uncued rectangle (ID). From Chen (2012). © Psychonomic Society, Inc. Reprinted with permission from Springer.

Does object-based attention in the Egly et al. (1994) task occur fairly “automatically” or does it involve strategic processes? Object-based attention should always be found if it is automatic. Drummond and Shomstein (2010) found no evidence for object-based attention when the cue indicated with 100% certainty where the target would appear. Thus, any preference for object-based attention can be overridden when appropriate.

Hollingworth et al. (2012) found evidence object-based and space-based attention can occur at the same time using a task resembling that of Egly et al. (1994). There were three types of within-object cues varying in the distance between the cue and subsequent target (see Figure 5.4). There was evidence for object-based attention: when the target was far from the cue, performance was worse when the cue was in a different object rather than the same one. There was also evidence for space-based attention: when the target was in the same object as the cue, performance declined the greater the distance between target and cue. Thus, object-based and space-based attention are not mutually exclusive.

Figure 5.4 (a) Possible target locations (same object far, same object near, valid, different object far) for a given cue. (b) Performance accuracy at the various target locations. From Hollingworth et al. (2012). © 2011 American Psychological Association.

Similar findings were reported by Kimchi et al. (2016). Observers responded faster to a target presented within rather than outside an object. This indicates object-based attention. There was also evidence for space-based attention: when targets were presented outside the object, observers responded faster when they were close to it. Kimchi et al. concluded that “object-related and space-related attentional processing can operate simultaneously” (p. 48).

Pilz et al. (2012) compared object-based and space-based attention using various tasks. Overall, there was much more evidence of space-based than object-based attention, with only a small fraction of participants showing clear-cut evidence of object-based attention. Donovan et al. (2017) noted that most studies indicating visual attention is object-based have used spatial cues, which may bias the allocation of attention. Donovan et al. avoided the use of spatial cues and found “Object-based representations do not guide attentional selection in the absence of spatial cues” (p. 762). This finding suggests previous research has exaggerated the extent of object-based visual attention.

KEY TERM
Inhibition of return
A reduced probability of visual attention returning to a recently attended location or object.

When we search the visual environment, it would be inefficient if we repeatedly attended to any given location. In fact, we exhibit inhibition of return (a reduced probability of returning to a region recently the focus of attention). Of theoretical importance is whether inhibition of return applies
more to locations or objects. The evidence is mixed (see Chen, 2012). List and Robertson (2007) used Egly et al.’s (1994) task shown in Figure 5.3 and found location- or space-based inhibition of return was much stronger than object-based inhibition of return. Theeuwes et al. (2014) found location- and object-based inhibition of return were both present at the same time. According to Theeuwes et al. (p. 2254), “If you direct your attention to a location in space, you will automatically direct attention to any object . . . present at that location, and vice versa.”

There is considerable evidence of feature-based attention (see Bartsch et al., 2018, for a review). In their own research, Bartsch et al. addressed
the issue of whether feature-based attention to colour-defined targets is confined to the spatially attended region or whether it occurs across the entire visual field. They discovered the latter was the case.

Finally, Chen and Zelinsky (2019) argued it is important to study the allocation of attention under more naturalistic conditions than those typically used in research. In their study, observers engaged in free (unconstrained) viewing of natural scenes. Eye-fixation data suggested that attention initially selects regions of space. These regions may provide “the perceptual fragments from which objects are built” (p. 148).

Evaluation

Research on whether visual attention is object- or location-based has produced variable findings and so few definitive conclusions are possible. However, the relative importance of object-based and space- or location-based attention is flexible. For example, individual differences are important (Pilz et al., 2012). Note that visual attention can be both object-based and space-based at the same time.

What are the limitations of research in this area? First, most research apparently demonstrating that object-based attention is more important than space- or location-based attention has involved the use of spatial cues. Recent evidence (Donovan et al., 2017) suggests such cues may bias visual attention and that visual attention is not initially object-based in their absence. Second, space-, object- and feature-based forms of attention often interact with each other to enhance object processing (Kravitz & Behrmann, 2011). However, we have as yet limited theoretical understanding of the mechanisms involved in such interactions. Third, there is a need for more research assessing patterns of attention under naturalistic conditions. In recent research where observers view artificial stimuli while performing a specific task, it is unclear whether attentional processes resemble those when they engage in free viewing of natural scenes.

What happens to unattended or distracting stimuli?

Unsurprisingly, unattended visual stimuli receive less processing than attended ones. Martinez et al. (1999) compared event-related potentials (ERPs) to attended and unattended visual stimuli. The ERPs to unattended visual stimuli were comparable to those to attended ones 50–55 ms after stimulus onset. After that, however, the ERPs to attended stimuli were greater than those to unattended stimuli. Thus, selective attention influences all but the very early stages of processing.

As we have all discovered to our cost, it is often hard (or impossible) to ignore task-irrelevant stimuli. Below we consider factors determining whether task performance is adversely affected by distracting stimuli.

Load theory Lavie’s (2005, 2010) load theory has been an influential approach to understanding distraction effects. It distinguishes between perceptual

9781138482210_COGNITIVE_PSYCHOLOGY_PART_1.indd 189

28/02/20 6:44 PM

190

Visual perception and attention

and cognitive load. Perceptual load refers to the perceptual demands of a current task. Cognitive load refers to the burden placed on the cognitive system by a current task (e.g., demands on working memory). Tasks involving high perceptual load require nearly all our perceptual capacity whereas low-load tasks do not. With low-load tasks there are spare attentional resources, and so task-irrelevant stimuli are more likely to be processed. In contrast, tasks involving high cognitive load reduce our ability to use cognitive control to discriminate between target and distractor stimuli. Thus, high perceptual load is associated with low distractibility, whereas high cognitive load is associated with high distractibility.
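These two predictions can be summarised as a small decision function. The sketch below encodes only the theory’s qualitative predictions; the function name and the boolean framing are our own illustrative simplification, not part of the theory itself.

def predicted_distraction(perceptual_load_high: bool,
                          cognitive_load_high: bool) -> str:
    """Qualitative predictions of Lavie's load theory.

    High perceptual load exhausts perceptual capacity, so distractors
    are not processed. High cognitive load weakens the control needed
    to keep processed distractors from interfering.
    """
    if perceptual_load_high:
        return "low distraction (no spare capacity for distractors)"
    if cognitive_load_high:
        return "high distraction (distractors processed, control weak)"
    return "moderate distraction (distractors processed, control intact)"

for p in (False, True):
    for c in (False, True):
        print(f"perceptual high={p}, cognitive high={c}: "
              f"{predicted_distraction(p, c)}")

Note that the sketch treats the two loads as independent, as the theory assumes; the findings below (e.g., Linnell & Caparos, 2011) suggest this independence assumption is questionable.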

Findings

There is much support for the hypothesis that high perceptual load reduces distraction effects. Forster and Lavie (2008) presented six letters in a circle and participants decided which target letter (X or N) was present. The five non-target letters resembled the target letter more closely in the high-load condition. On some trials a picture of a cartoon character (e.g., Spongebob Squarepants) was presented as a distractor outside the circle. Distractors interfered with task performance only under low-load conditions.

According to the theory, brain activation associated with distractors should be less when individuals are performing a task involving high perceptual load. This finding has been obtained with visual tasks and distractors (e.g., Schwartz et al., 2005) and also with auditory tasks and distractors (e.g., Sabri et al., 2013).

Why is low perceptual load associated with high distractibility? Biggs and Gibson (2018) argued this happens because observers generally adopt a broad attentional focus when perceptual load is low. They tested this hypothesis using three low-load conditions in which participants decided whether a target X or N was presented and a distractor letter was sometimes presented (see Figure 5.5). They argued that observers would adopt the smallest attentional focus in the circle condition and the largest attentional focus in the solo condition. As predicted, distractor interference was greatest in the solo condition and least in the circle condition. Thus, distraction effects depend strongly on size of attentional focus as well as perceptual load.

The hypothesis that distraction effects should be greater when cognitive or working memory load is high rather than low was tested by Burnham et al. (2014). As predicted, distraction effects on a visual search task were

Figure 5.5 Sample displays for the three low perceptual load conditions (standard, solo and circle) in which the task required deciding whether a target X or N was presented. See text for further details. From Biggs and Gibson (2018).

greater when participants performed another task placing high demands on the cognitive system.

Sörqvist et al. (2016) argued high cognitive load can reduce rather than increase distraction. They pointed out that cognitive load is typically associated with high levels of concentration and our everyday experience indicates high concentration generally reduces distractibility. As predicted, they found neural activation associated with auditory distractors was reduced when cognitive load on a visual task was high rather than low.

The effects of cognitive load on distraction are very variable. How can we explain this variability? Sörqvist et al. (2016) argued that an important factor is how easily distracting stimuli can be distinguished from task stimuli. When it is easy (e.g., task and distracting stimuli are in different modalities as in the Sörqvist et al., 2016, study), high cognitive load reduces distraction. In contrast, when it is hard to distinguish between task and distracting stimuli (e.g., they are similar and/or in the same modality), then high cognitive load increases distraction.

Load theory assumes the effects of perceptual and cognitive load are independent. However, Linnell and Caparos (2011) found perceptual and cognitive processes interacted: perceptual load only influenced attention as predicted when cognitive load was low. Thus, the effects of perceptual load are not “automatic” as assumed theoretically but instead depend on cognitive resources being available.

Evaluation

The distinction between perceptual and cognitive load has proved useful in predicting when distraction effects will be small or large. More specifically, the prediction that high perceptual load is associated with reduced distraction effects has received much empirical support. In applied research, load theory successfully predicts several aspects of drivers’ attention and behaviour (Murphy & Greene, 2017). For example, drivers exposed to high perceptual load responded more slowly to hazards and drove less safely.

What are the theory’s limitations? First, the terms “perceptual load” and “cognitive load” are vague, making it hard to test the theory (Murphy et al., 2016). Second, the assumption that perceptual and cognitive load have separate effects on attention is incorrect (Linnell & Caparos, 2011). Third, perceptual load and attentional breadth are often confounded. Fourth, the prediction that high cognitive load is associated with high distractibility has been disproved when task and distracting stimuli are easily distinguishable. Fifth, the theory de-emphasises several relevant factors including the salience or conspicuousness of distracting stimuli and the spatial distance between distracting and task stimuli (Murphy et al., 2016).

Major attention networks

As we saw in Chapter 1, many cognitive processes are associated with networks spread across relatively large areas of cortex rather than small, specific regions. With respect to attention, several theorists (e.g., Posner, 1980; Corbetta & Shulman, 2002) have argued there are two major networks.

KEY TERM
Covert attention
Attention to an object in the absence of an eye movement towards it.

One attention network is goal-directed or endogenous whereas the other is stimulus-driven or exogenous.

Posner’s (1980) approach Posner (1980) studied covert attention in which attention shifts to a given spatial location without an accompanying eye movement. In his research, people responded rapidly to a light. The light was preceded by a central cue (arrow pointing to the left or right) or a peripheral cue (brief illumination of a box outline). Most cues were valid (i.e., indicating where the target light would appear) but some were invalid (i.e., providing inaccurate information about the light’s location). Responses to the light were fastest to valid cues, intermediate to neutral cues (a central cross) and slowest to invalid cues. The findings were comparable for central and peripheral cues. When the cues were valid on only a small fraction of trials, they were ignored when they were central cues. However, they influenced performance when they were peripheral cues. The above findings led Posner (1980) to distinguish between two attention systems: (1) An endogenous system: it is controlled by the individual’s intentions and is used when central cues are presented. (2) An exogenous system: it automatically shifts attention and is involved when uninformative peripheral cues are presented. Stimuli that are salient or different from other stimuli (e.g., in colour) are most likely to be attended to using this system.

Corbetta and Shulman’s (2002) approach Corbetta and Shulman (2002) identified two attention systems that are involved in basic aspects of visual processing. First, there is a goal-directed or top-down system resembling Posner’s endogenous system. This dorsal attention network consists of a fronto-parietal network including the intraparietal sulcus. It is influenced by expectations, knowledge and current goals. It is used when a cue predicts the location or other feature of a forthcoming visual stimulus. Second, Corbetta and Shulman (2002) identified a stimulus-driven or bottom-up attention system resembling Posner’s exogenous system. This is the ventral attention network and consists primarily of a right-­hemisphere ventral fronto-parietal network. This system is used when an unexpected and potentially important stimulus (e.g., flames appearing under the door) occurs. Thus, it has a “circuit-breaking” function, meaning visual attention is redirected from its current focus. What stimuli trigger this circuit-­ breaking? According to Corbetta et al. (2008), non-task stimuli (i.e., distractors) closely resembling task stimuli are especially likely to activate the ventral attention network although salient or conspicuous stimuli also activate the same network. Corbetta and Shulman (2011; see Figure 5.6) identified the brain areas associated with each network. Key areas within the dorsal attention network are as follows: superior parietal lobule (SPL), intraparietal sulcus

Figure 5.6 The brain areas associated with the dorsal or goal-directed attention network and the ventral or stimulus-driven network. The full names of the areas involved are indicated in the text. From Corbetta and Shulman (2011). © Annual Reviews. With permission of Annual Reviews.

(IPS), inferior frontal junction (IFJ), frontal eye field (FEF), middle temporal area (MT) and V3A (a visual area). Key areas within the ventral attention network are as follows: inferior frontal junction (IFJ), inferior frontal gyrus (IFG), supramarginal gyrus (SMG), superior temporal gyrus (STG) and insula (Ins). The temporo-parietal junction also forms part of the ventral attention network.

The existence of two attention networks makes much sense. The goal-directed system (dorsal attention network) allows us to attend to stimuli directly relevant to our current goals. If we only had this system, however, our attentional processes would be dangerously inflexible. It is also important to have a stimulus-driven attentional system (ventral attention network) leading us to switch attention away from goal-relevant stimuli to unexpected threatening stimuli (e.g., a ferocious animal). More generally, the two attention networks typically interact effectively with each other.

Findings

Corbetta and Shulman (2002) supported their two-network model by carrying out meta-analyses of brain-imaging studies. In essence, they argued, brain areas most often activated when participants expect a stimulus that has not yet been presented form the dorsal attention network. In contrast, brain areas most often activated when individuals detect low-frequency targets form the ventral attention network.

Hahn et al. (2006) tested Corbetta and Shulman’s (2002) theory by comparing patterns of brain activation when top-down and bottom-up processes were required. As predicted, there was little overlap between the brain areas associated with top-down and bottom-up processing. In addition, the brain areas involved in each type of processing corresponded reasonably well to those identified by Corbetta and Shulman.

Chica et al. (2013) reviewed research on the two attention systems and identified 15 differences between them. For example, stimulus-driven attention is faster than top-down attention and is more object-based. In addition, it is more resistant to interference from other peripheral cues once activated. The existence of so many differences strengthens the argument the two attentional systems are separate.

Considerable research evidence (mostly involving neuroimaging) indicates the dorsal and ventral attention systems are associated with distinct
neural circuits even during the resting state (Vossel et al., 2014). However, neuroimaging studies cannot establish that any given brain area is necessarily involved in stimulus-driven or goal-directed attention processes. Chica et al. (2011) provided relevant evidence by using transcranial magnetic stimulation (TMS; see Glossary) to interfere with processing in a given brain area. TMS applied to the right temporo-parietal junction impaired the functioning of the stimulus-driven system but not the top-down one. In the same study, Chica et al. (2011) found TMS applied to the right intraparietal sulcus impaired the functioning of both attention systems. This provides evidence of the two attention systems working together. Evidence from brain-damaged patients (discussed below, see pp. 196–200) is also relevant to establishing the brain areas necessarily involved in goal-directed or stimulus-driven attentional processes.

Shomstein et al. (2010) had brain-damaged patients complete two tasks, one requiring stimulus-driven attentional processes whereas the other required top-down processes. Patients having greater problems with top-down attentional processing typically had brain damage to the superior parietal lobule (part of the dorsal attention network). In contrast, patients having greater problems with stimulus-driven attentional processing typically had brain damage to the temporo-parietal junction (part of the ventral attention network).

Wen et al. (2012) investigated interactions between the two visual attention systems. They assessed brain activation while participants responded to target stimuli in one visual field while ignoring all stimuli in the unattended visual field. There were two main findings. First, stronger causal influences of the top-down system on the stimulus-driven system led to superior performance on the task. This finding suggests the appearance of an object at the attended location caused the top-down attention system to suppress activity within the stimulus-driven system. Second, stronger causal influences of the stimulus-driven system on the top-down system were associated with impaired task performance. This finding suggests activation within the stimulus-driven system produced by stimuli not in attentional focus disrupted the attentional set maintained by the top-down system.

Recent developments

Corbetta and Shulman’s (2002) theoretical approach has been developed in recent years. Here we briefly consider three such developments.

First, we now have a greater understanding of interactions between their two attention networks. Meyer et al. (2018) found stimulus-driven and goal-directed attention both activated frontal and parietal regions within the dorsal attention network, suggesting it has a pivotal role in integrating bottom-up and top-down processing.

Second, previous research reviewed by Corbetta and Shulman (2002) indicated the dorsal attention network is active immediately prior to the presentation of an anticipated visual stimulus. However, this research did not indicate how long this attention network remained active. Meehan et al. (2017) addressed this issue and discovered that top-down influences associated with the dorsal attention network persisted over a relatively long time period.
KEY TERM
Default mode network
A network of brain regions that is active “by default” when an individual is not involved in a current task; it is associated with internal processes including mind-wandering, remembering the past and imagining the future.

Third, brain networks relevant to attention additional to those within Corbetta and Shulman’s (2002) theory have been identified (Sylvester et al., 2012). One such network is the cingulo-opercular network including the anterior insula/operculum and dorsal anterior cingulate cortex (dACC; see Figure 5.7). This network is associated with non-selective attention or alertness (Coste & Kleinschmidt, 2016).

Another additional network is the default mode network including the posterior cingulate cortex (PCC), the lateral parietal cortex (LP), the inferior temporal cortex (IT), the medial prefrontal cortex (MPF) and the subgenual anterior cingulate cortex (sgACC). The default mode network is activated during internally focused cognitive processes (e.g., mind-wandering; imagining the future). What is the relevance of this network to attention? In essence, performance on tasks requiring externally focused attention is often enhanced if the default mode network is deactivated (Amer et al., 2016a).

Finally, there is the fronto-parietal network (Dosenbach et al., 2008), which includes the anterior dorsolateral prefrontal cortex (aDLPFC), the middle cingulate cortex (MCC) and the intraparietal sulcus (IPS). It is associated with top-down attentional and cognitive control.

Figure 5.7 This is part of a theoretical approach based on several functional networks of relevance to attention: the four networks shown (fronto-parietal; default mode; cingulo-opercular; and ventral attention) are all discussed fully in the text. Sylvester et al., 2012, p. 528. Reprinted with permission of Elsevier.

KEY TERMS
Neglect
A disorder involving right-hemisphere damage (typically) in which the left side of objects and/or objects presented to the left visual field are undetected; the condition resembles extinction but is more severe.
Pseudo-neglect
A slight tendency in healthy individuals to favour the left side of visual space.

Evaluation

The theoretical approach proposed by Corbetta and Shulman (2002) has several successes to its credit. First, there is convincing evidence for somewhat separate stimulus-driven and top-down attention systems, each with its own brain network. Second, research using transcranial magnetic stimulation suggests major brain areas within each attention system play a causal role in attentional processes. Third, some interactions between the two networks have been identified. Fourth, research on brain-damaged patients supports the theoretical approach (see next section, pp. 196–200).

What are the limitations of this theoretical approach? First, the precise brain areas associated with each attentional system have not been clearly identified. Second, there is more commonality (especially within the parietal lobe) in the brain areas associated with the two attention networks than assumed theoretically by Corbetta and Shulman (2002). Third, there are additional attention-related brain networks not included within the original theory. Fourth, much remains to be discovered about how different attention systems interact.

DISORDERS OF VISUAL ATTENTION

Here we consider two important attentional disorders in brain-damaged individuals: neglect and extinction. Neglect (or spatial neglect) involves a lack of awareness of stimuli presented to the side of space opposite the brain damage (the contralesional side). Most neglect patients have damage in the right hemisphere and so lack awareness of stimuli on the left side of the visual field (information from the left side of the visual field proceeds to the right hemisphere): space-based or egocentric neglect. For example, patients crossing out targets presented to their left or right side (cancellation task) cross out more of those presented to the right. When instructed to mark the centre of a horizontal line (line bisection task), patients put it to the right of the centre. Note that the right hemisphere is dominant in spatial attention in healthy individuals – they exhibit pseudo-neglect, in which the left side of visual space is favoured (Friedrich et al., 2018).

Figure 5.8 On the left is a copying task in which a patient with unilateral neglect distorted or ignored the left side of the figures to be copied (shown on the left). On the right is a clock drawing task in which the patient was given a clock face and told to insert the numbers into it. Reprinted from Danckert and Ferber (2006), with permission from Elsevier.

There is also object-centred or allocentric neglect involving a lack of awareness of the left side of objects (see Figure 5.8). Patients with right-hemisphere damage typically draw the right side of all figures in a multi-object scene but neglect their left side in the left and right visual fields (Gainotti & Ciaraffa, 2013). Do allocentric and egocentric neglect reflect a single disorder or separate disorders? Rorden et al. (2012) obtained two findings supporting the single-disorder explanation. First, the correlation between the extent of each form of neglect across 33 patients was +.80. Second, similar brain regions were associated with each type of neglect. However, Pedrazzini et al. (2017) found damage to the intraparietal sulcus was more associated with allocentric than egocentric neglect, whereas the opposite was the case with damage to the temporo-parietal junction.

Extinction is often found in neglect patients. Extinction involves a failure to detect a stimulus presented to the side opposite the brain damage when a second stimulus is presented to the same side as the brain damage. Extinction and neglect are closely related but separate deficits (de Haan et al., 2012). We will focus mostly on neglect because it has attracted much more research.

KEY TERM
Extinction
A disorder of visual attention in which a stimulus presented to the side opposite the brain damage is not detected when another stimulus is presented at the same time to the side of the brain damage.

Which brain areas are damaged in neglect patients? Neglect is a heterogeneous condition and the brain areas damaged vary considerably across patients. In a meta-analysis, Molenberghs et al. (2012) found the main areas damaged in neglect patients are in the right hemisphere and include the superior temporal gyrus, the inferior frontal gyrus, the insula, the supramarginal gyrus and the angular gyrus (gyrus means ridge). Nearly all these areas are within the stimulus-driven or ventral attention network (see Figure 5.6), suggesting brain networks are damaged rather than simply specific brain areas (Corbetta & Shulman, 2011).

We also need to consider functional connectivity (correlated brain activity between brain regions). Baldassarre et al. (2014, 2016) discovered widespread disruption of functional connectivity between the hemispheres in neglect patients. This disruption did not involve the bottom-up and top-down attention networks. Of importance, recovery from attention deficits in neglect patients was associated with improvements in functional connectivity in bottom-up and top-down attention networks (Ramsey et al., 2016).

The right-hemisphere temporo-parietal junction and intraparietal sulcus are typically damaged in extinction patients (de Haan et al., 2012). When transcranial magnetic stimulation is applied to these areas to interfere with processing, extinction-like behaviour results (de Haan et al., 2012). Dugué et al. (2018) confirmed the importance of the temporo-parietal junction
(part of the ventral attention network) in control of spatial attention in a neuroimaging study on healthy individuals. However, its subregions varied in terms of their involvement in voluntary and involuntary attention shifts.

Conscious awareness and processing

Neglect patients generally report no conscious awareness of stimuli presented to the left visual field. However, that does not necessarily mean those stimuli are not processed. Vuilleumier et al. (2002b) presented extinction patients with two pictures at the same time, one to each visual field. The patients showed very little memory for left-field stimuli. Then the patients identified degraded pictures. There was a facilitation effect for left-field pictures indicating they had been processed.

Vuilleumier et al. (2002a) presented GK, a male patient with neglect and extinction, with fearful faces. He showed increased activation in the amygdala (associated with emotional responses) whether or not these faces were consciously perceived. This is explicable given there is a processing route from the retina to the amygdala bypassing the cortex (Diano et al., 2017). Sarri et al. (2010) found extinction patients had no awareness of left-field stimuli. However, these stimuli were associated with activation in early visual processing areas, indicating they received some processing.

Processing in neglect and extinction has been investigated using event-related potentials. Di Russo et al. (2008) focused on the processing of left-field stimuli not consciously perceived by neglect patients. Early processing of these stimuli was comparable to that of healthy controls with only later processing being disrupted. Lasaponara et al. (2018) obtained similar findings in neglect patients. In healthy individuals, the presentation of left-field targets inhibits processing of right-field space. This was less the case in neglect patients, which helps to explain their lack of conscious perception of left-field stimuli.

Theoretical considerations

Corbetta and Shulman (2011) discussed neglect in the context of their two-system theory (discussed earlier, see pp. 192–196). In essence, the bottom-up ventral attention network is damaged. Strong support for this assumption was reported by Toba et al. (2018a), who found in 25 neglect patients that impaired performance on tests of neglect was associated with damage to parts of the ventral attention network (e.g., angular gyrus; supramarginal gyrus). Since the right hemisphere is dominant in the ventral attention network, neglect patients typically have damage in that hemisphere.

Of importance, Corbetta and Shulman (2011) also assumed that damage to the ventral network impairs the functioning of the goal-directed dorsal attention network (even though not itself damaged). How does the damaged ventral attention network impair the dorsal attention network’s functioning? The two attention networks interact and so damage to the ventral network inevitably affects the functioning of the dorsal network. More specifically, damage to the ventral attention network “impairs non-spatial [across the entire visual field] functions, hypoactivates
199

[reduces activation in] the right hemisphere, and unbalances the activity of the dorsal attention network” (Corbetta & Shulman, 2011, p. 592). de Haan et al. (2012) proposed a theory of extinction based on two major assumptions: (1) “Extinction is a consequence of biased competition for attention between the ipsilesional [right-field] and contralesional [left-field] target stimuli” (p. 1048); (2) Extinction patients have much reduced attentional capacity so often only one target [the right-field one] can be detected.

Findings

According to Corbetta and Shulman (2011), the dorsal attention network in neglect patients functions poorly because of reduced activation in the right hemisphere and associated reduced alertness and attentional resources. Thus, increasing patients’ general alertness should enhance their detection of left-field visual targets. Robertson et al. (1998) found the slower detection of left visual field stimuli compared to those in the right visual field was no longer present when warning sounds were used to increase alertness.

Bonato and Cutini (2016) compared neglect patients’ ability to detect visual targets with (or without) a second, attentionally demanding task. Detection rates were high for targets presented to the right visual field in both conditions. In contrast, patients detected only approximately 50% as many targets in the left visual field as the right when performing another task. Thus, neglect patients have limited attentional resources.

Corbetta and Shulman (2011) assumed neglect patients have an essentially intact dorsal attention network. Accordingly, neglect patients might use that network effectively if steps were taken to facilitate its use. Duncan et al. (1999) presented arrays of letters and neglect patients recalled only those in a pre-specified colour (the dorsal attention network could be used to select the appropriate letters). Neglect patients resembled healthy controls in showing equal recall of letters presented to each side of visual space.

The two attention networks typically work closely together. Bays et al. (2010) studied neglect patients. They used eye movements during visual search to assess patients’ problems with top-down and stimulus-driven attentional processes. Both types of attentional processes were equally impaired (as predicted by Corbetta and Shulman, 2011). Of most importance, there was a remarkably high correlation of +.98 between these two types of attentional deficit.

Toba et al. (2018b) identified two reasons for the failure of neglect patients to detect left-field stimuli:

(1) a “magnetic” attraction of attention (i.e., right-field stimuli immediately capture attention);
(2) impaired spatial working memory making it hard for patients to keep track of the locations of stimuli.

Both reasons were equally applicable to most patients. However, the first reason was dominant in 12% of patients and the second reason in
24% of patients. Accordingly, Toba et al. argued we should develop multi-component models of visual neglect to account for such individual differences.

We turn now to extinction patients. According to de Haan et al. (2012), extinction occurs because of biased competition between stimuli. If two stimuli could be integrated, that might minimise competition and so reduce extinction. Riddoch et al. (2006) tested this prediction by presenting objects used together often (e.g., wine bottle and wine glass) or never used together (e.g., wine bottle and ball). Extinction patients identified both objects more often in the former condition than the latter (65% vs 40%, respectively).

The biased competition hypothesis has been tested in other ways. We can impair attentional processes in the intact left hemisphere by applying transcranial magnetic stimulation to it. This should reduce competition from the left hemisphere in extinction patients and thus reduce extinction. Some findings are consistent with this prediction (Oliveri & Caltagirone, 2006). de Haan et al. (2012) also identified reduced attentional capacity as a factor causing extinction. Bonato et al. (2010) studied extinction with or without the addition of a second, attentionally demanding task. As predicted, extinction patients showed a substantial increase in the extinction rate (from 18% to over 80%) with this additional task.

Overall evaluation

Research has produced several important findings. First, neglect and extinction patients can process unattended visual stimuli in the absence of conscious awareness of those stimuli. Second, most neglect patients have damage to the ventral attention network leading to impaired functioning of the undamaged dorsal attention network. Third, extinction occurs because of biased competition for attention and reduced attentional capacity.

What are the limitations of research in this area? First, it is hard to produce theoretical accounts applicable to all neglect or extinction patients because the precise symptoms and regions of brain damage vary considerably across patients. Second, neglect patients vary in their precise processing deficits (e.g., Toba et al., 2018b), but this has been de-emphasised in most theories. Third, the precise relationship between neglect and extinction remains unclear. Fourth, the dorsal and ventral networks generally interact but the extent of their interactions remains to be determined.

VISUAL SEARCH

KEY TERM
Visual search
A task involving the rapid detection of a specified target stimulus within a visual display.

We spend much time searching for various objects (e.g., a friend in a crowd). The processes involved have been studied in research on visual search, where a specified target is detected as rapidly as possible. Initially, we consider an important real-world situation where visual search can literally be a matter of life or death: airport security checks. After that, we consider an early, very influential theory of visual search before discussing more recent theoretical and empirical developments.


IN THE REAL WORLD: AIRPORT SECURITY CHECKS

Airport security checks have become more thorough since 9/11. When your luggage is x-rayed, an airport security screener searches for illegal and dangerous items (see Figure 5.9). Screeners are well trained but mistakes sometimes occur.

Figure 5.9 Each bag contains one illegal item. From left to right: a large bottle; a dynamite stick; and a gun part. From Mitroff & Biggs (2014).

There are two major reasons it is often hard for airport security screeners to detect dangerous items. First, illegal and dangerous items are (thankfully!) present in only a minute fraction of passengers’ luggage. This rarity of targets makes it hard for airport security screeners to detect them. Mitroff and Biggs (2014) asked observers to detect illegal items in bags (see Figure 5.9). The detection rate was only 27% when targets appeared under 0.15% of the time: they termed this the “ultra-rare item effect”. In contrast, the detection rate was 92% when targets appeared more than 1% of the time.

Peltier and Becker (2016) tested two explanations for the reduced detection rate with rare targets: (1) a reduced probability that the target is fixated (selection error); and (2) increased caution about reporting targets because they are so unexpected (identification error). There was evidence for both explanations. However, most detection failures were selection errors (see Figure 5.10).

Figure 5.10 Frequency of selection and identification errors when targets were present on 10%, 50% or 90% of trials. From Peltier and Becker (2016).


Second, security screeners search for numerous different objects. This increases search difficulty. Menneer et al. (2009) found target detection was worse when screeners searched for two categories of objects (metal threats and improvised explosive devices) rather than one.

How can we increase the efficiency of security screening? First, we can exploit individual differences in the ability to detect targets. Rusconi et al. (2015) found individuals scoring high on a questionnaire measure of attention to detail showed superior target-detection performance to low scorers. Second, airport security screeners can find it hard to distinguish between targets (i.e., dangerous items) and similar-looking non-targets. Geng et al. (2017) found that observers whose training included non-targets resembling targets learned to develop increasingly precise internal target representations. Such representations can improve the speed and accuracy of security screening. Third, the low detection rate when targets are very rare can be addressed. Threat image projection (TIP) can be used to project fictional threat items into x-ray images of luggage to increase the apparent frequency of targets. When screeners are presented with TIPs plus feedback when they miss them, screening performance improves considerably (Hofer & Schwaninger, 2005). In similar fashion, Schwark et al. (2012) found providing false feedback to screeners to indicate they had missed rare targets reduced their cautiousness about reporting targets and improved their performance.
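The low-prevalence effect can be illustrated with a toy signal-detection simulation: perceptual sensitivity (d′) is held constant, but the response criterion is assumed to become more conservative when targets are rare, producing many more misses. This sketch captures only the identification-error component (Peltier and Becker’s selection errors, i.e., failures to fixate the target, are not modelled), and all parameter values are illustrative.

import random

def miss_rate(prevalence, criterion, d_prime=2.0, n_trials=100_000, seed=1):
    """Estimate the miss rate for a screener with fixed sensitivity
    (d_prime) but a response criterion that, per the low-prevalence
    effect, becomes more conservative as targets get rarer."""
    rng = random.Random(seed)
    misses = targets = 0
    for _ in range(n_trials):
        if rng.random() < prevalence:           # a target is present
            targets += 1
            evidence = rng.gauss(d_prime, 1.0)  # evidence on target trials
            if evidence < criterion:            # below criterion -> "absent"
                misses += 1
    return misses / targets

# Same sensitivity, different criteria: rarity breeds caution breeds misses.
print(f"50% prevalence, neutral criterion: {miss_rate(0.50, 1.0):.0%} missed")
print(f" 1% prevalence, cautious criterion: {miss_rate(0.01, 2.0):.0%} missed")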

Feature integration theory

Feature integration theory was proposed by Treisman and Gelade (1980) and subsequently updated and modified (e.g., Treisman, 1998). According to the theory, we need to distinguish between object features (e.g., colour; size; line orientation) and the objects themselves. There are two processing stages:

(1) Basic visual features are processed rapidly and pre-attentively in parallel across the visual scene.
(2) Stage (1) is followed by a slower serial process with focused attention providing the “glue” to form objects from the available features (e.g., an object that is round and has an orange colour is perceived as an orange). In the absence of focused attention, features from different objects may be combined randomly, producing an illusory conjunction.

KEY TERM Illusory conjunction Mistakenly combining features from two different stimuli to perceive an object that is not present.

It follows from the above assumptions that targets defined by a single feature (e.g., a blue letter or an S) should be detected rapidly and in parallel. In contrast, targets defined by a conjunction or combination of features (e.g., a green letter T) should require focused attention and so should be slower to detect. Treisman and Gelade (1980) tested these predictions using both types of targets; the display size was 1–30 items and a target was present or absent. As predicted, responses were rapid and display size had very little effect when the target was defined by a single feature: these findings suggest parallel processing (see Figure 5.11). Responses were slower and strongly influenced by display size when the target was defined by a conjunction of features: these findings suggest serial processing. According to the theory, lack of focused attention can produce illusory conjunctions based on random combinations of features.
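The search-slope logic can be expressed as a small sketch. This is an illustrative reconstruction of the theory's predictions, not code from the original studies; the 60 ms per-item figure is Treisman and Gelade's own estimate (discussed under Limitations below), and the assumption that target-present searches inspect on average half the items is a standard property of serial self-terminating search. The base reaction time is a made-up value.

def predicted_rt_ms(display_size, conjunction_target, target_present,
                    base_ms=450, per_item_ms=60):
    """Predicted reaction time under feature integration theory.
    Feature targets pop out pre-attentively, so display size has no effect.
    Conjunction targets require serial attention: target-absent searches
    inspect every item; target-present searches stop, on average, halfway."""
    if not conjunction_target:
        return base_ms
    items_inspected = display_size if not target_present else display_size / 2
    return base_ms + per_item_ms * items_inspected

for n in (1, 15, 30):
    print(n, predicted_rt_ms(n, conjunction_target=True, target_present=True))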


Figure 5.11 Performance speed on a detection task as a function of target definition (conjunctive vs single feature) and display size. Adapted from Treisman and Gelade (1980).

Friedman-Hill et al. (1995) studied a brain-damaged patient (RM) who had problems accurately locating visual stimuli. This patient produced many illusory conjunctions, combining the shape of one stimulus with the colour of another.

Limitations

What are the theory's limitations? First, Duncan and Humphreys (1989, 1992) identified two factors not included within feature integration theory:

(1) When distractors are very similar to each other, visual search is faster because it is easier to identify them as distractors.
(2) The number of distractors has a strong effect on search time even for targets defined by a single feature when targets resemble distractors.

Second, Treisman and Gelade (1980) estimated the search time with conjunctive targets was approximately 60 ms per item and argued this represented the time taken for focal attention to process each item. However, research with other paradigms indicates it takes approximately 250 ms for attention (as indexed by eye movements) to move from one location to another. Thus, it is improbable that focal attention plays the key role assumed within the theory. Third, the theory assumes visual search is often item-by-item. However, the information contained within most visual scenes cannot be divided up into "items" and so the theory is of limited applicability. Such considerations led Hulleman and Olivers (2017) to produce an article entitled "The impending demise of the item in visual search". Fourth, visual search involves parallel processing much more than implied by the theory. For example, Thornton and Gilden (2007) used 29 different visual tasks and found 72% apparently involved parallel processing.


We can explain such findings by assuming that each eye fixation permits considerable parallel processing using information available in peripheral vision (discussed below, see pp. 206–208). Fifth, the theory assumes that the early stages of visual search are entirely feature-based. However, recent research using event-related potentials indicates that object-based processing can occur much faster than predicted by feature integration theory (e.g., Berggren & Eimer, 2018). Sixth, the theory assumes visual search is essentially random. This assumption is wrong with respect to the real world – we typically use our knowledge of where a target object is likely to be located when searching for it (see below).

Dual-path model

In most of the research discussed so far, the target appeared at a random location within the visual display. This is radically different from the real world. Suppose you are outside looking for your missing cat. Your visual search would be very selective – you would ignore the sky and focus mostly on the ground (and perhaps the trees). Thus, your search would involve top-down processes based on your knowledge of where cats are most likely to be found. Ehinger et al. (2009) studied top-down processes in visual search by recording eye fixations of observers searching for a person in 900 real-world outdoor scenes. Observers typically fixated plausible locations (e.g., pavements) and ignored implausible ones (e.g., sky; trees; see Figure 5.12). Observers also fixated locations differing considerably from neighbouring locations and areas containing visual features resembling those of a human figure. How can we reconcile Ehinger et al.'s (2009) findings with those discussed earlier? Wolfe et al. (2011) proposed a dual-path model (see Figure 5.13). There is a selective pathway of limited capacity (indicated by the bottleneck) with objects being selected individually for recognition.

Figure 5.12 The first three eye fixations made by observers searching for pedestrians. As can be seen, the great majority of fixations were on regions in which pedestrians would most likely be found. Observers' fixations were much more similar to each other in the left-hand photo than in the right-hand one, because there were fewer likely regions in the left-hand photo. From Ehinger et al. (2009). Reprinted with permission from Taylor & Francis.


[Figure 5.13 shows early vision extracting features (colour, orientation, size, depth, motion, etc.) that feed both a selective pathway and a non-selective pathway.]

Figure 5.13 A two-pathway model of visual search. The selective pathway is capacity limited and can bind stimulus features and recognise objects. The non-selective pathway processes the gist of scenes. Selective and non-selective processing occur in parallel to produce effective visual search. From Wolfe et al. (2011). Reprinted with permission from Elsevier.


This pathway has been the focus of most research until recently. There is also a non-selective pathway in which the “gist” of a scene is processed. Such processing can then guide processing within the selective pathway (represented by the arrow labelled “guidance”). This pathway allows us to utilise our stored environmental knowledge and so is of great value in the real world.

Findings

Wolfe et al. (2011) compared visual searches for objects presented within a scene setting or at random locations. As predicted, search rate per item was much faster in the scene setting (10 ms vs 40 ms per item, respectively). Võ and Wolfe (2012) explained that finding in terms of "functional set size" – searching in scenes is efficient because most regions can be ignored. As predicted, Võ and Wolfe found 80% of each scene was rarely fixated. Kaiser and Cichy (2018) presented observers with objects typically located in the upper (e.g., aeroplane; hat) or lower (e.g., carpet; shoe) visual field. These objects were presented in their typical or atypical location (e.g., hat in the lower visual field). Observers had to indicate whether an object presented very briefly was located in the upper or lower visual field. Observers' performance was better when objects appeared in their typical location because of their extensive knowledge of where objects are generally located. Chukoskie et al. (2013) found observers can easily learn where targets are located. An invisible target was presented at random locations on a blank screen and observers were provided with feedback. There was a strong learning effect – fixations rapidly shifted from being fairly random to being focused on the area within which the target might be present.
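The arithmetic behind "functional set size" is easy to sketch. The snippet below is purely illustrative (our construction, not Võ and Wolfe's analysis): if observers only ever search the plausible fifth of a scene at the base per-item rate, the apparent rate computed over all items drops proportionally.

def apparent_rate_ms_per_item(base_rate_ms=40.0, searched_fraction=0.2):
    """If only the plausible fraction of locations is actually searched
    (at the base per-item rate), the apparent rate computed over ALL
    items drops proportionally."""
    return base_rate_ms * searched_fraction

# ~8 ms/item: close to the ~10 ms/item Wolfe et al. (2011) observed in
# scenes, whereas random displays show the full 40 ms/item.
print(apparent_rate_ms_per_item())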


KEY TERM Fovea A small area within the retina, in the centre of the field of vision, where visual acuity is greatest.


Ehinger et al.'s (2009) findings (discussed earlier, see p. 204) suggested that scene gist or context can be used to enhance the efficiency of visual search. Katti et al. (2017) presented scenes very briefly (83 ms) followed by a mask. Observers were given the task of detecting a person or a car and performed very accurately (over 90%) and rapidly. Katti et al. confirmed that scene gist or context influenced performance. However, performance was influenced more strongly by features of the target object – the more key features of an object were visible, the faster it was detected. What is the take-home message from the above study? The efficiency of visual search with real-world scenes is more complex than implied by Ehinger et al. (2009). More specifically, observers may rapidly fixate the area close to a target person because they are using scene gist or because they rapidly process features of the person (e.g., wearing clothes).

Evaluation

Our knowledge of likely (and unlikely) locations for any given object in a scene influences visual search in the real world. This is fully acknowledged in the dual-path model. There is also support for the notion that scene knowledge facilitates visual search by reducing functional set size. What are the model's limitations? First, how we use gist knowledge of a scene very rapidly to reduce the search area remains unclear. Second, there is insufficient focus on the learning processes that can greatly facilitate visual search – the effects of such processes can be seen in the very rapid and accurate detection of target information by experts in several domains (see Chapter 11). Third, it is important not to exaggerate the importance of scene gist or context in influencing the efficiency of visual search: features of the target object can influence visual search more than scene gist (Katti et al., 2017). Fourth, the assumption that items are processed individually within the selective pathway is typically mistaken. As we will see shortly, visual search often depends on parallel processes within peripheral vision, and such processes are not considered within the model.

Attention vs perception: texture tiling model

Several theories (e.g., Treisman & Gelade, 1980) have assumed that individual items are the crucial units in visual search. Such theories have also often assumed that slow visual search depends mostly on the limitations of focused attention. A plausible implication of these assumptions is that slow visual search depends mostly on foveal vision (the fovea is a small area of maximal visual acuity in the retina). Both of the above assumptions have been challenged recently. At the risk of oversimplification, full understanding of visual search requires less emphasis on attention and more on perception. According to Rosenholtz (2016), peripheral (non-foveal) vision is of crucial importance. Acuity decreases as we move away from the fovea to the periphery of vision, but much less than often assumed. You can demonstrate this by holding out your thumb and fixating the nail.


Foveal vision only covers the nail, so the great majority of what you can see is in peripheral vision. We can also compare the value of foveal and peripheral vision by considering individuals with impaired eyesight. Those with severely impaired peripheral vision (e.g., due to glaucoma) have greater problems with mobility (e.g., number of falls; ability to drive) than those lacking foveal vision (due to macular degeneration) (Rosenholtz, 2016). Individuals with severely impaired central or foveal vision performed almost as well as healthy controls at detecting target objects in coloured scenes (75% vs 79%, respectively) (Thibaut et al., 2018). If visual search depends heavily on peripheral vision, what predictions can we make? First, if each fixation provides observers with a considerable amount of information about several objects, visual search will typically involve parallel rather than serial processing. Second, we need to consider the limitations of peripheral vision (e.g., visual acuity is lower in peripheral than foveal vision). The most important limitation is visual crowding: a reduced ability to recognise objects or other stimuli because of irrelevant neighbouring objects or stimuli (clutter). Visual crowding impairs peripheral vision to a much greater extent than foveal vision. Rosenholtz et al. (2012) proposed the texture tiling model based on the assumption that peripheral vision is of crucial importance in visual search. More specifically, processing in peripheral vision can cause adjacent stimuli to tile (join together) to form an apparent target, thus increasing the difficulty of visual search. Below we consider findings relevant to this model.

KEY TERM Visual crowding The inability to recognise objects in peripheral vision due to the presence of neighbouring objects.

Findings

As mentioned earlier (p. 203), Thornton and Gilden (2007) found almost three-quarters of the visual tasks they studied involved parallel processing. This is entirely consistent with the emphasis on parallel processing in the model. Direct evidence for the importance of peripheral vision to visual search was reported by Young and Hulleman (2013). They manipulated the visible area around the fixation point, making it small, medium or large. As predicted by the model, visual search performance was worst when the visible area was small (so only one item could be processed per fixation). Overall, visual search was almost parallel when the visible area was large but serial when it was small. Chang and Rosenholtz (2016) used various search tasks. According to feature integration theory, both tasks shown in Figure 5.14 should be comparably hard because the target and distractors share features. In contrast, the texture tiling model predicts the task on the right should be harder because adjacent distractors seen in peripheral vision can more easily tile (join together) to form an apparent T. The findings from these tasks (and several others) supported the texture tiling model but were inconsistent with feature integration theory. Finally, Hulleman and Olivers (2017) produced a model of visual search consistent with the texture tiling model. According to this model, each eye fixation lasts 250 ms, during which information from foveal and peripheral vision is extracted in parallel. They also assumed that the area around the fixation point within which a target can generally be detected is smaller when the visual search task is difficult (e.g., because target discriminability is low).


Figure 5.14 The target (T) is easier to find in the display on the left than the one on the right. From Chang and Rosenholtz (2016).

[Figure 5.14 panels: (a) easier search; (b) harder search. Task: find the T.]

A key prediction from Hulleman and Olivers' (2017) model is that search times are longer with more difficult search tasks mainly because such tasks require more eye fixations. A computer simulation based on these assumptions produced search times very similar to those obtained in experimental studies.
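Here is a minimal sketch in the spirit of Hulleman and Olivers' simulation (a toy reconstruction under stated assumptions, not their actual code): every fixation lasts 250 ms and processes, in parallel, all items falling within a functional visual field, and harder tasks shrink that field so more fixations are needed. The item counts are made-up values.

import math

def expected_search_time_ms(n_items, items_per_fixation, fixation_ms=250):
    """Toy fixation-based search: each 250 ms fixation processes a fixed
    number of items in parallel; a smaller functional visual field
    (harder task) means fewer items per fixation and thus more fixations."""
    fixations = math.ceil(n_items / items_per_fixation)
    return fixations * fixation_ms

# Easy task: large functional visual field; hard task: small one.
print(expected_search_time_ms(30, items_per_fixation=15))  # 500 ms
print(expected_search_time_ms(30, items_per_fixation=2))   # 3750 ms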

Evaluation

What are the strengths of the texture tiling model? First, the information available in peripheral vision is much more important in visual search than was previously assumed, and the model explains how observers make use of that information. Second, the model explains why parallel processing is so prevalent in visual search: it directly reflects parallel processing within peripheral vision. Third, there is accumulating evidence that search times are generally directly related to the number of eye fixations. Fourth, an approach based on eye fixations and peripheral vision can potentially explain findings from all visual search paradigms, including complex visual scenes and item displays. Such an approach thus has more general applicability than feature integration theory.

What are the model's limitations? First, as Chang and Rosenholtz (2016) admitted, it needs further development to account fully for visual search performance. For example, it does not predict search times with precision. In addition, it does not specify the criteria used by observers to decide no target is present. Second, visual search is typically much faster for experts than non-experts in their domain of expertise (e.g., medical experts examining mammograms) (see Chapter 11). The texture tiling model does not clearly identify the processes allowing experts to make very efficient use of peripheral information.

CROSS-MODAL EFFECTS

Nearly all the research discussed so far is limited in that the visual (or auditory) modality was studied on its own. We might try to justify this approach by assuming attentional processes in each sensory modality operate independently from those in other modalities.


However, that assumption is incorrect. In the real world, we often coordinate information from two or more sense modalities at the same time (cross-modal attention). An example is lip reading, where we use visual information about a speaker's lip movements to facilitate our understanding of what they are saying (see Chapter 9). Suppose we present participants with two streams of lights (as was done by Eimer and Schröger, 1998), with one stream being presented to the left and the other to the right. At the same time, we present participants with two streams of sounds (one to each side). In one condition, participants detect deviant visual events (e.g., longer than usual stimuli) presented to one side only. In the other condition, participants detect deviant auditory events in only one stream. Event-related potentials (ERPs) were recorded to assess the allocation of attention. Unsurprisingly, Eimer and Schröger (1998) found ERPs to deviant stimuli in the relevant modality were greater to stimuli presented on the to-be-attended side than the to-be-ignored side. Thus, participants allocated attention as instructed. Of more interest is what happened to the allocation of attention in the irrelevant modality. Suppose participants detected visual targets on the left side. In that case, ERPs to deviant auditory stimuli were greater on the left side than the right. This is a cross-modal effect: the voluntary or endogenous allocation of visual attention also affected the allocation of auditory attention. Similarly, when participants detected auditory targets on one side, ERPs to deviant visual stimuli on the same side were greater than ERPs to those on the opposite side. Thus, the allocation of auditory attention also influenced the allocation of visual attention.

KEY TERMS Cross-modal attention The coordination of attention across two or more modalities (e.g., vision and audition). Ventriloquism effect The mistaken perception that sounds come from their apparent visual source (as in ventriloquism).

Ventriloquism effect

What happens when there is a conflict between simultaneous visual and auditory stimuli? We will focus on the ventriloquism effect, in which sounds are misperceived as coming from their apparent visual source. Ventriloquists (at least good ones!) speak without moving their lips while manipulating a dummy's mouth movements. It seems as if the dummy is speaking. Something similar happens at the movies. The actors' lips move on the screen but their voices come from loudspeakers beside the screen. Nevertheless, we hear those voices coming from their mouths. Certain conditions must be satisfied for the ventriloquism effect to occur (Recanzone & Sutter, 2008). First, the visual and auditory stimuli must occur close together in time. Second, the sound must match expectations created by the visual stimulus (e.g., a high-pitched sound coming from a small object). Third, the sources of the visual and auditory stimuli should be close together spatially. More generally, the ventriloquism effect reflects the unity assumption (the assumption that two or more sensory cues come from the same object: Chen & Spence, 2017). The ventriloquism effect exemplifies visual dominance (visual information dominating perception). Further evidence comes from the Colavita effect (Colavita, 1974): participants instructed to respond to all stimuli respond more often to visual than to simultaneous auditory stimuli (Spence et al., 2011).


KEY TERM Temporal ventriloquism effect Misperception of the timing of a visual stimulus when an auditory stimulus is presented close to it in time.


At what point during processing is visual spatial information integrated with auditory information? Shrem et al. (2017) found that misleading visual information about the location of an auditory stimulus influenced processing of that stimulus approximately 200 ms after stimulus onset. The finding that this effect is still present even when participants are aware of the spatial discrepancy between the visual and auditory input suggests it occurs relatively "automatically". However, the ventriloquism effect is smaller when participants have previously heard syllables spoken in a fearful voice (Maiworm et al., 2012). This suggests the effect is not entirely "automatic" but is reduced when the relevance of the auditory channel is increased. Why does vision capture sound in the ventriloquism effect? The visual modality typically provides more precise information about spatial location. However, when visual stimuli are severely blurred and poorly localised, sound captures vision (Alais & Burr, 2004). Thus, we combine visual and auditory information effectively by attaching more weight to the more informative sense modality.
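The weighting-by-reliability idea in the last paragraph can be captured by standard maximum-likelihood cue combination, the framework used by Alais and Burr (2004): each cue is weighted by its inverse variance. The sketch below is illustrative, with made-up location and precision values.

def combine_estimates(loc_visual, sigma_visual, loc_auditory, sigma_auditory):
    """Reliability-weighted combination: each modality's location estimate
    is weighted by its inverse variance, so the more precise sense dominates."""
    w_visual = (1 / sigma_visual**2) / (1 / sigma_visual**2 + 1 / sigma_auditory**2)
    return w_visual * loc_visual + (1 - w_visual) * loc_auditory

# Sharp vision: the combined location sits near the visual source
# (the ventriloquism effect).
print(combine_estimates(0.0, 1.0, 10.0, 5.0))
# Severely blurred vision: sound now "captures" vision (Alais & Burr, 2004).
print(combine_estimates(0.0, 5.0, 10.0, 1.0))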

Temporal ventriloquism The above explanation for the ventriloquist illusion is a development of the modality appropriateness and precision hypothesis (Welch & Warren, 1980). According to this hypothesis, when conflicting information is presented in two or more modalities, the modality having the greatest acuity generally dominates. This hypothesis predicts the existence of another illusion. The auditory modality is typically more precise than the visual modality at discriminating temporal relations. As a result, judgements about the temporal onset of visual stimuli might be biased by auditory stimuli presented very shortly beforehand or afterwards. This is the temporal ­ventriloquism effect. Research on temporal ventriloquism was reviewed by Chen and Spence (2017). A simple example is when the apparent onset of a flash is shifted towards an abrupt sound presented slightly asynchronously (see Figure  5.15). Other research has found that the apparent duration of visual stimuli can be distorted by asynchronous auditory stimuli. We need to consider the temporal ventriloquism effect in the context of the unity assumption. This is the assumption that “two or more uni-sensory cues belong together (i.e., that they come from the same object or event)” (Chen & Spence, 2017, p. 1). Chen and Spence discussed findings showing that Figure 5.15 the unity assumption generally (but not An example of temporal ventriloquism in which the apparent always) enhances the temporal ventriloquism time of onset of a flash is shifted towards that of a sound effect. presented at a slightly different timing from the flash. Orchard-Mills et al. (2016) extended From Chen and Vroomen (2013). Reprinted with permission from Springer. research by using two visual stimuli (one


IN THE REAL WORLD: WARNING SIGNALS PROMOTE SAFE DRIVING

Front-to-rear-end collisions cause 25% of road accidents, with driver inattention the most common cause (Spence, 2012). Thus, it is important to devise effective warning signals to enhance driver attention and reduce collisions. Warning signals might be especially useful if they were informative (i.e., indicating the nature of the danger). However, informative warning signals requiring time-consuming cognitive processing might be counterproductive.

Ho and Spence (2005) considered drivers' reaction times when braking to avoid a car in front or accelerating to avoid a speeding car behind. An auditory warning signal (car horn) came from the same direction as the critical visual event on 80% or 50% of trials. Braking times were faster when the sound and critical visual event came from the same direction. The greater beneficial effects of auditory signals when predictive rather than non-predictive suggest the involvement of endogenous spatial attention (controlled by the individual's intentions). Auditory stimuli also influenced visual attention even when non-predictive: this probably involved exogenous spatial attention ("automatic" allocation of attention).

Gray (2011) studied braking times to avoid a collision with the car in front when drivers heard auditory warning signals increasing in intensity as the time to collision reduced. These signals are known as looming sounds. The most effective condition was the one where the rate of increase in the intensity of the auditory signal was fastest, because this implied the time to collision was shortest. Lahmer et al. (2018) found evidence that looming sounds are effective because they are consistent with the visual experience of an approaching collision.

Vibrotactile signals produce the perception of vibration through touch. Gray et al. (2014) studied the effects of such signals on speed of braking to avoid a collision. Signals were presented at three sites on the abdomen arranged vertically. In the most effective condition, successive signals moved towards the driver's head at an increasing rate reflecting the speed at which the car in front was being approached. Braking time was 250 ms faster in this condition than in a no-warning control condition, probably because the signal was highly informative.

Ahtamad et al. (2016) compared the effectiveness of three vibrotactile warning signals delivered to the back on braking times to avoid a collision with the car in front: (1) expanding (centre of back followed by areas to left and right); (2) contracting (areas to left and right followed by the centre of the back); (3) static (centre of the back plus areas to left and right at the same time). The dynamic vibrotactile conditions (1 and 2) produced comparable braking reaction times that were faster than those in the static condition (3). In a second experiment, Ahtamad et al. (2016) compared the expanding vibrotactile condition against a linear motion condition (vibrotactile stimulation to the hands followed by the shoulders). Emergency braking reaction times were faster in the linear motion condition (approximately 585 ms vs 640 ms) because drivers found it easier to interpret the warning signals in that condition.

In sum, the various auditory and vibrotactile warning signals discussed above typically reduce braking reaction times by approximately 40 ms. That sounds modest. However, it can easily be the difference between colliding with the car in front and avoiding it, and so could potentially save many lives. At present, however, we lack a theoretical framework within which to understand precisely why some warning signals are more effective than others.

Orchard-Mills et al. (2016) extended this research by using two visual stimuli (one above and the other below fixation) and two auditory stimuli (low- and high-pitched). When the visual and auditory stimuli were congruent (e.g., visual stimulus above fixation and high-pitched auditory stimulus), the temporal ventriloquism effect was found. However, this effect was eliminated when the visual and auditory stimuli were incongruent, which prevented binding of information across the two senses.


KEY TERMS Endogenous spatial attention Attention to a stimulus controlled by intentions or goal-directed mechanisms. Exogenous spatial attention Attention to a given spatial location determined by “automatic” processes. Multi-tasking Performing two or more tasks at the same time by switching rapidly between them.



Overall evaluation

What are the limitations of research on cross-modal effects? First, as just mentioned, our theoretical understanding lags behind the accumulation of empirical findings. Second, much research has involved complex artificial tasks far removed from naturalistic conditions. Third, individual differences have generally been ignored. However, individual differences (e.g., preference for auditory or visual stimuli) influence cross-modal effects (van Atteveldt et al., 2014).

DIVIDED ATTENTION: DUAL-TASK PERFORMANCE

In this section, we consider factors influencing how well we can perform two tasks at the same time. In our hectic 24/7 lives, we increasingly try to do two things at once, i.e., multi-tasking (e.g., sending text messages while walking down the street). More specifically, multi-tasking "refers to the ability to co-ordinate the completion of several tasks to achieve an overall goal" (MacPherson, 2018, p. 314). It can involve performing two tasks at the same time or switching between two tasks. There is controversy as to whether massive amounts of multi-tasking have beneficial or detrimental effects on attention and cognitive control (see Box).

What determines how well we can perform two tasks at once? Similarity (e.g., in terms of modality) is one important factor. Treisman and Davies (1973) found two monitoring tasks interfered with each other much more when the stimuli on both tasks were in the same modality (visual or auditory). Two tasks can also be similar in response modality. McLeod (1977) had participants perform a continuous tracking task with manual responding together with a tone-identification task. Some participants responded vocally to the tones whereas others responded with the hand not involved in tracking. Tracking performance was worse with high response similarity (manual responses on both tasks) than with low response similarity.

Practice is the most important factor determining how well two tasks can be performed together. The saying "Practice makes perfect" was apparently supported by Spelke et al. (1976). Two students (Diane and John) received 5 hours of training a week for 4 months on various tasks. Their first task involved reading short stories for comprehension while writing down words from dictation, which they initially found very hard. After 6 weeks of training, however, they could read as rapidly and with as much comprehension when writing to dictation as when only reading. With further training, Diane and John learned to write down the names of the categories to which the dictated words belonged while maintaining normal reading speed and comprehension.

Spelke et al.'s (1976) findings are hard to interpret for two main reasons. First, Spelke et al. focused on accuracy measures, which are typically less sensitive to dual-task interference than speed measures. Second, Diane and John's attentional focus was relatively uncontrolled, and so they may have alternated attention between tasks rather than attending to both at the same time. More controlled research on the effects of practice on dual-task performance is discussed later.


IN THE REAL WORLD: MULTI-TASKING

What are the effects of frequent multi-tasking in our everyday lives? Two main answers have been proposed. First, heavy multi-tasking may impair cognitive control because it leads individuals to allocate their attentional resources too widely. This is the scattered attention hypothesis (van der Schuur et al., 2015). Second, heavy multi-tasking may enhance some control processes (e.g., task switching) because of prolonged practice in processing multiple streams of information. This is the trained attention hypothesis (van der Schuur et al., 2015).

The relevant evidence is very inconsistent – "positive, negative, and null effects have all been reported" (Uncapher & Wagner, 2018, p. 9894). Ophir et al. (2009) used a questionnaire (the Media Multitasking Index) to identify levels of multi-tasking. Heavy multi-taskers were more distractible. In a review, van der Schuur et al. (2015) found the findings supported the scattered attention hypothesis (e.g., heavy multi-taskers had impaired sustained attention). Moisala et al. (2016) found heavy multi-taskers were more adversely affected than light multi-taskers by distracting stimuli while performing speech-listening and reading tasks. During distraction, the heavy multi-taskers had greater activity than the light multi-taskers in the right prefrontal cortex (associated with attentional control). This suggests heavy multi-taskers have greater problems than previously believed – their performance is impaired even though they try harder to exert top-down attentional control.

Uncapher and Wagner (2018) found in a review that most research indicated negative effects of heavy multi-tasking on tasks involving working memory, long-term memory, sustained attention and relational reasoning. These negative effects are likely to be due to attentional lapses. Of relevance, several studies have found media multi-tasking to be positively associated with self-reported everyday attentional failures. In addition, heavy multi-taskers often report high impulsivity – such individuals often make rapid decisions based on very limited evidence.

Most studies have only found an association between media multi-tasking and measures of attention and performance. This makes it hard to establish causality – it is possible individuals with certain patterns of attention choose to engage in extensive multi-tasking. Evidence suggesting that media multi-tasking can cause attention problems was reported by Baumgartner et al. (2018). They found that high media multi-tasking at one point in time predicted attention problems several months later.

Serial vs parallel processing

When individuals perform two tasks together, they might use serial or parallel processing. Serial processing involves switching attention backwards and forwards between two tasks with only one task being processed at any given moment. In contrast, parallel processing involves processing both tasks at the same time. There has been much theoretical controversy over serial vs parallel processing in dual-task conditions (Koch et al., 2018). Of importance, processing can be mostly parallel or mostly serial. Lehle et al. (2009) trained participants to use serial or parallel processing when performing two tasks together. Those using serial processing performed better. However, they found the tasks more effortful because they had to inhibit processing of one task while performing the other one. Lehle and Hübner (2009) also instructed participants to perform two tasks together in a serial or parallel fashion. Those using parallel processing performed much worse.


Fischer and Plessow (2015) reviewed dual-task research and concluded: "While serial task processing appears to be the most efficient [dual-task] processing strategy, participants are able to adopt parallel processing. Moreover, parallel processing can even outperform serial processing under certain conditions" (p. 8). Brüning and Manzey (2018) confirmed serial processing is not always more efficient than parallel processing. Participants performed many alternating trials on two different tasks but could see the stimulus for the next trial ahead of time. Participants engaging in parallel processing (processing the stimulus for trial n+1 during trial n) performed better than those using only serial processing (not processing the trial n+1 stimulus ahead of time). Parallel processing reduced the costs incurred when task switching. Individuals high in working memory capacity (see Glossary) were more likely to use parallel processing, perhaps because of their superior attentional control.

IN THE REAL WORLD: CAN WE THINK AND DRIVE?

Car driving is the riskiest activity engaged in by tens of millions of adults. Over 50 countries have laws restricting the use of mobile or cell phones by drivers to increase car safety. Are such restrictions necessary? The short answer is "Yes" – drivers using a mobile phone are several times more likely to be involved in a car accident (Nurullah, 2015). This is so even though drivers try to reduce the risks by driving slightly more slowly (reducing speed by 5–6 mph) than usual shortly after initiating a mobile-phone call (Farmer et al., 2015).

Caird et al. (2008), in a review of studies using simulated driving tasks, reported that reaction times to events (e.g., onset of brake lights on the car in front) increased by 250 ms with mobile-phone use and were greater when drivers were talking rather than listening. This 250 ms increase in reaction time translates into travelling an extra 18 feet (5.5 metres) before stopping for a motorist doing 50 mph (80 kph); a worked version of this calculation appears after this box. This could be the difference between stopping just short of a child and killing that child.

Strayer and Drews (2007) studied the above slowing effect using event-related potentials while drivers responded rapidly to the onset of brake lights on the car in front. The magnitude of the P300 (a positive wave associated with attention) was reduced by 50% in mobile-phone users. Strayer et al. (2011) considered a real-life driving situation. Drivers were observed to see whether they obeyed a law requiring them to stop at a road junction. Of drivers not using a mobile phone, 79% obeyed the law compared to only 25% of mobile-phone users.

Theoretical considerations

Why do so many drivers endanger people's lives by using mobile phones? Most believe they can drive safely while using a mobile phone whereas other drivers cannot (Sanbonmatsu et al., 2016b). Their misplaced confidence depends on limited monitoring of their driving performance: drivers using a mobile phone make more driving errors but do not remember making more errors (Sanbonmatsu et al., 2016a). Why does mobile-phone use impair driving performance? Strayer and Fisher (2016) in their SPIDER model identified five cognitive processes that are adversely affected when drivers' attention is diverted from driving (e.g., by mobile-phone use):


(1) There is less effective visual scanning of the environment for potential threats. Distracted drivers are more inclined to focus attention on the centre of the road and less inclined to scan objects in the periphery and their side mirrors (Strayer & Fisher, 2016).
(2) The ability to predict where threats might occur is impaired. Distracted drivers are much less likely to make anticipatory glances towards the location of a potential hazard (e.g., obstructed view of a pedestrian crossing) (Taylor et al., 2015).
(3) There is reduced ability to identify visible threats, a phenomenon known as inattentional blindness (see Glossary; and Chapter 4). In a study by Strayer and Drews (2007), 30 objects (e.g., pedestrians; advertising hoardings) were clearly visible to drivers. However, those using a mobile phone subsequently recognised far fewer of the objects they had fixated than those not using a mobile phone (under 25% vs 50%, respectively).
(4) It is harder to decide what action is necessary in a threatening situation. Cooper et al. (2009) found drivers were 11% more likely to make unsafe lane changes when using a mobile phone.
(5) It becomes harder to execute the appropriate action. Reaction times are slowed (Caird et al., 2008, discussed above, p. 214).

The SPIDER model is oversimplified in several ways. First, various different activities are associated with mobile-phone use. Simmons et al. (2016) found in a meta-analytic review that the risk of safety-critical events was increased by activities requiring drivers to take their eyes off the road (e.g., locating a phone; dialling; texting). However, talking on a mobile phone did not increase risk. Second, driving-irrelevant cognitive activities do not always impair all aspects of driving performance. Engstrom et al. (2017, p. 734) proposed their cognitive control hypothesis: "Cognitive load selectively impairs driving sub-tasks that rely on cognitive control but leaves automatic performance unaffected." For example, driving-irrelevant activities involving cognitive load (e.g., mobile-phone use) typically have no adverse effect on well-practised driving skills, such as lane keeping and braking when getting close to the vehicle in front (Engstrom et al., 2017). Third, individuals using mobile phones while driving are unrepresentative of drivers in general (e.g., they tend to be relatively young and to engage in more risk-taking activities: Precht et al., 2017). Thus, we must consider individual differences in personality and risk taking when interpreting accidents associated with mobile-phone use. Fourth, the SPIDER model implies that performance cannot be improved by adding a secondary task. However, driving performance in monotonous conditions is sometimes better when drivers listen to the radio at the same time (see Engstrom et al., 2017). Listening to the radio can reduce the mind-wandering that occurs when someone drives in monotonous conditions. Drivers indicating their immediate thoughts during their daily commute reported mind-wandering 63% of the time and active focus on driving only 15%–20% of the time (Burdett et al., 2018).
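As promised in the box above, here is the stopping-distance arithmetic behind the 250 ms figure, written out as a tiny script (the function name and structure are ours; the unit conversions are standard):

def extra_stopping_distance_feet(speed_mph, extra_reaction_s=0.25):
    """Distance travelled during the extra reaction time that Caird et al.
    (2008) attributed to mobile-phone use."""
    feet_per_second = speed_mph * 5280 / 3600  # miles/hour -> feet/second
    return feet_per_second * extra_reaction_s

extra_feet = extra_stopping_distance_feet(50)
print(round(extra_feet, 1), "feet")             # ~18.3 feet
print(round(extra_feet * 0.3048, 1), "metres")  # ~5.6 metres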

Multiple resource theory

Wickens (1984, 2008) argued in his multiple resource model that the processing system consists of several independent processing resources or mechanisms. The model includes four major dimensions (see Figure 5.16):

(1) Processing stages: there are successive stages of perception, cognition (e.g., working memory) and responding.
(2) Processing codes: perception, cognition and responding can use spatial and/or verbal codes; action can involve speech (vocal verbal) or manual/spatial responses.


Figure 5.16 Wickens’s four-dimensional multiple resource model. The details are described in the text. From Wickens (2008). © 2008. Reprinted by permission of SAGE Publications.

(3) Modalities: perception can involve visual and/or auditory resources.
(4) Visual channels: visual processing can be focal (high acuity) or ambient (peripheral).

Here is the model's crucial prediction: "To the extent that two tasks use different levels along each of the three dimensions [excluding (4) above], time-sharing [dual-task performance] will be better" (Wickens, 2008, p. 450). Thus, tasks requiring different resources can be performed together more successfully than those requiring the same resources. Wickens's approach bears some resemblance to Baddeley's (e.g., 2012) working memory model (see Chapter 6). According to that model, two tasks can be performed together successfully provided they use different components or processing resources.
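The crucial prediction can be restated as a toy scoring function (our illustration, not part of Wickens's model): count the dimensions on which two tasks demand the same level, and expect more interference the higher the count.

def overlap_score(task_a, task_b, dims=("stage", "code", "modality")):
    """Counts shared levels on the three dimensions relevant to Wickens's
    prediction; higher scores predict worse time-sharing."""
    return sum(task_a[d] == task_b[d] for d in dims)

# Hypothetical task descriptions for illustration only.
driving = {"stage": "responding", "code": "spatial", "modality": "visual"}
conversing = {"stage": "responding", "code": "verbal", "modality": "auditory"}
tracking = {"stage": "responding", "code": "spatial", "modality": "visual"}

print(overlap_score(driving, conversing))  # 1 -> relatively good time-sharing
print(overlap_score(driving, tracking))    # 3 -> strong predicted interference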

Findings

Research discussed earlier (Treisman & Davies, 1973; McLeod, 1977) showing the negative effects of stimulus and response similarity on performance is entirely consistent with the theory. Lu et al. (2013) reviewed research where an ongoing visual-motor task (e.g., car driving) was performed together with an interrupting task in the visual, auditory or tactile (touch) modality. As predicted, non-visual interrupting tasks (especially those in the tactile modality) were processed more effectively than visual ones and there were no adverse effects on the visual-motor task. According to the model, there should be only limited dual-task interference between two visual tasks if one requires focal or foveal vision whereas the other requires ambient or peripheral vision. Tsang and Chan (2018) obtained support for this prediction in a study in which participants tracked a moving target in focal vision while responding to a spatial task in ambient or peripheral vision. Dual-task performance is often more impaired than predicted by the theory. For example, consider a study by Robbins et al. (1996; see Chapter 6).


The main task was selecting chess moves and we will focus on the condition where the task performed at the same time was generating random letters. These two tasks involve different processing codes (spatial vs verbal, respectively) and different response types (manual vs vocal, respectively). Nevertheless, generating random letters caused substantial interference on the chess task.

Evaluation

The main assumptions of the theory have largely been supported by the experimental evidence. In other words, dual-task performance is generally less impaired when two tasks differ with respect to modalities, processing codes or visual channels than when they do not. What are the model's limitations?

(1) Successful dual-task performance often requires higher-level processes of coordinating and organising the demands of the two tasks (see later section on cognitive neuroscience, pp. 220–222). However, these processes are de-emphasised within the theory.
(2) The theory's assumption that there is a fixed sequence of processing stages (perception; cognition; responding) is too rigid given the flexible nature of much dual-task processing (Koch et al., 2018). The numerous forms of cognitive processing intervening between perception and responding are not discussed in detail.
(3) The theory implies that negative or interfering effects of performing two tasks together should be constantly present. However, Steinborn and Huestegge (2017) found dual-task conditions led only to occasional performance breakdown due to attention failures.

Threaded cognition

Salvucci and Taatgen (2008, 2011) proposed a model of threaded cognition in which streams of thought are represented as threads of processing. For example, processing two tasks might involve two separate threads. The central theoretical assumptions are as follows:

Multiple threads or goals can be active at the same time, and as long as there is no overlap in the cognitive resources needed by these threads, there is no multi-tasking interference. When threads require the same resource at the same time, one thread must wait and its performance will be adversely affected. (Salvucci & Taatgen, 2011, p. 228)

This is because all resources have limited capacity. Taatgen (2011) discussed the threaded cognition model (see Figure 5.17). Several cognitive resources can be the source of competition between two tasks. These include visual perception, declarative memory, task control and focal working memory or problem state. Nijboer et al. (2016a) discussed similarities between this model and Baddeley's working memory model (see Chapter 6). Three components of the model relate to working memory: (1) problem state (attentional focus); (2) declarative memory (activated short-term memory); and (3) subvocal rehearsal (resembling the phonological loop; see Chapter 6).


Figure 5.17 Threaded cognition theory. We possess several cognitive resources (e.g., declarative memory, task control, visual perception). These resources can be used in parallel but each resource can only work on one task at a time. Our ability to perform two tasks at the same time (e.g., driving and dialling, subtraction and typing) depends on the precise ways in which cognitive resources need to be used. The theory also identifies some of the brain areas associated with cognitive resources. From Taatgen (2011). With permission of the author.

Each thread or task controls resources in a greedy, polite way – threads claim resources greedily when required but release them politely when no longer needed. These aspects of the model lead to one of its most original assumptions – several goals (each associated with a given thread) can be active simultaneously. The model resembles Wickens's multiple resource model: both models assume there are several independent processing resources. However, only the threaded cognition model has led to a computational model making specific predictions. In addition, the threaded cognition model identifies the brain areas associated with each processing resource (see Figure 5.17).
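The greedy-but-polite policy can be illustrated with a minimal sketch (our toy illustration; the published model, built within the ACT-R cognitive architecture, is far more detailed): each thread claims any free resource it needs and releases it as soon as the step finishes, and a thread stalls whenever a needed resource is held by another thread. The task and resource labels are illustrative.

def run_threads(threads, n_cycles=6):
    """Toy threaded cognition: at each cycle every thread tries to claim
    the one resource its next step needs (greedy); a completed step frees
    the resource immediately (polite); a busy resource makes the thread
    wait, producing interference."""
    progress = {name: 0 for name in threads}
    for cycle in range(n_cycles):
        claimed = {}
        for name, steps in threads.items():
            if progress[name] >= len(steps):
                continue  # this thread has finished all its steps
            resource = steps[progress[name]]
            if resource not in claimed:
                claimed[resource] = name  # claim succeeds; step completes
                progress[name] += 1
            # else: resource busy this cycle -> thread waits
        print(f"cycle {cycle}: {claimed}")

# Two tasks expressed as sequences of resource demands: interference
# arises only at the shared working-memory resource.
run_threads({
    "subtraction": ["vision", "working_memory", "manual"],
    "tone_counting": ["audition", "working_memory", "vocal"],
})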

Findings

According to the model, any given cognitive resource (e.g., visual perception; focal working memory) can be used by only one process at any given time. Nijboer et al. (2013) tested this assumption using multi-column subtraction as the primary task, with participants responding using a keypad. Easy and hard conditions differed in whether digits were carried over ("borrowed") from one column to the next:

(1: easy)  336789495 – 224578381
(2: hard)  3649772514 – 1852983463

The model predicts focal working memory is required only in the hard condition. Subtraction was combined with a secondary task: a tracking task involving visual and manual resources or a tone-counting task involving working memory. Nijboer et al. (2013) predicted performance on the easy subtraction task should be worse when combined with the tracking task because both compete for visual and manual resources. In contrast, performance on the hard subtraction task should be worse when combined with the tone-counting task because there are large disruptive effects when two tasks compete for working memory resources. The findings were as predicted. Borst et al. (2013) found there was far less impairment of hard subtraction performance by a secondary task requiring working memory when participants saw a visual sign explicitly indicating that "borrowing" was needed.


This supports the model's assumption that dual-task performance can be enhanced by appropriate environmental support. According to the threaded cognition model, we often cope with the demands of combining two tasks by switching flexibly between them to maximise performance. Support was reported by Farmer et al. (2018). Participants performed a typing task and a tracking task at the same time. The relative value of the two tasks was varied by manipulating the number of points lost for poor tracking performance. Participants rapidly learned to adjust their strategies over time to increase the overall number of points they gained. Katidioti and Taatgen (2014) found task switching is not always optimal. Participants performed two tasks together: (1) an email task in which information needed to be looked up; and (2) chat messages containing questions to be answered. When there was a delay on the email task, most participants switched to the chat task. This happened even when it was suboptimal because it caused participants to forget information in the email task. How can we explain the above findings? According to Katidioti and Taatgen (2014, p. 734), "The results . . . agree with threaded cognition's 'greedy' theory . . . which states that people will switch to a task that is waiting as soon as the resources for it are available." Huijser et al. (2018) obtained further evidence of "greediness". When there were brief blank periods during the performance of a cognitively demanding task, participants often had task-irrelevant thoughts (e.g., mind-wandering) even though these thoughts impaired task performance. Katidioti and Taatgen (2014) also discovered substantial individual differences in task switching – some participants never switched to the chat task when delays occurred on the email task. Such individual differences cannot be explained by the theory. As mentioned earlier, a recent version of threaded cognition theory discussed by Nijboer et al. (2016a) identifies three components of working memory (i.e., problem state or focus of attention; declarative memory or activated short-term memory; and subvocal rehearsal). Nijboer et al. had participants perform two working memory tasks at the same time; the tasks varied in the extent to which they required the same working memory components. They obtained measures of performance and also used neuroimaging under dual-task and single-task conditions. What did Nijboer et al. (2016a) find? First, dual-task interference could be predicted from the extent to which the two tasks involved the same working memory components. Second, dual-task interference could also be predicted from the extent of overlap in the brain activation produced by the two tasks in single-task conditions. In sum, dual-task interference depended on competition for specific resources (i.e., working memory components) rather than general resources (e.g., a central executive).

Evaluation

The model has proved successful in various ways. First, several important cognitive resources have been identified. Second, the model identifies brain areas associated with various cognitive resources; this has led to computational modelling testing the model's predictions using neuroimaging and behavioural findings.


Third, the model accounts for dual-task performance without assuming the existence of a central executive or other executive process (such processes are often vaguely defined in other theories). Fourth, the theory predicts the factors determining switching between two tasks performed together. Fifth, individuals often have fewer problems performing two simultaneous tasks than generally assumed.

What are the model's limitations? First, it predicts that "Practising two tasks concurrently [together] results in the same performance as performing the two tasks independently" (Salvucci & Taatgen, 2008, p. 127). This de-emphasises the importance of processes coordinating and managing two tasks performed together (see next section). Second, excluding processes resembling Baddeley's central executive is controversial and may well prove inadvisable. Third, most tests of the model have involved the simultaneous performance of two relatively simple tasks and its applicability to more complex tasks remains unclear. Fourth, the theory does not provide a full explanation for individual differences in the extent of task switching (e.g., Katidioti & Taatgen, 2014).

Cognitive neuroscience

The cognitive neuroscience approach is increasingly used to test theoretical models and enhance our understanding of processes underlying dual-task performance. Its value is that neuroimaging provides "an additional data source for contrasting between alternative models" (Palmeri et al., 2017, p. 61). More generally, behavioural findings indicate the extent to which dual-task conditions impair task performance but are often relatively uninformative about the precise reasons for such impairment. Suppose we compare patterns of brain activation while participants perform tasks x and y singly or together. Three basic patterns are shown in Figure 5.18:

(1) Underadditive activation: reduced activation in one or more brain areas in the dual-task condition occurs because of resource competition between the tasks.

Figure 5.18 (a) Underadditive activation; (b) additive activation; (c) overadditive activation. White indicates task 1 activation; grey indicates task 2 activation; and black indicates activation only present in dual-task conditions. From Nijboer et al. (2014). Reprinted with permission of Elsevier.





(2) Additive activation: brain activation in the dual-task condition is simply the sum of the two single-task activations because access to resources is integrated efficiently between the two tasks.
(3) Overadditive activation: brain activation in one or more brain areas is present in the dual-task condition but not the single-task conditions. This occurs when dual-task conditions require executive processes that are absent (or less important) with single tasks. These executive processes include the coordination of task demands, attentional control and dual-task management generally. We would expect such executive processes to be associated mostly with activation in the prefrontal cortex.
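For readers who find the three patterns easier to grasp operationally, here is a toy classifier (our illustration; real analyses operate on statistical brain maps, not single numbers, and the tolerance band is an assumption) comparing a region's dual-task activation with the sum of its single-task activations:

def classify_activation(single_1, single_2, dual, tolerance=0.10):
    """Labels a region's dual-task activation relative to the sum of its
    single-task activations, using a +/-10% tolerance band."""
    additive_sum = single_1 + single_2
    if dual < additive_sum * (1 - tolerance):
        return "underadditive"
    if dual > additive_sum * (1 + tolerance):
        return "overadditive"
    return "additive"

print(classify_activation(1.0, 1.0, 1.4))  # underadditive (resource competition)
print(classify_activation(1.0, 1.0, 2.0))  # additive
print(classify_activation(0.1, 0.1, 1.0))  # overadditive (extra executive demands)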

KEY TERM Underadditivity The finding that brain activation when tasks A and B are performed at the same time is less than the sum of the brain activation when tasks A and B are performed separately.
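The arithmetic behind these three labels can be made concrete. The following sketch uses purely hypothetical activation values, and the helper `classify_activation` is our own illustration rather than any standard analysis routine; it simply compares dual-task activation in a given brain area against the sum of the two single-task activations:

```python
# A minimal sketch (hypothetical numbers) of the three dual-task activation
# patterns. 'single_a' and 'single_b' are mean activation levels in some
# brain area for each task performed alone; 'dual' is activation in the
# same area under dual-task conditions.

def classify_activation(single_a: float, single_b: float, dual: float,
                        tolerance: float = 0.05) -> str:
    """Compare dual-task activation against the sum of single-task levels."""
    additive_sum = single_a + single_b
    if dual < additive_sum * (1 - tolerance):
        return "underadditive (resource competition between tasks)"
    if dual > additive_sum * (1 + tolerance):
        return "overadditive (extra executive/coordination demands)"
    return "additive (resources integrated efficiently)"

# Invented activation values for a single brain area:
print(classify_activation(single_a=1.0, single_b=0.4, dual=0.9))
# -> underadditive (resource competition between tasks)
```

Real neuroimaging analyses are far more involved, of course; the point is only that all three patterns are defined relative to the additive baseline.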

Findings

We start with an example of underadditive activation. Just et al. (2001) used two very different tasks (auditory sentence comprehension and mental rotation of 3-D figures) performed together or singly. Performance on both tasks was impaired under dual-task conditions compared to single-task conditions. Under dual-task conditions, brain activation decreased by 53% in language-processing areas and by 29% in areas associated with mental rotation. These findings suggest fewer task-relevant processing resources were available when both tasks were performed together.

Schweizer et al. (2013) also reported underadditivity. Participants performed a driving task on its own or with a distracting secondary task (answering spoken questions). Driving performance was unaffected by the secondary task. However, driving with distraction reduced activation in posterior brain areas associated with spatial and visual processing (underadditivity). It also produced increased activation in the prefrontal cortex (overadditivity; see Figure 5.19), probably because driving with distraction requires increased attentional or cognitive control within the prefrontal cortex.

Dual-task performance is often associated with overadditivity in the form of increased activity within the prefrontal cortex (especially the lateral prefrontal cortex) (see Strobach et al., 2018, for a review). However, most such findings do not show that this increased prefrontal activation is actually required for dual-task performance. More direct evidence that prefrontal areas associated with attentional or cognitive control are causally involved in enhancing dual-task performance was reported by Filmer et al. (2017) and Strobach et al. (2018). Filmer et al. (2017) studied the effects of transcranial direct current stimulation (tDCS; see Glossary) applied to areas of the prefrontal cortex associated with cognitive control. Anodal tDCS during training enhanced cognitive control and subsequent dual-task performance.


Figure 5.19 Effects of an audio distraction task on brain activity associated with a straight driving task. There were significant increases in activation within the ventrolateral prefrontal cortex and the auditory cortex (in orange). There was decreased activation in occipital-visual areas (in blue). From Schweizer et al. (2013).


Strobach et al. (2018) reported similar findings. Anodal tDCS applied to the lateral prefrontal cortex enhanced dual-task performance, whereas cathodal tDCS applied to the same area impaired it. These findings were as predicted, given that anodal and cathodal tDCS often have opposite effects on performance, and they indicate that the lateral prefrontal cortex causally influences dual-task performance. Additional evidence of the importance of the lateral prefrontal cortex was reported by Wen et al. (2018): individuals with high connectivity (connectedness) within that brain area showed superior dual-task performance to those with low connectivity.

Finally, patterns of brain activation can help to explain practice effects on dual-task performance. Garner and Dux (2015) found much fronto-parietal activation (associated with cognitive control) when two tasks were performed singly or together. Extensive training greatly reduced dual-task interference and also produced increasing differentiation in the patterns of fronto-parietal activation associated with the two tasks. Participants showing the greatest reduction in dual-task interference tended to have the greatest increase in differentiation. Thus, using practice to increase the differences in processing (and associated brain activity) between tasks can be very effective.

Evaluation

Brain activity in dual-task conditions often differs from the sum of the brain activity produced by the same two tasks performed singly: dual-task activity can exhibit underadditivity or overadditivity. These findings are theoretically important because they indicate that performing two tasks together can involve much more cognitive control (and other processes) than performing single tasks. Garner and Dux’s (2015) findings demonstrate that enhanced dual-task performance with practice can depend on increased differentiation between the two tasks with respect to processing and brain activation.

What are the limitations of the cognitive neuroscience approach? First, increased (or decreased) activity in a given brain area in dual-task conditions is not necessarily very informative. For example, Dux et al. (2009) found dual-task performance improved over time because practice increased the speed of information processing in the prefrontal cortex rather than because it changed the amount of activation within that region. Second, it is often unclear whether patterns of brain activation are directly relevant to task processing rather than reflecting non-task processing. Third, findings in this area are rather inconsistent (Strobach et al., 2018) and we lack a comprehensive theory to account for the inconsistencies. Plausible reasons for them include the great variety of task combinations used in dual-task studies and individual differences in task proficiency among participants (Watanabe & Funahashi, 2018).

Psychological refractory period: cognitive bottleneck?

Much of the research discussed so far was limited because the task combinations used made it hard to assess in detail the processes used by participants. For example, the data collected were often insufficient to indicate the frequency with which participants switched their attentional focus from one task to the other. This led researchers to use much simpler tasks so that they had “better experimental control over the timing of the component task processes” (Koch et al., 2018, p. 575).

The dominant paradigm in recent research is as follows. There are two stimuli (e.g., two lights) and two responses (e.g., button presses), one associated with each stimulus. Participants respond to each stimulus as rapidly as possible. When the two stimuli are presented at the same time (dual-task condition), performance is typically worse on both tasks than when each task is presented separately (single-task conditions). When the second stimulus is presented shortly after the first, there is typically a marked slowing of the response to the second stimulus: the psychological refractory period (PRP) effect. This effect is robust – Ruthruff et al. (2009) obtained a large PRP effect even when participants were given strong incentives to eliminate it.

The PRP effect has direct real-world relevance. Hibberd et al. (2013) studied the effects of a simple in-vehicle task on braking performance when the vehicle in front braked and slowed down. There was a classic PRP effect – braking time was slowest when the in-vehicle task was presented immediately before the vehicle in front braked.

How can we explain the PRP effect? It is often argued that task performance involves three successive stages: (1) perceptual; (2) central response selection; and (3) response execution. According to the bottleneck model (e.g., Pashler, 1994),

KEY TERMS Psychological refractory period (PRP) effect The slowing of the response to the second of two stimuli when presented close together in time. Stimulus onset asynchrony (SOA) Time interval between the start of two stimuli.

the response selection stage of the second task cannot begin until the response selection stage of the first task has finished, although the other stages . . . can proceed in parallel . . . according to this model, the PRP effect is a consequence of the waiting time of the second task because of a bottleneck at the response selection stage.
(Mittelstädt & Miller, 2017, p. 89)

The bottleneck model explains several findings. For example, consider the effects of varying the time between the start of the first and second stimuli (the stimulus onset asynchrony (SOA)). According to the model, processing on the first task should slow down second-task processing much more when the SOA is small than when it is larger. The predicted finding is generally obtained (Mittelstädt & Miller, 2017).

The bottleneck model remains the most influential explanation of the PRP effect (and other dual-task costs). However, resource models (e.g., Navon & Miller, 2002) are also influential. According to these models, limited processing capacity can be shared between two tasks so both are processed simultaneously. Of crucial importance, sharing is possible even during the response selection process. A consequence of sharing processing capacity across tasks is that each task is processed more slowly than if performed on its own. Many findings can be explained by both models. However, resource models are more flexible than bottleneck models because they assume the division of processing resources between two tasks varies freely to promote efficient performance.
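The bottleneck model’s SOA prediction lends itself to a simple worked example. The sketch below uses hypothetical stage durations, and the function name `rt2_bottleneck` is ours; it assumes only the serial response selection constraint described above:

```python
# A minimal sketch (hypothetical stage durations, in ms) of the central
# bottleneck model's prediction for the PRP effect. Task 2's response
# selection cannot start until both its own perceptual stage and Task 1's
# response selection have finished; perceptual and motor stages can run
# in parallel.

def rt2_bottleneck(soa: float, p1=100, c1=150, p2=100, c2=150, m2=50) -> float:
    """Predicted Task 2 reaction time (measured from stimulus 2 onset).

    p = perceptual stage, c = central response selection, m = motor stage.
    """
    # Time (from S2 onset) at which Task 1 frees the central bottleneck:
    bottleneck_free = p1 + c1 - soa
    # Task 2's central stage starts once its percept is ready AND the
    # bottleneck is free; selection and execution then follow serially.
    return max(p2, bottleneck_free) + c2 + m2

for soa in (0, 50, 100, 150, 300):
    print(f"SOA {soa:>3} ms -> RT2 = {rt2_bottleneck(soa):.0f} ms")
```

At short SOAs the predicted second-task RT falls with a slope of –1 as SOA increases (the PRP effect) and then flattens once Task 1’s central stage no longer delays Task 2 – the pattern generally obtained in the studies discussed above.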


KEY TERM Crosstalk In dual-task conditions, the direct interference between the tasks that is sometimes found.


Another factor influencing the PRP effect is crosstalk (the two tasks interfering directly with each other). This mostly occurs when the stimuli and/or responses on the two tasks are similar. A classic example of crosstalk is trying to rub your stomach in circles with one hand while patting your head with the other (try it!).

Finally, note that participants in most studies receive only modest amounts of practice at performing two tasks at the same time. As a consequence, the PRP effect may occur at least in part because participants receive insufficient practice to eliminate it.

Findings

According to the bottleneck model, we would expect to find a PRP effect even when easy tasks are used and/or participants receive prolonged practice. Contrary evidence was reported by Schumacher et al. (2001). They used two tasks: (1) say “one”, “two” or “three” to low-, medium- and high-pitched tones, respectively; (2) press response keys corresponding to the position of a disc on a computer screen. These tasks were performed together for over 2,000 trials, by which time some participants performed them as well together as singly.

Strobach et al. (2013) conducted a study very similar to that of Schumacher et al. (2001). Participants took part in over 5,000 trials involving single-task or dual-task conditions. However, dual-task costs were not eliminated after extensive practice: dual-task costs on the auditory task reduced from 185 to 60 ms and those on the visual task from 83 to 20 ms (see Figure 5.20). How did dual-task practice benefit performance? Practice speeded up the central response selection stage in both tasks.

Figure 5.20 Reaction times for correct responses only over eight experimental sessions under dual-task (auditory and visual tasks) and single-task (auditory or visual task) conditions. From Strobach et al. (2013). Reprinted with permission of Springer.

Why did the findings differ in the two studies discussed above? In both studies, participants were rewarded for fast responding on single-task and dual-task trials. However, the way the reward system was set up in the Schumacher et al. study may have led participants to exert more effort in dual-task than single-task trials. This potential bias was absent from the Strobach et al. study and could explain why dual-task costs were greater there.

Hesselmann et al. (2011) studied the PRP effect using event-related potentials. The slowing of responses on the second task was closely matched by slowing in the onset of the P300 (an ERP component reflecting response selection). However, there was no slowing of earlier ERP components reflecting perceptual processing. Thus, as predicted by the bottleneck model, the PRP effect depended on response selection rather than perceptual processes.

According to the resource model approach, individuals choose whether to use serial or parallel processing on PRP tasks. Miller et al. (2009) argued that serial processing generally leads to superior performance compared with parallel processing. However, parallel processing should theoretically be superior when the stimuli associated with the two tasks are mostly presented close together in time. As predicted, there was a shift from predominantly serial processing towards parallel processing when that was the case. Miller et al. (2009) used very simple tasks, and parallel processing is probably most likely with such tasks. Han and Marois (2013) used two tasks, one of which was relatively difficult. Participants used serial processing even when parallel processing was encouraged by financial rewards.

Finally, we consider the theoretically important backward crosstalk effect: “characteristics of Task 2 of 2 subsequently performed tasks influence Task 1 performance” (Janczyk et al., 2018, p. 261). Hommel (1998) obtained this effect. Participants responded to Task 1 by making a left or right key-press and to Task 2 by saying “left” or “right”. Task 1 responses were faster when the two responses were compatible (e.g., press right key + say “right”) than when they were incompatible (e.g., press right key + say “left”). Evidence for the backward crosstalk effect was also reported by Janczyk et al. (2018).

Why is the backward crosstalk effect theoretically important? It indicates that some response selection processing on Task 2 occurs before response selection processing on Task 1 has finished. This is incompatible with the bottleneck model, which assumes response selection on Task 1 is completed prior to any response selection on Task 2 – in other words, that there is serial processing at the response selection stage. In contrast, the backward crosstalk effect is compatible with the resource model approach.

KEY TERM Backward crosstalk effect Aspects of Task 2 influence response selection and performance speed on Task 1 in studies on the psychological refractory period (PRP) effect.

Summary and conclusions

The findings from most research on the psychological refractory period effect are consistent with the bottleneck model. As predicted, the effect is typically larger when the second task follows very soon after the first. In addition, even prolonged practice rarely eliminates the psychological refractory period effect, suggesting that central response selection processes typically occur serially.

However, the bottleneck model assumes processing is less flexible than is often the case. For example, the existence of the backward crosstalk effect is inconsistent with the bottleneck model but consistent with the resource model approach. Fischer et al. (2018) also found evidence for considerable flexibility: there was less interference between the two tasks when financial rewards were offered because participants devoted more processing resources to protecting the first task from interference. However, the resource model approach has the disadvantage, compared to the bottleneck model, that its predictions are less precise, making it harder to submit to detailed empirical testing. Finally, as Koch et al. (2018, p. 575) pointed out, the bottleneck model “can be applied (with huge success) mainly for conditions in which two tasks are performed strictly sequentially”. This is often the case in research on the psychological refractory period effect but is much less applicable to more complex dual-task situations.

“AUTOMATIC” PROCESSING

We have seen in studies of divided attention that practice often causes a dramatic improvement in performance. This improvement has been explained by assuming some processes become automatic through prolonged practice. For example, the huge amount of practice we have had with reading has led to the assumption that familiar words are read “automatically”. Below we consider various definitions of “automaticity”. We also consider different approaches to explaining the development of automatic processing.

Traditional approach: Shiffrin and Schneider (1977)

Shiffrin and Schneider (1977) and Schneider and Shiffrin (1977) distinguished between controlled and automatic processes:

● Controlled processes are of limited capacity, require attention and can be used flexibly in changing circumstances.
● Automatic processes suffer no capacity limitations, do not require attention and are very hard to modify once learned.

In Schneider and Shiffrin’s (1977) research, participants memorised letters (the memory set) followed by a visual display containing letters. They then decided rapidly whether any item in the visual display was the same as any item in the memory set. The crucial manipulation was the type of mapping. With consistent mapping, only consonants were used as members of the memory set and only numbers were used as distractors in the visual display (or vice versa). Thus, a participant given only consonants to memorise would know any consonant detected in the visual display was in the memory set. With varied mapping, numbers and consonants were both used to form the memory set and to provide distractors in the visual display.

The mapping manipulation had dramatic effects (see Figure 5.21). The numbers of items in the memory set and visual display greatly affected decision speed only with varied mapping. According to Schneider and Shiffrin (1977), varied mapping involved serial comparisons between each item in the memory set and each item in the visual display until a match was achieved or every comparison had been made. In contrast, consistent mapping involved automatic processes operating independently and in parallel. These automatic processes had developed through prolonged practice in distinguishing between letters and numbers.

Figure 5.21 Response times on a decision task as a function of memory-set size, display-set size and consistent vs varied mapping. Data from Shiffrin and Schneider (1977). American Psychological Association.

In a second experiment, Shiffrin and Schneider (1977) used consistent mapping with the consonants B to L forming one set and Q to Z the other set. As before, items from only one set always formed the memory set, with all the distractors in the visual display being selected from the other set. Performance improved greatly over 2,100 trials, reflecting increased automaticity. After that, there were 2,100 trials with the reverse consistent mapping (swapping over the memory and visual display sets). With this reversal, it took nearly 1,000 trials before performance recovered to its level at the start of the experiment!

Evidence that there may be limited (or no) conscious awareness in the consistent mapping condition was reported by Jansma et al. (2001). Increasing automaticity (indexed by increased performance speed) was accompanied by reduced activation in areas associated with conscious awareness (e.g., dorsolateral prefrontal cortex).

In sum, automatic processes function rapidly and in parallel but are inflexible (as shown in the second part of the second experiment). Controlled processes are flexible and versatile but operate relatively slowly and in a serial fashion.
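The contrast between the serial comparison process assumed for varied mapping and the parallel process assumed for consistent mapping can be sketched as follows. The timing parameters and the function `decision_time` are purely illustrative, not a fitted model of Schneider and Shiffrin’s data:

```python
# A minimal sketch (hypothetical timings) of serial comparison under
# varied mapping versus parallel matching under consistent mapping.

def decision_time(memory_set: int, display_set: int, mapping: str,
                  base=400, per_comparison=40) -> float:
    """Predicted decision time in ms (illustrative parameters only)."""
    if mapping == "varied":
        # Serial, exhaustive comparison of every memory-set item with
        # every display item: time grows with the product of set sizes.
        return base + per_comparison * memory_set * display_set
    # Consistent mapping: comparisons run in parallel, so set sizes
    # have little effect on decision speed.
    return base + per_comparison

for m, d in [(1, 1), (4, 1), (4, 4)]:
    print(f"memory={m}, display={d}: "
          f"varied={decision_time(m, d, 'varied'):.0f} ms, "
          f"consistent={decision_time(m, d, 'consistent'):.0f} ms")
```

Only the varied-mapping predictions grow with the product of the two set sizes, mirroring the qualitative pattern shown in Figure 5.21.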

Limitations

What are the limitations of this approach? First, the distinction between automatic and controlled processes is oversimplified (discussed below). Second, Shiffrin and Schneider (1977) argued automatic processes operate in parallel and place no demands on attentional capacity, so decision speed should be unrelated to the number of items. However, decision speed was slower when the memory set and visual display both contained several items (see Figure 5.21). Third, the theory is descriptive rather than explanatory – it does not explain how serial controlled processing turns into parallel automatic processing.

Definitions of automaticity

Shiffrin and Schneider (1977) assumed there is a clear-cut distinction between automatic and controlled processes. More specifically, automatic processes possess several features (e.g., inflexibility; very efficient because they have no capacity limitations; occurring in the absence of attention). In essence, it is assumed there is perfect coherence or consistency among the features (i.e., they are all found together).

Moors and De Houwer (2006) and Moors (2016) identified four key features associated with automaticity:

(1) unconscious: lack of conscious awareness of at least one of the following: “the input, the output, and the transition from one to the other” (Moors, 2016, p. 265);
(2) efficient: using very little attentional capacity;
(3) fast;
(4) goal-unrelated or goal-uncontrolled: at least one of the following is missing: “the goal is absent, the desired state does not occur, or the causal relation [between the goal and the occurrence of the desired state] is absent” (Moors, 2016, p. 265).

Why might these four features (or the similar ones identified by Shiffrin and Schneider (1977)) often be found together? Instance theory (Logan, 1988; Logan et al., 1999) provides an influential answer. It is assumed task practice leads to the storage of information in long-term memory, which facilitates subsequent performance on that task. In essence, “Automaticity is memory retrieval: performance is automatic when it is based on a single-step direct-access retrieval of past solutions from memory” (Logan, 1988, p. 493). For example, if you were given the problem “24 × 7 = ???” numerous times, you would retrieve the answer (168) “automatically” without performing any mathematical calculations.

Instance theory makes coherent sense of several characteristics of automaticity. Automatic processes are fast because they require only the retrieval of past solutions from long-term memory. They make few demands on attentional resources because the retrieval of heavily over-learned information is relatively effortless. Finally, there is no conscious awareness of automatic processes because no significant processes intervene between stimulus presentation and retrieval of the correct response.

In spite of its strengths, instance theory is limited (see Moors, 2016). First, the theory implies the key features of automaticity will typically all be found together. However, this is not the case (see below). Second, it is assumed practice leads to automatic retrieval of solutions, with learners having no control over such retrieval. However, Wilkins and Rawson (2011) found evidence that learners can exercise top-down control over retrieval: when the instructions emphasised accuracy, there was less evidence of retrieval than when they emphasised speed. Thus, the use of retrieval after practice is not fully automatic.

Melnikoff and Bargh (2018) argued that the central problem with the traditional approach is that no research has shown the four features associated with “automaticity” occurring together. As they pointed out, “No attempt has been made to estimate the probability of a process being intentional given that it is conscious versus unconscious, or the probability of a process being controllable given that it is efficient versus inefficient, and so forth” (p. 282).
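Logan’s race assumption is easily simulated. In the minimal sketch below, all distributions and parameters are hypothetical: each practised trial stores one more instance, and the response is produced by whichever finishes first – the algorithm or the fastest retrieval:

```python
import random

# A minimal sketch (hypothetical parameters) of Logan's (1988) race idea:
# RT is the minimum of the algorithm's completion time and the fastest
# retrieval among the stored instances of the problem.

random.seed(1)

def response_time(n_instances: int, algorithm_ms=1500,
                  retrieval_mean=900, retrieval_sd=300) -> float:
    """RT = min(algorithm, fastest of n stored-instance retrievals)."""
    retrievals = [max(50.0, random.gauss(retrieval_mean, retrieval_sd))
                  for _ in range(n_instances)]  # clamp to positive times
    return min([algorithm_ms] + retrievals)

for practice in (1, 10, 100):
    mean_rt = sum(response_time(practice) for _ in range(2000)) / 2000
    print(f"{practice:>3} stored instances -> mean RT = {mean_rt:.0f} ms")
```

Because the minimum of an ever-larger set of retrieval times keeps shrinking, mean RT declines steeply with practice – which is how instance theory explains the speed-up that accompanies increasing automaticity.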

Decompositional approach: Moors (2016)

Moors and De Houwer (2006) and Moors (2016) argued that previous theoretical approaches are greatly oversimplified. Instead, they favoured a decompositional approach, according to which the features of automaticity are clearly separable and by no means always found together: “It is dangerous to draw inferences about the presence or absence of one feature on the basis of the presence or absence of another” (Moors & De Houwer, 2006, p. 320).

Moors and De Houwer (2006) also argued there is no firm dividing line between automaticity and non-automaticity. The features are continuous rather than all-or-none (e.g., a process can be fairly fast or slow; it can be partially conscious). As a result, most processes involve a blend of automaticity and non-automaticity. This approach is rather imprecise because few processes are 100% automatic or non-automatic. However, we can make relative statements (e.g., process X is more/less automatic than process Y).

Moors (2016) claimed the relationships between factors such as goals, attention and consciousness are much more complex than claimed within traditional approaches to “automaticity”. This led her to develop a new theoretical account (see Figure 5.22).

Figure 5.22 Factors that are hypothesised to influence representational quality within Moors’ (2016) theoretical approach. From Moors (2016).

[Figure: schematic in which prior stimulus factors (frequency; recency; stimulus quality), prior stimulus representation factors (existence, strength and availability of the stimulus representation in long-term memory; quality of the representation in working memory), prior stimulus × person factors (selection history; reward history) and current stimulus factors (stimulus quality; un/expectedness; goal in/congruence; novelty/familiarity) feed – amplified by attention – into the current stimulus representation, which must cross a first threshold for unconscious processing and a second threshold for conscious processing.]


A key assumption is that all information processes require an input of sufficient representational quality (defined by the “intensity, duration, and distinctiveness of a representation”, Moors, 2016, p. 273). What factors determine representational quality?

(1) current stimulus factors, including the extent to which a stimulus is expected or unexpected, familiar or novel, and goal congruent or incongruent;
(2) prior stimulus factors (e.g., the frequency and recency with which the current stimulus has been encountered);
(3) prior stimulus representation factors based on relevant information stored within long-term memory;
(4) attention, which enhances or amplifies the impact of current stimulus factors and prior stimulus representation factors on the current stimulus representation.

According to this theoretical account, the above factors influence representational quality additively, so that a high level of one factor can compensate for a low level of another factor. For example, selective attention or relevant information in long-term memory can compensate for brief stimulus presentations. The main impact of consciousness occurs later than that of the other factors (e.g., attention and goal congruence). More specifically, representational quality must reach the first threshold to permit unconscious processing but a more stringent second threshold to permit conscious processing.
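The additivity and two-threshold assumptions can be illustrated with a toy calculation. All weights and threshold values below are our own invented numbers for illustration, not parameters from Moors (2016):

```python
# A minimal sketch of Moors' (2016) additive account: factor contributions
# sum to a representational quality score, which must clear a first
# threshold for unconscious processing and a higher second threshold for
# conscious processing. All numbers are invented for illustration.

UNCONSCIOUS_THRESHOLD = 1.0
CONSCIOUS_THRESHOLD = 2.0

def representational_quality(current_stimulus: float, prior_stimulus: float,
                             prior_representation: float,
                             attention: float) -> float:
    # Additivity means a strong factor (e.g., attention) can compensate
    # for a weak one (e.g., a brief, degraded stimulus).
    return (current_stimulus + prior_stimulus
            + prior_representation + attention)

def processing_mode(quality: float) -> str:
    if quality >= CONSCIOUS_THRESHOLD:
        return "conscious processing"
    if quality >= UNCONSCIOUS_THRESHOLD:
        return "unconscious processing only"
    return "no processing"

# A brief, weak stimulus compensated by strong attention and a familiar
# long-term memory representation:
q = representational_quality(current_stimulus=0.3, prior_stimulus=0.4,
                             prior_representation=0.6, attention=0.9)
print(f"quality = {q:.1f} -> {processing_mode(q)}")
```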

Findings

According to Moors’ (2016) theoretical framework, there is a flexible relationship between controlled and conscious processing. This contrasts with Schneider and Shiffrin’s (1977) assumption that executive control is always associated with conscious processing. Diao et al. (2016) reported findings consistent with Moors’ prediction. They used a Go/No-Go task where participants made a simple response (Go trials) or withheld it (No-Go trials). High-value or low-value financial rewards were available for successful performance. Task stimuli were presented above or below the level of conscious awareness.

What did Diao et al. (2016) find? Performance was better on high-reward than low-reward trials even when task processing was unconscious. In addition, participants showed superior unconscious inhibitory control (assessed by event-related potentials) on high-reward trials. Thus, one feature of automaticity (unconscious processing) was present whereas another feature (goal-uncontrolled) was not.

Huber-Huber and Ansorge (2018) also reported problems for the traditional approach. Participants received target words indicating an upward or downward direction (e.g., above; below). Prior to the target word, a prime word also indicating an upward or downward direction was presented below the level of conscious awareness. Response times to the target words were slower when there was a conflict between the meanings of the prime and target words than when they were congruent in meaning. As in the study by Diao et al. (2016), unconscious processing was combined with control, a combination that is inconsistent with the traditional approach.

Evaluation

The theoretical approach to automaticity proposed by Moors (2016) has several strengths. First, the assumption that the various features associated with automaticity often correlate poorly with each other is clearly superior to the earlier notion that these features exhibit perfect coherence. Second, her assumption that processes vary in the extent to which they are “automatic” is much more realistic than a simplistic division of processes into automatic and non-automatic. Third, the approach is more comprehensive than previous ones because it considers more factors relevant to “automaticity”.

What are the limitations of Moors’ (2016) approach? First, numerous factors are assumed to influence representational quality (and thus the extent to which processes are automatic) (see Figure 5.22), so large-scale experimental research would be required to assess how all these factors interact. Second, the approach provides only a partial explanation of the underlying mechanisms causing the various factors to influence representational quality.


CHAPTER SUMMARY

• Focused auditory attention. When two auditory messages are presented at the same time, there is less processing of the unattended than the attended message. Nevertheless, unattended messages often receive some semantic processing. The restricted processing of unattended messages may reflect a bottleneck at various stages of processing. However, theories assuming the existence of a bottleneck de-emphasise the flexibility of selective auditory attention. Attending to one voice among several (the cocktail party problem) is a challenging task. Human listeners use top-down and bottom-up processes to select one voice. Top-down processes include the use of various control processes (e.g., focused attention; inhibitory processes) and learning about structural consistencies present in the to-be-attended voice.



• Focused visual attention. Visual attention can resemble a spotlight or zoom lens. In addition, the phenomenon of split attention suggests visual attention can also resemble multiple spotlights. However, accounts based on spotlights or a zoom lens typically fail to specify the underlying mechanisms. Visual attention can be object-based, space-based or feature-based, and it is often object-based and space-based at the same time. Visual attention is flexible and is influenced by factors such as individual differences. According to Lavie’s load theory, we are more susceptible to distraction when our current task involves low perceptual load and/or high cognitive load. There is much support for this theory.


However, the effects of perceptual and cognitive load are often not independent as predicted. In addition, it is hard to test the theory because the terms “perceptual load” and “cognitive load” are vague. There are stimulus-driven ventral attention and goal-directed dorsal attention networks involving different (but partially overlapping) brain networks. More research is required to establish how these two attentional systems interact. Additional brain networks (e.g., cingulo-opercular network; default mode network) relevant to attention have also been identified.

• Disorders of visual attention. Neglect occurs when damage to the ventral attention network in the right hemisphere impairs the functioning of the undamaged dorsal attention network. This impaired functioning of the dorsal attention network involves reduced activation and alertness within the left hemisphere. Extinction is due to biased competition for attention between the two hemispheres combined with reduced attentional capacity. More research is required to clarify differences among neglect patients in their specific processing deficits (e.g., the extent to which failures to detect left-field stimuli are due to impaired spatial working memory).



• Visual search. One problem with airport security checks is that there are numerous possible target objects. Another problem is the rarity of targets, which leads to excessive caution in reporting targets. According to feature integration theory, object features are processed in parallel and then combined by focused attention in visual search. This theory ignores our use of general scene knowledge in everyday life to focus visual search on areas of the scene most likely to contain the target object. It also exaggerates the prevalence of serial processing. Contemporary approaches emphasise the role of perception in visual search. Parallel processing is very common because much information is typically extracted from the peripheral visual field as well as from central or foveal vision. Problems in visual search occur when there is visual crowding in peripheral vision.



• Cross-modal effects. In the real world, we often coordinate information across sense modalities. In the ventriloquist effect, vision dominates sound because an object’s location is typically indicated more precisely by vision. In the temporal ventriloquism effect, sound dominates vision because the auditory modality is typically more precise at discriminating temporal relations. Both effects depend on the assumption that visual and auditory stimuli come from the same object. Auditory or vibrotactile warning signals that are informative about the direction of danger and/or imminence of collision speed up drivers’ braking times. We lack a theoretical framework within which to understand why some warning signals are more effective than others.




• Divided attention: dual-task performance. Individuals engaging in heavy multi-tasking show evidence of increased distractibility and impaired attentional control. A demanding secondary task (e.g., mobile-phone use) impairs aspects of driving performance requiring cognitive control but not well-practised driving skills (e.g., lane keeping). Multiple resource theory and threaded cognition theory both assume dual-task performance depends on several limited-capacity processing resources. This permits two tasks to be performed together successfully provided they use different processing resources. This general approach de-emphasises high-level executive processes (e.g., monitoring and coordinating two tasks). Some neuroimaging studies have found underadditivity in dual-task conditions (less activation than for the two tasks performed separately). This may indicate people have limited general processing resources. Other neuroimaging studies have found dual-task conditions can introduce new processing demands of task coordination associated with activation within the dorsolateral prefrontal cortex and cerebellum. It is often unclear whether patterns of brain activation are directly relevant to task processing. The psychological refractory period (PRP) effect can be explained by a processing bottleneck during response selection. This remains the most influential explanation. However, some evidence supports resource models claiming parallel processing of two tasks is often possible. Such models are more flexible than bottleneck models and they provide an explanation for interference effects from the second of two tasks on the first one.



• “Automatic” processing. Shiffrin and Schneider distinguished between slow, flexible controlled processes and fast, automatic ones. This distinction is greatly oversimplified. Other theorists have claimed automatic processes are unconscious, efficient, fast and goal-unrelated. However, these four processing features are not all-or-none and they often correlate poorly with each other. Thus, there is no sharp distinction between automatic and non-automatic processes. Moors’ (2016) decompositional approach plausibly assumes that there is considerable flexibility in terms of the extent to which any given process is “automatic”.


FURTHER READING

Chen, Y.-C. & Spence, C. (2017). Assessing the role of the “unity assumption” on multi-sensory integration: A review. Frontiers in Psychology, 8 (Article 445). Factors determining the extent to which stimuli from different sensory modalities are integrated are discussed.

Engstrom, J., Markkula, G., Victor, T. & Merat, N. (2017). Effects of cognitive load on driving performance: The cognitive control hypothesis. Human Factors, 59, 734–764. Johan Engstrom and his colleagues review research on factors influencing driving performance and provide a new theoretical approach.

Hulleman, J. & Olivers, C.N.L. (2017). The impending demise of the item in visual search. Behavioral and Brain Sciences, 40, 1–20. This review article indicates very clearly why theoretical accounts of visual search increasingly emphasise the role of fixations and visual perception. Several problems with previous attention-based theories of visual search are also discussed.

Karnath, H.-O. (2015). Spatial attention systems in spatial neglect. Neuropsychologia, 75, 61–73. Hans-Otto Karnath discusses theoretical accounts of neglect emphasising the role of attentional systems.

Koch, I., Poljac, E., Müller, H. & Kiesel, A. (2018). Cognitive structure, flexibility, and plasticity in human multitasking – An integrative review of dual-task and task-switching research. Psychological Bulletin, 144, 557–583. Iring Koch and colleagues review dual-task and task-switching research with an emphasis on major theoretical perspectives.

McDermott, J.H. (2018). Audition. In J.T. Serences (ed.), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 2: Sensation, Perception, and Attention (4th edn; pp. 63–120). New York: Wiley. Josh McDermott discusses theory and research focused on selective auditory attention in this comprehensive chapter.

Melnikoff, D.E. & Bargh, J.A. (2018). The mythical number two. Trends in Cognitive Sciences, 22, 280–293. Research revealing limitations with traditional theoretical approaches to “automaticity” is discussed.

Moors, A. (2016). Automaticity: Componential, causal, and mechanistic explanations. Annual Review of Psychology, 67, 263–287. Agnes Moors provides an excellent critique of traditional views on “automaticity” and develops her own comprehensive theoretical account.

Nobre, A.C. (2018). Attention. In J.T. Serences (ed.), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 2: Sensation, Perception, and Attention (4th edn; pp. 241–316). New York: Wiley. Anna (Kia) Nobre discusses the key role played by attention in numerous aspects of cognitive processing.


PART II

Memory

How important is memory? Imagine we were without it. We would not recognise anyone or anything as familiar. We would be unable to talk, read or write because we would remember nothing about language. We would have extremely limited personalities because we would have no recollection of the events of our own lives and therefore no sense of self. In sum, we would have the same lack of knowledge as a newborn baby.

Nairne et al. (2007) argued there were close links between memory and survival in our evolutionary history. Our ancestors prioritised information relevant to their survival (e.g., remembering the location of food or water; ways of securing a mate). Nairne et al. found memory for word lists was especially high when participants rated the words for their relevance to survival in a dangerous environment: the survival-processing effect. This effect has been replicated several times (Kazanas & Altarriba, 2015) and is stronger when participants imagine themselves alone in a dangerous environment rather than with a group of friends (Leding & Toglia, 2018). In sum, human memory may have evolved in part to promote survival.

We use memory for numerous purposes throughout every day of our lives. It allows us to keep track of conversations, to remember how to use a mobile phone, to write essays in examinations, to recognise other people’s faces, to ride a bicycle, to carry out intentions and, perhaps, to play various sports. More generally, our interactions with others and with the environment depend crucially on having an effective memory system.

The wonders of human memory are discussed at length in Chapters 6–8. Chapter 6 deals mainly with key issues regarded as important from the early days of memory research. For example, we consider the distinction between short-term and long-term memory. The notion of short-term memory has been largely superseded by that of a working-memory system combining the functions of processing and short-term information storage. There is extensive coverage of working memory in Chapter 6.

Another topic discussed at length in Chapter 6 is learning. Long-term memory is generally enhanced when meaning is processed at the time of learning. Long-term memory is also better if much of the learning period is spent practising retrieval. Evidence suggesting some learning is implicit (i.e., does not depend on conscious processes) is also discussed. Finally, we discuss forgetting: why do we tend to forget information over time?

Chapter 7 is devoted to long-term memory. Our long-term memories include personal information, knowledge about language, much knowledge about psychology (hopefully!), knowledge about thousands of objects in the world around us, and information about how to perform various skills (e.g., riding a bicycle; playing the piano). The central issue addressed in Chapter 7 is how to account for this incredible richness. Several theorists have claimed there are several long-term memory systems. Others argue that there are numerous processes that are combined and recombined depending on the specific demands of any given memory task.

Memory is important in everyday life in ways de-emphasised historically. For example, autobiographical memory (discussed in Chapter 8) is of great significance to us. It gives us a coherent sense of ourselves and our personalities. The other topics considered in Chapter 8 are eyewitness testimony and prospective memory (memory for future intentions). Research has revealed that eyewitness testimony is often much less accurate than generally assumed. This has implications for the legal system because hundreds of innocent individuals have been imprisoned solely on the basis of eyewitness testimony. When we think about memory, we naturally focus on memory of the past. However, we also need to remember numerous future commitments (e.g., meeting a friend as arranged), and such remembering involves prospective memory. We will consider how we try to ensure we carry out our future intentions.

The study of human memory is fascinating, and substantial progress has been made. However, memory is complex and depends on several factors. Four kinds of factors are especially important: events, participants, encoding and retrieval (Roediger, 2008). Events range from words and pictures to texts and life events. Participants vary in age, expertise, memory-specific disorders and so on. What happens at encoding varies as a function of task instructions, the immediate context and participants’ strategies. Finally, memory performance at retrieval often varies considerably depending on the nature of the memory task (e.g., free recall; cued recall; recognition).

The take-home message is that memory findings are context-sensitive – they depend on interactions between the four factors. Thus, the effects of manipulating, say, what happens at encoding depend on the participants used, the events to be remembered and the conditions of retrieval. That explains why Roediger (2008) entitled his article “Why the laws of memory vanished”. How, then, do we make progress? As Baddeley (1978, p. 150) argued, “The most fruitful way to extend our understanding of human memory is not to search for broader generalisations and ‘principles’, but is rather to develop ways of separating out and analysing more deeply the complex underlying processes.”


Chapter 6

Learning, memory and forgetting

INTRODUCTION

This chapter (and the next two) focuses on human memory. All three chapters deal with intact human memory, but Chapter 7 also considers amnesic patients in detail. Traditional laboratory-based research is the focus of this chapter and Chapter 7, with more naturalistic research being discussed in Chapter 8. There are important links among these different types of research. For example, many theoretical issues relevant to brain-damaged and healthy individuals can be tested in the laboratory or in the field.

Learning and memory involve several stages of processing. Encoding occurs during learning: it involves transforming presented information into a representation that can subsequently be stored. This is the first stage. As a result of encoding, information is stored within the memory system. Thus, storage is the second stage. The third stage is retrieval, which involves recovering information from the memory system. Forgetting (discussed later, see pp. 278–293) occurs when our attempts at retrieval are unsuccessful.

Several topics are discussed in this chapter. The basic structure of the chapter consists of three sections:

(1) The first section focuses mostly on short-term memory (a form of memory in which information is held for a brief period of time). This section has three topics (short-term vs long-term memory; working memory; and working memory: executive functions and individual differences). The emphasis here is on the early stages of processing (especially encoding).

(2) The second section focuses on learning and the processes occurring during the acquisition of information (i.e., encoding processes) leading to long-term memory. Learning can be explicit (occurring with conscious awareness of what has been learned) or implicit (occurring without conscious awareness of what has been learned). The first two topics in this section (levels of processing; learning through retrieval) focus on explicit learning whereas the third topic focuses on implicit learning.


KEY TERM Encoding The process by which information contained in external stimuli is transformed into a representation that can be stored within the memory system.


KEY TERM Iconic memory A sensory store that holds visual information for approximately 250–1,000 milliseconds following the offset of a visual stimulus.

(3) The third section consists of a single topic: forgetting from long-term memory. It differs from the other two sections in that the emphasis is on retrieval rather than encoding processes. More specifically, the focus is on the reasons for failures of retrieval.

SHORT-TERM VS LONG-TERM MEMORY

Many theorists distinguish between short-term and long-term memory. For example, there are enormous differences in capacity: only a few items can be held in short-term memory compared with essentially unlimited capacity in long-term memory. There are also massive differences in duration: a few seconds for short-term memory compared with up to several decades for long-term memory. The distinction between short-term and long-term memory stores was central to multi-store models. More recently, however, some theorists have proposed unitary-store models in which this distinction is much less clear-cut. Both types of models are discussed below.

Multi-store model

Atkinson and Shiffrin (1968) proposed an extremely influential multi-store model (see Figure 6.1):

● sensory stores, each modality-specific (i.e., limited to one sensory modality) and holding information very briefly;
● a short-term store of very limited capacity;
● a long-term store of essentially unlimited capacity holding information over very long periods of time.

According to the multi-store model, environmental stimulation is initially processed by the sensory stores. These stores are modality-specific (e.g., vision; hearing). Information is held very briefly in the sensory stores, with some of it being attended to and processed further within the short-term store.
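As a rough sketch, the model’s architecture can be expressed as a simple data flow. The class and method names are ours, and the capacities and displacement rule are illustrative only:

```python
from collections import deque

# A minimal sketch of the multi-store architecture: a sensory store
# feeding a very limited short-term store (new items displace the oldest
# ones), with rehearsed items copied to an unlimited long-term store.

class MultiStoreMemory:
    def __init__(self, stm_capacity: int = 7):
        self.sensory = []                      # held very briefly
        self.stm = deque(maxlen=stm_capacity)  # oldest item displaced when full
        self.ltm = set()                       # essentially unlimited

    def perceive(self, stimuli):
        self.sensory = list(stimuli)           # decays unless attended

    def attend(self, item):
        if item in self.sensory:
            self.stm.append(item)              # enters the short-term store

    def rehearse(self, item):
        if item in self.stm:
            self.ltm.add(item)                 # transfer via rehearsal

memory = MultiStoreMemory(stm_capacity=3)
memory.perceive(["cat", "dog", "pen", "cup"])
for word in ["cat", "dog", "pen", "cup"]:
    memory.attend(word)                        # "cat" is displaced
memory.rehearse("dog")
print(list(memory.stm), memory.ltm)            # ['dog', 'pen', 'cup'] {'dog'}
```

Even this toy version exposes the model’s key commitments: attention gates entry to the short-term store, and rehearsal is the sole route into long-term memory – assumptions criticised later in this section.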

Figure 6.1 The multi-store model of memory as proposed by Atkinson and Shiffrin (1968).

Sensory stores

The visual store (iconic memory) holds visual information briefly. According to a recent estimate (Clarke & Mack, 2015), iconic memory for a natural scene lasts for at least 1,000 ms after stimulus offset. If you twirl a lighted object in a circle in the dark, you will see a circle of light because of the persistence of visual information in iconic memory. More generally, iconic memory increases the time for which visual information is accessible (e.g., when reading).

Atkinson and Shiffrin (1968) and many other theorists have assumed iconic memory is pre-attentive (not dependent on attention). However, Mack et al. (2016) obtained findings strongly suggesting that iconic memory does depend on attention. Participants had to report the letters in the centre of a visual array (iconic memory) or whether four circles presented close to the fixation point were the same colour. Performance on the iconic memory task was much worse when the probability of having to perform that task was only 10% rather than 90%. This happened because there was much less attention to the letters in the former condition.

Echoic memory, the auditory equivalent of iconic memory, holds auditory information for a few seconds. Suppose someone asked you a question while your mind was elsewhere. Perhaps you replied “What did you say?”, just before realising you did know what had been said. This “playback” facility depends on echoic memory. Ioannides et al. (2003) found the duration of echoic memory was longer in the left hemisphere than the right, probably because of the dominance of the left hemisphere in language processing.

There are sensory stores associated with all the other senses (e.g., touch; taste). However, they are less important than iconic and echoic memory and have attracted much less research.

KEY TERMS Echoic memory A sensory store that holds auditory information for approximately 2–3 seconds. Chunks Stored units formed from integrating smaller pieces of information.

Short-term memory

Short-term memory has very limited capacity. Consider digit span: participants listen to a random digit series and then repeat back the digits immediately in the correct order. There are also letter and word spans. The maximum number of items recalled without error is typically about seven.

There are two reasons for rejecting seven items as the capacity of short-term memory. First, we must distinguish between items and chunks – “groups of items . . . collected together and treated as a single unit” (Mathy & Feldman, 2012, p. 346). For example, most individuals presented with the letter string IBMCIAFBI would treat it as three chunks rather than nine letters. Here is another example: you might find it hard to recall the following five words: is thing many-splendoured a love, but easier to recall the same words presented as follows: love is a many-splendoured thing.

Simon (1974) showed the importance of chunking. Immediate serial recall was 22 words with 8-word sentences but only 7 with unrelated words. In contrast, the number of chunks recalled varied less: it was 3 with the sentences compared to 7 with the unrelated words. Second, estimates of short-term memory capacity are often inflated because participants’ performance is influenced by rehearsal and long-term memory.

What influences chunking? As we have seen, it is strongly determined by information stored in long-term memory (e.g., IBM stands for International Business Machines). However, chunking also depends on people’s abilities to identify patterns or regularities in the material presented for learning.
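The long-term memory contribution to chunking can be made concrete with a toy parser. The chunk inventory and the greedy longest-match strategy below are ours, purely for illustration:

```python
# A minimal sketch of chunking: a greedy pass that groups a letter string
# into known chunks from long-term memory, falling back to single letters.

KNOWN_CHUNKS = {"IBM", "CIA", "FBI"}

def chunk(letters: str) -> list:
    chunks, i = [], 0
    while i < len(letters):
        # Try the longest known chunk starting at position i.
        for size in (3, 2, 1):
            candidate = letters[i:i + size]
            if candidate in KNOWN_CHUNKS or size == 1:
                chunks.append(candidate)
                i += size
                break
    return chunks

print(chunk("IBMCIAFBI"))   # ['IBM', 'CIA', 'FBI'] -> 3 chunks, not 9 items
print(chunk("BMICAIBIF"))   # 9 single-letter chunks: much harder to retain
```

With the relevant chunks available in long-term memory, nine letters collapse into three retrievable units; without them, all nine letters compete for the same very limited capacity.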


KEY TERM Articulatory suppression Rapid repetition of a simple sound (e.g., “the the the”), which uses the articulatory control process of the phonological loop.

For example, compare the digit sequences 2 3 4 5 6 and 2 4 6 3 5. It is much easier to chunk the former sequence as “all digits between 2 and 6”. Chekaf et al. (2016) found participants’ short-term memory was greatly enhanced by spontaneous detection of such patterns. When there were no patterns in the learning material, short-term memory capacity was only three items. A similar capacity limit was reported by Chen and Cowan (2009): when rehearsal was prevented by articulatory suppression (saying “the” repeatedly), only three chunks were recalled.

Within the multi-store model, it is assumed all items within short-term memory have equal importance. However, this is an oversimplification. Vergauwe and Langerock (2017) assessed speed of performance when participants were presented with four letters followed by a probe letter and decided whether the probe was the same as any of the original letters. Response to the probe was fastest when it corresponded to the letter currently being attended to (cues were used to manipulate which letter was the focus of attention at any given moment).

How is information lost from short-term memory? Several answers have been provided (Endress & Szabó, 2017). Atkinson and Shiffrin (1968) emphasised the importance of displacement – the capacity of short-term memory is very limited, so new items often displace items currently in short-term memory. Another possibility is that information in short-term memory decays over time in the absence of rehearsal. A further possibility is interference, which could come from items on previous trials and/or from information presented during the retention interval.

The experimental findings are variable. Berman et al. (2009) claimed interference is more important than decay. Short-term memory performance on any given trial was disrupted by words presented on the previous trial. Suppose this disruption effect occurred because words from the previous trial had not decayed sufficiently. If so, disruption would have been greatly reduced by increasing the inter-trial interval. In fact, increasing that interval had no effect. However, the disruption effect was largely eliminated when interference from previous trials was reduced.

Campoy (2012) pointed out that Berman et al.’s (2009) research was limited because their experimental design did not allow them to observe any decay occurring within 3.3 seconds of item presentation. Campoy obtained strong decay effects at time intervals shorter than 3.3 seconds. Overall, the findings suggest decay occurs mostly at short retention intervals and interference at longer ones.

Strong evidence that interference is important was reported by Endress and Potter (2014). They rapidly presented 5, 11 or 21 pictures of familiar objects. In their unique condition, no pictures were repeated over trials, whereas in their repeated condition, the same pictures were seen frequently over trials. Short-term memory was greater in the unique condition, in which there was much less interference than in the repeated condition (see Figure 6.2).

In sum, most of the evidence indicates that interference is the most important factor causing forgetting from short-term memory, although decay may also play a part. There is little direct evidence that displacement (emphasised by Atkinson & Shiffrin, 1968) is the main factor causing forgetting. However, it is possible that interference causes items to be displaced from short-term memory (Endress & Szabó, 2017).




Figure 6.2 Short-term memory performance (y-axis: capacity estimate; x-axis: set size) in conditions designed to create interference (repeated condition) or minimise interference (unique condition) for set sizes 5, 11 and 21 pictures. From Endress and Potter, 2014.

Short-term vs long-term memory

Is short-term memory distinct from long-term memory, as assumed by Atkinson and Shiffrin (1968)? If they are separate, we would expect some patients to have impaired long-term memory but intact short-term memory, with others showing the opposite pattern. This would produce a double dissociation (see Glossary).

The findings are generally supportive. Patients with amnesia (discussed in Chapter 7) have severe long-term memory impairments but nearly all have intact short-term memory (Spiers et al., 2001). A few brain-damaged patients have severely impaired short-term memory but intact long-term memory. For example, KF had no problems with long-term learning and recall but had a very small digit span (Shallice & Warrington, 1970). Subsequent research indicated his short-term memory problems centred mainly on recall of verbal material (letters; words; digits) rather than meaningful sounds or visual stimuli (Shallice & Warrington, 1974).

Evaluation

The multi-store model has been enormously influential. It is widely accepted (but see below) that there are three separate kinds of memory stores. Several sources of experimental evidence support the crucial distinction between short-term and long-term memory. However, the strongest evidence probably comes from brain-damaged patients having impairments only to short-term or long-term memory.

What are the model’s limitations? First, it is very oversimplified (e.g., the assumptions that the short-term and long-term stores are both unitary: operating in a single, uniform way). Below we discuss an approach where the single short-term store is replaced by a working memory system having four components. In similar fashion, there are several long-term memory systems (see Chapter 7).


Second, the assumption that the short-term store is a gateway between the sensory stores and long-term memory (see Figure 6.1) is incorrect. The information processed in short-term memory has typically already made contact with information in long-term memory (Logie, 1999). For example, you can only process IBM as a single chunk in short-term memory after you have accessed long-term memory to obtain the meaning of IBM.

Third, Atkinson and Shiffrin (1968) assumed information in short-term memory represents the “contents of consciousness”. This implies only information processed consciously is stored in long-term memory. However, there is much evidence for implicit learning (learning without conscious awareness of what has been learned) (discussed later, see pp. 269–278).

Fourth, the assumption that all items within short-term memory have equal status is incorrect. The item currently being attended to is accessed more rapidly than other items within short-term memory (Vergauwe & Langerock, 2017).

Fifth, the notion that most information is transferred to long-term memory via rehearsal greatly exaggerates rehearsal’s role in learning. In fact, only a small fraction of the information stored in long-term memory was rehearsed during learning.

Sixth, the notion that forgetting from short-term memory is caused by displacement minimises the role of interference.

Unitary-store model

Several theorists have argued the multi-store approach should be replaced by a unitary-store model. According to such a model, “STM [short-term memory] consists of temporary activations of LTM [long-term memory] representations or of representations of items that were recently perceived” (Jonides et al., 2008, p. 198). In essence, Atkinson and Shiffrin (1968) emphasised the differences between short-term and long-term memory whereas advocates of the unitary-store approach focus on the similarities.

How can unitary-store models explain amnesic patients having essentially intact short-term memory but severely impaired long-term memory? Jonides et al. (2008) argued they have special problems in forming novel relations (e.g., between items and their context) in both short-term and long-term memory. Amnesic patients perform well on short-term memory tasks because such tasks typically do not require storing relational information. Thus, amnesic patients should have impaired short-term memory performance on tasks requiring relational memory.

According to Jonides et al. (2008), the hippocampus and surrounding medial temporal lobes (damaged in amnesic patients) are crucial for forming novel relations. Multi-store theorists assume these structures are much more involved in long-term than short-term memory. However, unitary-store models predict the hippocampus and medial temporal lobes would be involved if a short-term memory task required forming novel relations.


Findings

Several studies have assessed the performance of amnesic patients on short-term memory tasks. In some studies (e.g., Hannula et al., 2006) the performance of amnesic patients was impaired. However, Jeneson and Squire (2012) found in a review that these allegedly short-term memory studies also involved long-term memory. More specifically, the information to be learned exceeded the capacity of short-term memory and so necessarily involved long-term memory as well as short-term memory (Norris, 2017). As a result, such studies do not demonstrate deficient short-term memory in amnesic patients.

Several neuroimaging studies have reported hippocampal involvement (thought to be crucial for long-term memory) during short-term memory tasks. However, it has generally been unclear whether hippocampal activation was due in part to encoding for long-term memory. An exception was a study by Bergmann et al. (2012). They assessed short-term memory for face–house pairs followed by an unexpected test of long-term memory for the pairs.

What did Bergmann et al. (2012) find? Encoding of pairs remembered in both short- and long-term memory involved the hippocampus. However, there was no hippocampal activation at encoding when short-term memory for the pairs was successful but subsequent long-term memory was not. Thus, the hippocampus was only involved on a short-term memory task when long-term memories were being formed.

Evaluation

As predicted by the unitary-store approach, activation of part of long-term memory often plays an important role in short-term memory. More specifically, relevant information from long-term memory frequently influences the contents of short-term memory.

What are the limitations of the unitary-store approach? First, the claim that short-term memory consists only of activated long-term memory is oversimplified. As Norris (2017, p. 992) pointed out, “The central problem . . . is that STM has to be able to store arbitrary configurations of novel information. For example, we can remember novel sequences of words or dots in random positions on a screen. These cannot possibly have pre-existing representations in LTM that could be activated.” Short-term memory is also more flexible than expected on the unitary-store approach (e.g., backward digit recall: recalling digits in the opposite order to the one presented).

Second, we must distinguish between the assumption that short-term memory is only activated long-term memory and the assumption that short-term and long-term memory are separate but often interact. Most evidence supports the latter assumption rather than the former.

Third, the theory fails to provide a precise definition of the crucial explanatory concept of “activation”. It is thus unclear how activation might maintain representations in short-term memory (Norris, 2017).

Fourth, the medial temporal lobes (including the hippocampus) are of crucial importance for many forms of long-term memory (especially declarative memory – see Glossary). Amnesic patients with damage to these brain areas have severely impaired declarative memory. In contrast, amnesic patients typically have intact short-term memory (Spiers et al., 2001).

WORKING MEMORY: BADDELEY AND HITCH


Is short-term memory useful in everyday life? Textbook writers used to argue it allows us to remember a telephone number for the few seconds required to dial it. Of course, that is now irrelevant – our mobile phones store all the phone numbers we need regularly.

Baddeley and Hitch (1974) provided a convincing answer to the above question. They argued we typically use short-term memory when performing complex tasks. Such tasks involve storing information about the outcome of early processes in short-term memory while moving on to later processes. Baddeley and Hitch’s key insight was that short-term memory is essential to the performance of numerous tasks that are not explicitly memory tasks.

The above line of thinking led Baddeley and Hitch (1974) to replace the concept of short-term memory with that of working memory. Working memory “refers to a system, or a set of processes, holding mental representations temporarily available for use in thought and action” (Oberauer et al., 2018, p. 886). Since 1974, there have been several developments of the working memory system (Baddeley, 2012, 2017; see Figure 6.3):

●● a modality-free central executive, which “is an attentional system” (Baddeley, 2012, p. 22);
●● a phonological loop processing and storing information briefly in a phonological (speech-based) form;
●● a visuo-spatial sketchpad specialised for spatial and visual processing and temporary storage;
●● an episodic buffer providing temporary storage for integrated information coming from the visuo-spatial sketchpad and phonological loop; this component (added by Baddeley, 2000) is discussed later (see pp. 252–253).

Figure 6.3 The working memory model showing the connections between its four components and their relationship to long-term memory. Artic = articulatory rehearsal. From Darling et al., 2017.

The most important component is the central executive. The phonological loop and the visuo-spatial sketchpad are slave systems used by the central executive for specific purposes. The phonological loop preserves word order, whereas the visuo-spatial sketchpad stores and manipulates spatial and visual information. All three components have limited capacity and can function fairly independently of the others.

KEY TERMS
Working memory: A limited-capacity system used in the processing and brief holding of information.
Central executive: A modality-free, limited-capacity component of working memory.
Phonological loop: A component of working memory in which speech-based information is processed and stored briefly and subvocal articulation occurs.
Visuo-spatial sketchpad: A component of working memory used to process visual and spatial information and to store this information briefly.
Episodic buffer: A component of working memory; it is essentially passive and stores integrated information briefly.

Two key assumptions follow:

(1) If two tasks use the same component, they cannot be performed successfully together.
(2) If two tasks use different components, they can be performed as well together as separately.

Robbins et al. (1996) investigated these assumptions in a study on the selection of chess moves. Chess players selected continuation moves from various chess positions while also performing one of the following tasks:

●● repetitive tapping: the control condition;
●● random letter generation: this involves the central executive;
●● pressing keys on a keypad in a clockwise fashion: this uses the visuo-spatial sketchpad;
●● rapid repetition of the word “see-saw”: this is articulatory suppression and uses the phonological loop.

The quality of chess moves was impaired when the additional task involved the central executive or visuo-spatial sketchpad but not when it involved the articulatory loop. Thus, calculating successful chess moves requires use of the central executive and the visuo-spatial sketchpad but not the articulatory loop.

Phonological loop

According to the working memory model, the phonological loop has two components (see Figure 6.4):

●● a passive phonological store directly concerned with speech perception;
●● an articulatory process linked to speech production (i.e., rehearsal) giving access to the phonological store.

Figure 6.4 Phonological loop system as envisaged by Baddeley (1990).


KEY TERMS
Phonological similarity effect: The finding that immediate serial recall of verbal material is reduced when the items sound similar.
Word-length effect: The finding that verbal memory span decreases when longer words are presented.
Orthographic neighbours: With reference to a target word, the number of words that can be formed by changing one of its letters.

Suppose we test individuals’ memory span by presenting a word list visually and requiring immediate recall in the correct order. Would they use the phonological loop to engage in verbal rehearsal (i.e., saying the words repeatedly to themselves)? Two kinds of evidence (discussed below) indicate the answer is “Yes”.

First, there is the phonological similarity effect – reduced immediate serial recall when words are phonologically similar (i.e., have similar sounds). For example, Baddeley et al. (2018) found that short-term memory was much worse with phonologically similar words (e.g., pan, cat, bat, ban, pad, man) than phonologically dissimilar words (e.g., man, pen, rim, cod, bud, peel). The working memory model does not make it clear whether the phonological similarity effect depends more on acoustic similarity (similar sounds) or articulatory similarity (similar articulatory movements). Schweppe et al. (2011) found the effect depends more on acoustic than articulatory similarity. However, there was an influence of articulatory similarity when recall was spoken.

Second, there is the word-length effect: word span (words recalled immediately in the correct order) is greater for short than long words. Baddeley et al. (1975) obtained this effect with visually presented words. As predicted, the effect disappeared when participants engaged in articulatory suppression (repeating the digits 1 to 8) to prevent rehearsal within the phonological loop during list presentation. In similar fashion, Jacquemot et al. (2011) found a brain-damaged patient with greatly impaired ability to engage in verbal rehearsal had no word-length effect.

Jalbert et al. (2011) pointed out that a short word generally has more orthographic neighbours (words of the same length differing from it in only one letter) than a long word. When short (one-syllable) and long (three-syllable) words were equated for neighbourhood size, the word-length effect disappeared. Thus, the word-length effect may be misnamed.

Which brain areas are associated with the phonological loop? Areas in the parietal lobe, especially the supramarginal gyrus (BA40) and angular gyrus (BA39), are associated with the phonological store, whereas Broca’s area (approximately BA44 and BA45) within the frontal lobe is associated with the articulatory control process.
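The orthographic-neighbour measure used by Jalbert et al. is easy to operationalise. Here is a minimal sketch in Python; the toy lexicon is a hypothetical stand-in, and any real word list could be substituted:

def orthographic_neighbours(target, lexicon):
    # Return words in the lexicon formed by changing exactly one letter
    # of the target (same length, all other positions identical).
    target = target.lower()
    return [w for w in lexicon
            if len(w) == len(target) and w != target
            and sum(a != b for a, b in zip(w, target)) == 1]

lexicon = ["cat", "bat", "can", "cot", "dog", "cart"]  # toy lexicon
print(orthographic_neighbours("cat", lexicon))  # ['bat', 'can', 'cot']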


Evidence indicating these areas differ in their functioning was reported by Papagno et al. (2017). Patients undergoing brain surgery received direct electrical stimulation while performing a digit-span task. Stimulation within the parietal lobe increased item errors in the task because it disrupted the storage of information. In contrast, stimulation within Broca’s area increased order errors because it disrupted rehearsal of items in the correct order (see Figure 6.5).

Figure 6.5 Sites where direct electrical stimulation disrupted digit-span performance. Item-error sites are in blue, order-error sites are in yellow and sites where both types of errors occurred are in green.

How is the phonological loop useful in everyday life? The answer is not immediately obvious. Baddeley et al. (1988) found a female patient, PV, with a very small digit span (only two items) coped very well (e.g., running a shop and raising a family). In subsequent research, however, Baddeley et al. (1998) argued the phonological loop is useful when learning a language. PV (a native Italian speaker) had generally good learning ability but was totally unable to associate Russian words with their Italian translations. Indeed, she showed no learning at all over ten trials!

The phonological loop (“inner voice”) is also used to resist temptation. Tullett and Inzlicht (2010) found articulatory suppression (saying “computer” repeatedly) reduced participants’ ability to control their actions (they were more likely to respond on trials where they should have inhibited a response).

Visuo-spatial sketchpad

The visuo-spatial sketchpad is used for the temporary storage and manipulation of visual patterns and spatial movement. In essence, visual processing involves remembering what and spatial processing involves remembering where. In everyday life, we use the sketchpad to find the route when moving from one place to another or when watching television.

The distinction between visual and spatial processing is very clear with respect to blind individuals. Schmidt et al. (2013) found blind individuals could construct spatial representations of the environment almost as accurately as sighted individuals despite their lack of visual processing.

Is there a single system combining visual and spatial processing or are there partially separate systems? Logie (1995) identified two separate components:

(1) visual cache: this stores information about visual form and colour;
(2) inner scribe: this processes spatial and movement information; it is involved in the rehearsal of information in the visual cache and transfers information from the visual cache to the central executive.

KEY TERMS
Visual cache: According to Logie, the part of the visuo-spatial sketchpad that stores information about visual form and colour.
Inner scribe: According to Logie, the part of the visuo-spatial sketchpad dealing with spatial and movement information.

Smith and Jonides (1997) obtained findings supporting the notion of separate visual and spatial systems. Two visual stimuli presented together were followed by a probe stimulus. Participants decided whether the probe was in the same location as one of the initial stimuli (spatial task) or had the same form (visual task). Even though the stimuli presented were identical in the two tasks, there was more activity in the right hemisphere during the spatial task than the visual task, but the opposite was the case for activity in the left hemisphere.


Figure 6.6 Amount of interference on a spatial task (dots) and a visual task (ideographs) as a function of a secondary task (spatial: movement vs visual: colour discrimination). From Klauer and Zhao (2004). © 2000 American Psychological Association. Reproduced with permission.

Zimmer (2008) found in a research review that areas within the occipital and temporal lobes were activated during visual processing. In contrast, areas within the parietal cortex (especially the intraparietal sulcus) were activated during spatial processing.

Klauer and Zhao (2004) used two main tasks: (1) a spatial task (memory for dot locations); and (2) a visual task (memory for Chinese characters). The main task was performed at the same time as a visual (colour discrimination) or spatial (movement discrimination) interference task. If the visuo-spatial sketchpad has separate spatial and visual components, two predictions follow. First, the spatial interference task should disrupt performance more on the spatial main task. Second, the visual interference task should disrupt performance more on the visual main task. Both predictions were supported (see Figure 6.6).

Vergauwe et al. (2009) argued that visual and spatial tasks often require the central executive’s attentional resources. They used more demanding versions of Klauer and Zhao’s (2004) main tasks and obtained different findings: each type of interference (visual and spatial) had comparable effects on the spatial and visual main tasks. Thus, there are general, attentionally demanding interference effects when tasks are demanding but also interference effects specific to the type of interference when tasks are relatively undemanding.

Morey (2018) discussed the theoretical assumption that the visuo-spatial sketchpad is a specialised system separate from other cognitive systems and components of working memory. She identified two predictions following from that assumption:

(1) Some brain-damaged patients should have selective impairments of visual and/or spatial short-term memory with other cognitive processes and systems essentially intact.
(2) Short-term visual or spatial memory in healthy individuals should be largely or wholly unaffected by the requirement to perform a secondary task at the same time (especially when that task does not require visual or spatial processing).

Morey (2018) reviewed evidence inconsistent with both the above predictions. First, the great majority of brain-damaged patients with impaired visual and/or spatial short-term memory also have various more general cognitive impairments. Second, Morey carried out a meta-analytic review and found that short-term visual and spatial memory was strongly impaired by cognitively demanding secondary tasks. This was the case even when the secondary task did not require visual or spatial processing.





In sum, there is some support for the notion that the visuo-spatial sketchpad has somewhat separate visual and spatial components. However, the visuo-spatial sketchpad seems to interact extensively with other cognitive and memory systems, which casts doubt on the theoretical assumption that it often operates independently from other systems.

Central executive

The central executive (which resembles an attentional system) is the most important and versatile component of the working memory system. It is heavily involved in almost all complex cognitive activities (e.g., solving a problem; carrying out two tasks at the same time) but does not store information.

There is much controversy concerning the brain regions most associated with the central executive and its various functions (see below, pp. 257–262). However, it is generally assumed the prefrontal cortex is heavily involved. Mottaghy (2006) reviewed studies using repetitive transcranial magnetic stimulation (rTMS; see Glossary) to disrupt the dorsolateral prefrontal cortex (BA9/46). Performance on many complex cognitive tasks was impaired by this manipulation. However, executive processes do not depend solely on the prefrontal cortex. Many brain-damaged patients (e.g., those with diffuse trauma) have poor executive functioning despite having little or no frontal damage (Stuss, 2011).

Baddeley has always recognised that the central executive is associated with several executive functions (see Glossary). For example, Baddeley (1996) speculatively identified four such processes: (1) focusing attention or concentration; (2) dividing attention between two stimulus streams; (3) switching attention between tasks; and (4) interfacing with long-term memory. It has proved difficult to obtain consensus on the number and nature of executive processes. However, two influential theoretical approaches are discussed below.

Brain-damaged individuals whose central executive functioning is impaired suffer from dysexecutive syndrome. Symptoms include impaired response inhibition, rule deduction and generation, maintenance and shifting of sets, and information generation (Godefroy et al., 2010). Unsurprisingly, patients with this syndrome have great problems in holding a job and functioning adequately in everyday life (Chamberlain, 2003).

KEY TERMS
Executive processes: Processes that organise and coordinate the functioning of the cognitive system to achieve current goals.
Dysexecutive syndrome: A condition in which damage to the frontal lobes causes impairments to the central executive component of working memory.

Evaluation

The notion of a unitary central executive is greatly oversimplified (see below). As Logie (2016, p. 2093) argued, “Executive control [may] arise from the interaction among multiple differing functions in cognition that use different, but overlapping, brain networks . . . the central executive might now be offered a dignified retirement.”

Similar criticisms can be directed against the notion of a dysexecutive syndrome. Patients with widespread damage to the frontal lobes may have a global dysexecutive syndrome. However, as discussed below, patients with limited frontal damage display various patterns of impairment to executive processes (Stuss & Alexander, 2007).


Episodic buffer

Why was the episodic buffer added to the model? There are various reasons. First, the original version of the model was limited because its components were too separate in their functioning. For example, it was unclear how verbal information from the phonological loop and visual and spatial information from the visuo-spatial sketchpad were integrated to form multidimensional representations.

Second, it was hard to explain within the original model the finding that people can provide immediate recall of up to 16 words presented in sentences (Baddeley et al., 1987). This high level of immediate sentence recall is substantially beyond the capacity of the phonological loop.

The function of the episodic buffer is suggested by its name. It is episodic because it holds integrated information (or chunks) about episodes or events in a multidimensional code combining visual, auditory and other information sources. It acts as a buffer between the other working memory components and also links to perception and long-term memory. Baddeley (2012) suggested the capacity of the episodic buffer is approximately four chunks (integrated units of information). This potentially explains why people can recall up to 16 words in immediate recall from sentences.

Baddeley (2000) argued the episodic buffer could be accessed only via the central executive. However, it is now assumed the episodic buffer can be accessed by the visuo-spatial sketchpad and the phonological loop as well as by the central executive (see Figure 6.3). In sum, the episodic buffer

differed from the existing subsystems representations [i.e., phonological loop and visuo-spatial sketchpad] in being able to hold a limited number of multi-dimensional representations or episodes, and it differed from the central executive in having storage capacity . . . The episodic buffer is a passive storage system, the screen on which bound information from other sources could be made available to conscious awareness and used for planning future action.
(Baddeley, 2017, pp. 305–306)

Findings

Why did Baddeley abandon his original assumption that the central executive controls access to and from the episodic buffer? Consider a study by Allen et al. (2012). Participants were presented with visual stimuli and had to remember briefly a single feature (colour; shape) or colour–shape combinations. It was assumed combining visual features would require the central executive prior to storage in the episodic buffer. On that assumption, the requirement to perform a task requiring the central executive (counting backwards) at the same time should have reduced memory to a greater extent for colour–shape combinations than single features.

Allen et al. (2012) found that counting backwards had comparable effects on memory performance regardless of whether or not feature combinations needed to be remembered. These findings suggest combining visual features does not require the central executive but instead occurs “automatically” prior to information entering the episodic buffer.




Figure 6.7 Screen displays for the digit 6. Clockwise from top left: (1) single item display; (2) keypad display; and (3) linear display. From Darling and Havelka (2010).

Grot et al. (2018) clarified the relationship between the central executive and the episodic buffer. Participants learned to link or bind together words and spatial locations within the episodic buffer for a memory test. It was either relatively easy to bind words and spatial locations together (passive binding) or relatively difficult (active binding). The central executive was involved only in the more difficult active binding condition.

Darling et al. (2017) discussed several studies showing how memory can be enhanced by the episodic buffer. Much of this research focused on visuo-spatial bootstrapping (verbal memory being bootstrapped, or supported, by visuo-spatial memory). Consider a study by Darling and Havelka (2010). Immediate serial recall of random digits was best when they were presented on a keypad display rather than on a single-item or linear display (see Figure 6.7). Why was memory performance best with the keypad display? This was the only condition which allowed visual information, spatial information and knowledge about keyboard displays accessed from long-term memory to be integrated within the episodic buffer using bootstrapping.

Evaluation

The episodic buffer provides a brief storage facility for information from the phonological loop, the visuo-spatial sketchpad and long-term memory. Bootstrapping data (e.g., Darling & Havelka, 2010) suggest that processing in the episodic buffer “interacts with long-term knowledge to enable integration across multiple independent stimulus modalities” (Darling et al., 2017, p. 7). The central executive is most involved when it is hard to bind together different kinds of information within the episodic buffer.

What are the limitations of research on the episodic buffer? First, it remains unclear precisely how information from the phonological loop and the visuo-spatial sketchpad is combined to form unified representations within the episodic buffer. Second, as shown in Figure 6.3, it is assumed information from sensory modalities other than vision and hearing can be stored in the episodic buffer. However, relevant research on smell and taste is lacking.


KEY TERM
Working memory capacity: An assessment of how much information can be processed and stored at the same time; individuals with high capacity have higher intelligence and more attentional control.


Overall evaluation

The working memory model remains highly influential more than 45 years after it was first proposed. There is convincing empirical evidence for all components of the model. As Logie (2015, p. 100) noted, it explains findings “from a very wide range of research topics, for example, aspects of children’s language development, aspects of counting and mental arithmetic, reasoning and problem solving, dividing and switching attention, navigating unfamiliar environments”.

What are the model’s limitations? First, it is oversimplified. Several kinds of information are not considered within the model (e.g., those relating to smell, touch and taste). In addition, we can subdivide spatial working memory into somewhat separate eye-centred, hand-centred and foot-centred spatial working memory (Postle, 2006). This could lead to an unwieldy model with numerous components each responsible for a different kind of information.

Second, the notion of a central executive should be replaced with a theoretical approach identifying the major executive processes (see below, pp. 257–262).

Third, the notion that the visuo-spatial sketchpad is a specialised and relatively independent processing system is doubtful. There is much evidence (Morey, 2018) that it typically interacts with other working memory components (especially the central executive).

Fourth, we need more research on the interactions among the four components of working memory (e.g., how the episodic buffer integrates information from the other components and from long-term memory).

Fifth, the common assumption that conscious awareness is necessarily associated with processing in all working memory components requires further consideration. For example, executive processes associated with the functioning of the central executive can perhaps occur outside conscious awareness (Soto & Silvanto, 2014). As discussed in Chapter 16, many complex processes can apparently occur in the absence of conscious awareness.

WORKING MEMORY: INDIVIDUAL DIFFERENCES AND EXECUTIVE FUNCTIONS

There have been numerous recent attempts to enhance our understanding of working memory. Here we will focus on two major theoretical approaches. First, some theorists (e.g., Engle & Kane, 2004) have focused on working memory capacity. In essence, they claim performance across numerous tasks (including memory ones) is strongly influenced by individual differences in working memory capacity. Second, many theorists have replaced a unitary central executive with several more specific executive functions.

Working memory capacity

Several theorists (e.g., Engle & Kane, 2004) have considered working memory from the perspective of individual differences in working memory capacity, “the ability to hold and manipulate information in a temporary active state” (DeCaro et al., 2016, p. 39). Daneman and Carpenter (1980) used reading span to assess this capacity. Individuals read sentences for comprehension (processing task) and then recalled the final word of each sentence (storage task). The reading span was defined as the largest number of sentences from which individuals could recall the final words over 50% of the time.

Operation span is another measure of working memory capacity. Items (e.g., IS (4 × 2) – 3 = 5? TABLE) are presented. Individuals answer each arithmetical question and try to remember all the last words. Operation span is the maximum number of items for which individuals can remember all the last words over half the time. It correlates highly with reading span.

Working memory capacity correlates positively with intelligence. We can clarify this relationship by distinguishing between crystallised intelligence (which depends on knowledge, skills and experience) and fluid intelligence (which involves a rapid understanding of novel relationships; see Glossary). Working memory capacity correlates more strongly with fluid intelligence (sometimes as high as +.7 or +.8; Kovacs & Conway, 2016). The correlation with crystallised intelligence is relatively low because crystallised intelligence involves acquired knowledge whereas working memory capacity depends on cognitive processes and temporary information storage.

Engle and Kane (2004) argued individuals who are high and low in working memory capacity differ in attentional control. In their influential two-factor theory, they emphasised two key aspects of attentional control: (1) the maintenance of task goals; and (2) the resolution of response competition or conflict. Thus, high-capacity individuals are better at maintaining task goals and resolving conflict.

How does working memory capacity relate to Baddeley’s working memory model? The two approaches differ in emphasis. Researchers investigating working memory capacity focus on individual differences in processing and storage capacity whereas Baddeley focuses on the underlying structure of working memory. However, there has been some convergence between the two theoretical approaches. For example, Kovacs and Conway (2016, p. 157) concluded that working memory capacity “reflects individual differences in the executive component of working memory, particularly executive attention and cognitive control”.

In view of the association between working memory capacity and intelligence, we would expect high-capacity individuals to outperform low-capacity ones on complex tasks. That is, indeed, the case (see Chapter 10). However, Engle and Kane’s (2004) theory also predicts high-capacity individuals might perform better than low-capacity ones even on relatively simple tasks if it were hard to maintain task goals.
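Scoring operation span follows directly from the definition above. The sketch below is a simplified illustration rather than a standard published scoring procedure; the data structure and trial outcomes are hypothetical (1 = all final words recalled on that trial):

def operation_span(results):
    # results maps set size -> list of trial outcomes (1 = all words
    # recalled in the correct order, 0 = not). Span = largest set size
    # at which the participant succeeds on more than half the trials.
    passed = [size for size, trials in results.items()
              if sum(trials) / len(trials) > 0.5]
    return max(passed) if passed else 0

# hypothetical participant: perfect at 2-3 items, 2/3 at 4, 1/3 at 5
results = {2: [1, 1, 1], 3: [1, 1, 1], 4: [1, 1, 0], 5: [0, 1, 0]}
print(operation_span(results))  # 4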

KEY TERMS
Reading span: The largest number of sentences read for comprehension from which an individual can recall all the final words over 50% of the time.
Operation span: The maximum number of items (arithmetical questions + words) for which an individual can recall all the words more than 50% of the time.
Crystallised intelligence: A form of intelligence that involves the ability to use one’s knowledge and experience effectively.

Findings

There are close links between working memory capacity and the executive functions of the central executive. For example, McCabe et al. (2010) found measures of working memory capacity correlated highly with measures of executive functioning. Both types of measures reflect executive attention (which maintains task goals).


The hypothesis that high-capacity individuals have greater attentional control than low-capacity ones has received experimental support. Sörqvist (2010) studied distraction effects caused by the sounds of planes flying past. Recall of a prose passage was adversely affected by distraction only in low-capacity individuals. Yurgil and Golob (2013), using event-related potentials (ERPs; see Glossary), found that high-capacity individuals attended less than low-capacity ones to distracting auditory stimuli.

We have seen that goal maintenance or attentional control in low-capacity individuals is disrupted by external distraction. It is also disrupted by internal task-unrelated thoughts (mind-wandering). McVay and Kane (2012) used a sustained-attention task in which participants responded to frequent target words but withheld responses to rare non-targets. Low-capacity individuals performed worse than high-capacity ones on this task because they engaged in more mind-wandering.

Robison and Unsworth (2018) identified two main reasons why this might be the case. First, low-capacity individuals’ inferior attentional control may lead to increased amounts of spontaneous or unplanned mind-wandering. Second, low-capacity individuals may be less motivated to perform cognitive tasks well and so engage in increased deliberate mind-wandering. Robison and Unsworth’s findings provided support only for the first reason.

Individuals having low working memory capacity may have worse task performance than high-capacity ones because they consistently have poorer attentional control and ability to maintain the current task goal. Alternatively, their failures of attentional control may only occur relatively infrequently. Unsworth et al. (2012) compared these two explanations. They used the anti-saccade task: a flashing cue is presented to the left (or right) of fixation followed by a target presented in the opposite location. Reaction times to identify the target were recorded.

Unsworth et al. (2012) divided each participant’s reaction times into quintiles (five bins representing the fastest 20%, the next fastest 20% and so on). Low-capacity individuals were significantly slower than the high-capacity ones only in the slowest quintile (see Figure 6.8). Thus, they experienced failures of goal maintenance or attentional control on only a small fraction of trials.

Figure 6.8 Mean reaction times (RTs) quintile-by-quintile on the anti-saccade task by groups high and low in working memory capacity. From Unsworth et al. (2012).
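The quintile analysis itself is simple to reproduce. A minimal sketch using numpy, with randomly generated reaction times standing in for real data:

import numpy as np

def quintile_means(rts):
    # Sort one participant's reaction times, split them into five
    # equal-sized bins (fastest 20% ... slowest 20%) and return each
    # bin's mean RT.
    rts = np.sort(np.asarray(rts))
    return [part.mean() for part in np.array_split(rts, 5)]

rng = np.random.default_rng(0)
rts = rng.lognormal(mean=6.0, sigma=0.3, size=100)  # hypothetical RTs (ms)
print(np.round(quintile_means(rts)))  # five mean RTs, fastest to slowest

Group differences confined to the slowest bin (as Unsworth et al. found) then point to occasional lapses rather than a constant deficit.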


Evaluation

Theory and research on working memory capacity indicate the value of focusing on individual differences. There is convincing evidence high- and low-capacity individuals differ in attentional control. More specifically, high-capacity individuals are better at controlling external and internal distracting information. In addition, they are less likely than low-capacity individuals to experience failures of goal maintenance. Of importance, individual differences in working memory capacity are relevant to performance on numerous different tasks (see Chapter 10).

What are the limitations of research in this area? First, the finding that working memory capacity correlates highly with fluid intelligence means many findings ascribed to individual differences in working memory capacity may actually reflect fluid intelligence. However, it can be argued that general executive functions relevant to working memory capacity partially explain individual differences in fluid intelligence (Kovacs & Conway, 2016).

Second, research on working memory capacity is somewhat narrowly based on behavioural research with healthy participants. In contrast, the unity/diversity framework (discussed next) has been strongly influenced by neuroimaging and genetic research and by research on brain-damaged patients.

Third, there is a lack of conceptual clarity. For example, theorists differ as to whether the most important factor differentiating individuals with high or low capacity is “maintenance of task goals”, “resolution of conflict”, “executive attention” or “cognitive control”. We do not know how closely related these terms are.

Fourth, the inferior attentional or cognitive control of low-capacity individuals might manifest itself consistently throughout task performance or only sporadically. Relatively little research (e.g., Unsworth et al., 2012) has investigated this issue.

Fifth, the emphasis in theory and research has been on the benefits for task performance associated with having high working memory capacity. However, some costs are associated with high capacity. These costs are manifest when the current task requires a broad focus of attention but high-capacity individuals adopt a narrow and inflexible focus (e.g., DeCaro et al., 2016, 2017; see Chapter 12).

KEY TERMS
Executive functions: Processes that organise and coordinate the workings of the cognitive system to achieve current goals; key executive functions include inhibiting dominant responses, shifting attention and updating information in working memory.

Executive functions: unity/diversity framework

Executive functions are “high-level processes that, through their influence on lower-level processes, enable individuals to regulate their thoughts and actions during goal-directed behaviour” (Friedman & Miyake, 2017, p. 186). The crucial issue is to identify the number and nature of these executive functions or processes. Various approaches can address this issue:

(1) Psychometric approach: several tasks requiring the use of executive functions are administered and the pattern of inter-correlations among the tasks is assessed. Consider the following hypothetical example. There are four executive tasks (A, B, C and D). There is a moderate positive correlation between tasks A and B and between C and D but the remaining correlations are small. Such a pattern suggests tasks A and B involve the same executive function whereas tasks C and D involve a different executive function (this pattern is simulated in the sketch below).
(2) Neuropsychological approach: the focus is on individuals with brain damage causing impaired executive functioning. Patterns of impaired functioning are related to the areas of brain damage to identify executive functions and their locations within the brain. Shallice and Cipolotti (2018) provide a thorough discussion of the applicability of this approach to understanding executive functioning.
(3) Neuroimaging approach: the focus is on assessing similarities and differences in the patterns of brain activation associated with various executive tasks. For example, the existence of two executive functions (A and B) would be supported if they were associated with different patterns of brain activation.
(4) Genetic approach: twin studies are conducted with an emphasis on showing different sets of genes are associated with each executive function (assessed by using appropriate cognitive tasks).

KEY TERMS
Stroop task: A task in which participants have to name the ink colours in which colour words are printed; performance is slowed when the to-be-named colour (e.g., green) conflicts with the colour word (e.g., red).

Several theories have been proposed on the basis of evidence using the above approaches (see Friedman and Miyake, 2017, for a review). Here we will focus on the very influential theory originally proposed by Miyake et al. (2000) and developed subsequently (e.g., Friedman & Miyake, 2017).
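The logic of the psychometric approach in (1) above can be illustrated with simulated data. In this hypothetical sketch, tasks A and B share one latent executive function and tasks C and D share another, so the correlation matrix shows the tell-tale pattern of high A–B and C–D correlations and low cross-correlations:

import numpy as np

rng = np.random.default_rng(1)
n = 200                                    # hypothetical sample size
f1 = rng.normal(size=n)                    # latent function behind tasks A and B
f2 = rng.normal(size=n)                    # latent function behind tasks C and D
noise = lambda: 0.6 * rng.normal(size=n)   # task-specific variance
tasks = np.stack([f1 + noise(),            # task A
                  f1 + noise(),            # task B
                  f2 + noise(),            # task C
                  f2 + noise()])           # task D
print(np.round(np.corrcoef(tasks), 2))     # high A-B and C-D; low elsewhere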

Unity/diversity framework


In their initial study, Miyake et al. (2000) used the psychometric approach: they administered several executive tasks and then focused on the pattern of inter-correlations among the tasks. They identified three related (but separable) executive functions:

(1) Inhibition function: used to deliberately override dominant responses and to resist distraction. For example, it is used on the Stroop task (see Figure 1.3 on p. 5), which involves naming the colours in which words are printed. When the words are conflicting colour words (e.g., the word BLUE printed in red), it is necessary to inhibit saying the word.
(2) Shifting function: used to switch flexibly between tasks or mental sets. Suppose you are presented with two numbers on each trial. Your task is to switch between multiplying the two numbers and dividing one by the other on alternate trials. Such task switching requires the shifting function.
(3) Updating function: used to monitor and engage in rapid addition or deletion of working memory contents. For example, this function is used if you must keep track of the most recent member of each of several categories (a minimal sketch of such a keep-track task appears below).

Subsequent research (e.g., Friedman et al., 2008; Miyake & Friedman, 2012) led to the development of the unity/diversity framework. The basic idea is that each executive function consists of what is common to all three executive functions (unity) plus what is unique to that function (diversity) (see Figure 6.9). After accounting for what was common to all executive functions, Friedman et al. found there was no unique variance left for the inhibition function.
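As promised in the list above, the keep-track task used to tap the updating function reduces to a simple algorithm: maintain, and continually overwrite, the most recent exemplar of each monitored category. A minimal sketch with hypothetical stimuli (ideal performance; real participants' working memory fails where this dictionary does not):

def keep_track(stream, categories):
    # After a stream of (category, word) pairs, report the most recent
    # word seen in each monitored category.
    latest = {}
    for category, word in stream:
        if category in categories:
            latest[category] = word   # overwriting = updating the contents
    return latest

stream = [("animal", "horse"), ("colour", "red"), ("animal", "stoat"),
          ("metal", "iron"), ("colour", "teal")]
print(keep_track(stream, {"animal", "colour"}))
# {'animal': 'stoat', 'colour': 'teal'}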


Figure 6.9 Schematic representation of the unity and diversity of three executive functions (EFs). Each executive function is a combination of what is common to all three and what is specific to that executive function. The inhibition-specific component is absent because the inhibition function correlates very highly with the common executive function. From Miyake and Friedman (2012). Reprinted with permission of SAGE Publications.

Of importance, separable shifting and updating factors have consistently been identified in subsequent research (Friedman & Miyake, 2017).

What is the nature of the common factor? According to Friedman and Miyake (2017, p. 194), “It reflects individual differences in the ability to maintain and manage goals, and use those goals to bias ongoing processing.” Goal maintenance (resembling concentration) may be especially important on inhibition tasks where it is essential to focus on task requirements to avoid distraction or incorrect competing responses. This could explain why such tasks load only on the common factor. Support for the notion that the common factor reflects goal maintenance was reported by Gustavson et al. (2015). Everyday goal-management failures (assessed by questionnaire) correlated negatively with the common factor.

Findings

So far we have focused on the psychometric approach. The unity/diversity framework is also supported by research using the genetic approach. Friedman et al. (2008) had monozygotic (identical) and dizygotic (fraternal) twins perform several executive function tasks. One key finding was that individual differences in all three executive functions (common; updating; shifting) were strongly influenced by genetic factors. Another key finding was that different sets of genes were associated with each function.

We turn now to neuroimaging research. Such research partly supports the unity/diversity framework. Collette et al. (2005) found all three of Miyake et al.’s (2000) functions (i.e., inhibition; shifting; updating) were associated with activation in different prefrontal areas. However, all tasks also produced activation in shared areas (e.g., the left lateral prefrontal cortex), which is consistent with Miyake and Friedman’s (2012) unity notion.


Figure 6.10 Activated brain regions across all executive functions in a meta-analysis of 193 studies (shown in red). From Niendam et al. (2012).

Niendam et al. (2012) carried out a meta-analysis (see Glossary) of findings from 193 studies where participants performed many tasks involving executive functions. Of most importance, several brain areas were activated across all executive functions (see Figure 6.10). These areas included the dorsolateral prefrontal cortex (BA9/46), fronto-polar cortex (BA10), orbitofrontal cortex (BA11) and anterior cingulate (BA32). This brain network corresponds closely to the common factor identified by Miyake and Friedman (2012). In addition, Niendam et al. found some differences in activated brain areas between shifting and inhibition function tasks.

Stuss and Alexander (2007) argued the notion of a dysexecutive syndrome (see Glossary; discussed earlier, p. 251) erroneously implies brain damage to the frontal lobes damages all central executive functions. While there may be a global dysexecutive syndrome in patients having widespread damage to the frontal lobes, this is not so in patients having limited prefrontal damage. Among such patients, Stuss and Alexander identified three executive processes, each associated with a different region within the frontal cortex (approximate brain locations are in brackets):

(1) Task setting (left lateral): this involves planning; it is “the ability to set a stimulus-response relationship . . . necessary in the early stages of learning to drive a car or planning a wedding” (p. 906).


(2) Monitoring (right lateral): this involves checking the adequacy of one’s task performance; deficient monitoring leads to increased variability of performance and increased errors.
(3) Energisation (superior medial): this involves sustained attention or concentration; deficient energisation leads to slow performance on all tasks requiring fast responding.

The above three executive processes are often used in combination when someone performs a complex task. Note that these three processes differ from those identified by Miyake et al. (2000). However, there is some overlap: task setting and monitoring both involve aspects of cognitive control, as do the processes of inhibition and shifting.

Stuss (2011) confirmed the importance of the above three executive functions. In addition, he identified a fourth executive process he called metacognition/integration (located in BA10: fronto-polar prefrontal cortex). According to Stuss (p. 761), “This function is integrative and coordinating-orchestrating . . . [it includes] recognising the differences between what one knows from what one believes.” Evidence for this process has come from research on patients with damage to BA10 (Burgess et al., 2007).

Evaluation

The unity/diversity framework provides a coherent account of the major executive functions and is deservedly highly influential. One of its greatest strengths is that it is supported by research using several different approaches (e.g., psychometric; genetic; neuroimaging; neuropsychological). The notion of a hierarchical system with one very general function (common executive function) plus more specific functions (e.g., shifting; updating) is consistent with most findings.

What are the limitations of the unity/diversity framework? First, as Friedman and Miyake (2017, p. 199) admitted, “The results of lesion studies are in partial agreement with the unity/diversity framework . . . the processes [identified] in these studies are not clearly the same as those [identified] in studies of normal individual differences.” For example, Stuss (2011) obtained evidence for task setting, monitoring, energisation and metacognition/integration functions in research on brain-damaged patients.

Second, many neuroimaging findings appear inconsistent with the framework. For example, Nee et al. (2013) carried out a meta-analysis of 36 neuroimaging studies on executive processes. There was little evidence that functions such as shifting, updating and inhibition differed in their patterns of brain activation. Instead, one frontal region was mostly involved in processing spatial content (where-based processing) and a second frontal region was involved in processing non-spatial content (what-based processing).

Third, Waris et al. (2017) also found evidence for content-based factors differing from the executive factors emphasised within the unity/diversity framework. They factor-analysed performance on ten working memory tasks and identified two specific content-based factors: (1) a visuo-spatial factor; and (2) a numerical-verbal factor. There is some overlap between these factors and those identified by Nee et al. (2013).


Fourth, an important assumption within the unity/diversity framework is that all individuals have the same executive processes (Friedman & Miyake, 2017). The complexities and inconsistencies of the research evidence suggest this assumption may be only partially correct.

LEVELS OF PROCESSING (AND BEYOND)


What determines long-term memory? According to Craik and Lockhart (1972), how information is processed during learning is crucial. In their levels-of-processing approach, they argued that the attentional and perceptual processes operating during learning determine what information is stored in long-term memory. Levels of processing range from shallow or physical analysis of a stimulus (e.g., detecting specific letters in words) to deep or semantic analysis. The greater the extent to which meaning is processed, the deeper the level of processing.

Here are Craik and Lockhart’s (1972) main theoretical assumptions:

●● The level or depth of stimulus processing has a large effect on its memorability: the levels-of-processing effect.
●● Deeper levels of analysis produce more elaborate, longer-lasting and stronger memory traces than shallow levels.

Craik (2002) subsequently moved away from the notion that there is a series of processing levels going from perceptual to semantic. Instead, he argued that the richness or elaboration of encoding is crucial for long-term memory.

Hundreds of studies support the levels-of-processing approach. For example, Craik and Tulving (1975) compared deep processing (decide whether each word fits the blank in a sentence) and shallow processing (decide whether each word is in uppercase or lowercase letters). Recognition memory was more than three times higher with deep than with shallow processing. Elaboration of processing (amount of processing of a given kind) was also important. Cued recall following the deep task was twice as high for words accompanying complex sentences (e.g., “The great bird swooped down and carried off the struggling ____”) as those accompanying simple sentences (e.g., “She cooked the ____”).

Rose et al. (2015) reported a levels-of-processing effect even with an apparently easy memory task: only a single word had to be recalled and the retention interval was only 10 seconds. More specifically, words associated with deep processing were better recalled than those associated with shallow processing when the retention interval was filled with a task involving adding or subtracting.

Baddeley and Hitch (2017) pointed out that the great majority of studies had used verbal materials (e.g., words). Accordingly, they decided to see whether a levels-of-processing effect would be obtained with different learning materials. In one study, they found the effect with recognition memory was much smaller with doors and clocks than with food names (see Figure 6.11). The most plausible explanation is that it is harder to produce an elaborate semantic encoding with doors or clocks than with most words.


Figure 6.11 Recognition memory performance as a function of processing depth (shallow vs deep) for three types of stimuli: doors, clocks and menus. From Baddeley and Hitch (2017). Reprinted with permission of Elsevier.

Morris et al. (1977) disproved the levels-of-processing theory. Participants answered semantic or shallow (rhyme) questions for words. Memory was tested by a standard recognition test (select list words and reject non-list words) or a rhyming recognition test (select words rhyming with list words – the list words themselves were not presented). There was the usual superiority of deep processing on the standard recognition test. However, the opposite was the case on the rhyme test, a finding inconsistent with the theory. According to Morris et al.'s transfer-appropriate processing theory, retrieval requires that the processing during learning is relevant to the demands of the memory test. With the rhyming test, rhyme information is relevant but semantic information is not.

Challis et al. (1996) compared the levels-of-processing effect on explicit memory tests (e.g., recall; recognition) involving conscious recollection and on implicit memory tests not involving conscious recollection (see Chapter 7). The effect was generally greater in explicit than implicit memory. Parks (2013) explained this difference in terms of transfer-appropriate processing. Shallow processing involves more perceptual but less conceptual processing than deep processing. Accordingly, the levels-of-processing effect should generally be smaller when the memory task requires demanding perceptual processing (as is the case with most implicit memory tasks).

Distinctiveness

Another important factor influencing long-term memory is distinctiveness. Distinctiveness means a memory trace differs from other memory traces because it was processed differently during learning. According to Hunt and Smith (2014, p. 45), distinctive processing is "the processing of difference in the context of similarity".

Eysenck and Eysenck (1980) studied distinctiveness using nouns having irregular pronunciations (e.g., comb has a silent "b"). In one condition, participants said these nouns in a distinctive way (e.g., pronouncing the "b" in comb). Thus, the processing was shallow (i.e., phonemic) but the memory traces were distinctive. Recognition memory was much higher than in a phonemic condition involving non-distinctive processing (i.e., pronouncing nouns as normal). Indeed, memory was as good with distinctive phonemic processing as with deep or semantic processing.

How can we explain the beneficial effects of distinctiveness on long-term memory? Chee and Goh (2018) identified two potential explanations. First, distinctive items may attract additional attention and processing at the time of study. Second, distinctive items may be well remembered because of effects occurring at the time of retrieval, an explanation originally proposed by Eysenck (1979). For example, suppose the distinctive item is printed in red whereas all the other items are printed in black. The retrieval cue (recall the red item) uniquely specifies one item and so facilitates retrieval.

KEY TERMS
Explicit memory: Memory that involves conscious recollection of information.
Implicit memory: Memory that does not depend on conscious recollection.
Distinctiveness: This characterises memory traces that are distinct or different from other memory traces stored in long-term memory.

Figure 6.12 Percentage recall of the critical item (e.g., kiwi) in encoding, retrieval and control conditions; also shown is the percentage recall of preceding and following items in the three conditions. From Chee and Goh (2018). Reprinted with permission of Elsevier.

Chee and Goh (2018) contrasted the two above explanations. They presented a list of words referring to species of birds including the word kiwi. Of importance, kiwi is a homograph (two words having the same spelling but two different meanings): it can mean a species of bird or a type of fruit. Participants were instructed before study (encoding condition) or after study (retrieval condition) that one of the words would be a type of fruit.

The findings are shown in Figure 6.12. A distinctiveness effect was found in the retrieval condition in the absence of distinctive processing at study. These findings strongly support a retrieval-based explanation of the distinctiveness effect.

Evaluation

There is compelling evidence that processes at learning have a major impact on subsequent long-term memory (Roediger, 2008). Another strength of the theory is the central assumption that learning and remembering are by-products of perception, attention and comprehension. The levels-of-processing approach led to the identification of elaboration and distinctiveness of processing as important factors in learning and memory. Finally, "The levels-of-processing approach has been fruitful and generative, providing a powerful set of experimental techniques for exploring the phenomena of memory" (Roediger & Gallo, 2001, p. 44).

The levels-of-processing approach has several limitations. First, Craik and Lockhart (1972) underestimated the importance of the retrieval environment in determining memory performance (e.g., Morris et al., 1977). Second, the relative importance of processing depth, elaboration of processing and distinctiveness of processing to long-term memory remains unclear. Third, the terms "depth", "elaboration" and "distinctiveness" are vague and hard to measure (Roediger & Gallo, 2001). Fourth, we do not know precisely why deep processing is so effective or why the levels-of-processing effect is small in implicit memory. Fifth, the levels-of-processing effect is typically smaller with non-verbal stimuli than with words (Baddeley & Hitch, 2017).

LEARNING THROUGH RETRIEVAL

How can we maximise our learning (e.g., of some topic in cognitive psychology)? Many people (including you?) think what is required is to study and re-study the to-be-learned material, with testing serving only to establish what has been learned. In fact, this is not the case. As we will see, there is typically a testing effect: "the finding that intermediate retrieval practice between study and a final memory test can dramatically enhance final-test performance when compared with restudy trials" (Kliegl & Bäuml, 2016).

The testing effect is generally surprisingly strong. Dunlosky et al. (2013) discussed ten learning techniques including writing summaries, forming images of texts and generating explanations for stated facts, and found repeated testing was the most effective technique. Rowland (2014) carried out a meta-analysis: 81% of the findings were positive. Most of these studies were laboratory-based. Reassuringly, Schwieren et al. (2017) found the magnitude of the testing effect was comparable in real-life contexts (teaching psychology) and laboratory conditions.

KEY TERM
Testing effect: The finding that long-term memory is enhanced when some of the learning period is devoted to retrieving to-be-learned information rather than simply studying it.

Explanations of the testing effect

We start by identifying two important theoretical approaches to explaining the testing effect. First, several theorists have emphasised the importance of retrieval effort (Rowland, 2014). The core notion here is that the testing effect will be greater when the difficulty or effort involved in retrieval during the learning period is high rather than low.

Why does increased retrieval effort have this beneficial effect? Several answers have been suggested. For example, there is the elaborative retrieval hypothesis, which is applicable to paired-associate learning (e.g., learning to associate the cue Chalk with the target Crayon). According to this hypothesis, "the act of retrieving a target from a cue activates cue-relevant information that becomes incorporated with the successfully retrieved target, providing a more elaborate representation" (Carpenter & Yeung, 2017, p. 129). According to a more specific version of this hypothesis (the mediator effectiveness hypothesis), retrieval practice promotes the use of more effective mediators. In the above example, Board might be a mediator triggered by the cue Chalk.

Rickard and Pan (2018) proposed a related (but more general) dual-memory theory. In essence, restudy causes the memory trace formed at initial study to be strengthened. Testing with feedback (which involves effort) also strengthens the memory trace formed at initial study. More importantly, it leads to the formation of a second memory trace (see Figure 6.13). The strength of this second memory trace probably depends on the amount of retrieval effort during testing. Thus, testing generally promotes superior memory to restudy because it promotes the acquisition of two memory traces for each item rather than one.
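The core logic of the dual-memory account lends itself to a minimal simulation. The sketch below is purely illustrative rather than Rickard and Pan's (2018) formal model: the trace strengths, boost values and threshold are hypothetical numbers chosen only to show how a second, test-based trace raises the probability that at least one trace exceeds the response threshold t on the final test.

```python
import random

T = 0.5  # response threshold t: an item is retrieved if any trace exceeds it

def final_recall(condition, n_items=10000):
    """Crude Monte Carlo sketch of the dual-memory account (hypothetical parameters)."""
    recalled = 0
    for _ in range(n_items):
        study_trace = random.uniform(0.1, 0.6)    # strength after initial study
        test_trace = 0.0                          # second trace exists only after testing
        if condition == "restudy":
            study_trace += 0.15                   # restudy strengthens the study trace
        else:  # testing with feedback
            study_trace += 0.10                   # testing also strengthens the study trace
            test_trace = random.uniform(0.2, 0.7) # and lays down a second, test-based trace
        if study_trace > T or test_trace > T:     # retrieval succeeds via either trace
            recalled += 1
    return recalled / n_items

print("restudy:", final_recall("restudy"))
print("testing:", final_recall("test_with_feedback"))
```

On these toy numbers, testing outperforms restudy simply because each tested item has two chances of exceeding the threshold rather than one, which is the theory's central claim.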


Figure 6.13 (a) Restudy causes strengthening of the memory trace formed after initial study; (b) testing with feedback causes strengthening of the original memory trace; and (c) the formation of a second memory trace. t = the response threshold that must be exceeded for any given item to be retrieved on the final test. From Rickard & Pan (2018).

Second, there is the bifurcation model (bifurcation means division into two) proposed by Kornell et al. (2011). According to this model, items successfully retrieved during testing practice are strengthened more than restudied items. However, the crucial assumption is that items not retrieved during testing practice are strengthened less than restudied items; indeed, their memory strength does not change. Thus, there should be circumstances in which the testing effect is reversed.
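A few lines of arithmetic show how the bifurcation model can produce a testing effect in either direction. This is an illustrative sketch with hypothetical strength values, not Kornell et al.'s (2011) fitted model: successfully retrieved items get a large boost, unretrieved items get none, restudied items all get a modest boost, and whether testing or restudy wins then depends on where the final-test threshold falls.

```python
# Illustrative sketch of the bifurcation model (hypothetical values).
START, TEST_BOOST, RESTUDY_BOOST = 0.4, 0.3, 0.15
P_RETRIEVE = 0.4  # proportion of items successfully retrieved during practice
N = 100

# Testing bifurcates the items; restudy boosts every item modestly.
tested = [START + TEST_BOOST] * int(N * P_RETRIEVE) + [START] * int(N * (1 - P_RETRIEVE))
restudied = [START + RESTUDY_BOOST] * N

def pct_recalled(strengths, threshold):
    """Percentage of items whose strength exceeds the final-test threshold."""
    return 100 * sum(s > threshold for s in strengths) / len(strengths)

# Easy final test (low threshold): restudy wins -> reversed testing effect.
print(pct_recalled(tested, 0.5), pct_recalled(restudied, 0.5))  # 40.0 vs 100.0
# Hard final test (high threshold): testing wins -> standard testing effect.
print(pct_recalled(tested, 0.6), pct_recalled(restudied, 0.6))  # 40.0 vs 0.0
```

The point is only that the model's bifurcated strength distribution allows both outcomes, depending on test difficulty and on how many items were retrieved during practice.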

Findings

Several findings indicate that the size of the testing effect depends on retrieval effort (probably because it leads to the formation of a strong second memory trace). Endres and Renkl (2015) asked participants to rate the mental effort they used during retrieval practice and restudying. They obtained a testing effect that disappeared when mental effort was controlled for statistically. As predicted, more effortful or difficult retrieval tests (e.g., free recall) typically led to a greater testing effect than easy retrieval tests (e.g., recognition memory) (Rowland, 2014). All these findings provide indirect support for the dual-memory theory.

It seems reasonable to assume retrieval practice is more effortful and demanding when initial memory performance is low rather than high. As predicted, the testing effect is greater when initial memory performance is low in studies providing feedback (re-presentation of the learning materials) (Rowland, 2014).

Suppose you are trying to learn the word pair wingu–cloud. You might try to link the words by using the mediator wing. When subsequently given the cue (wingu) and told to recall the target word (cloud), you might generate the sequence wingu–wing–cloud, according to the mediator effectiveness hypothesis.

Pyc and Rawson (2010) instructed participants to learn Swahili–English pairs (e.g., wingu–cloud). In one condition, each trial after the initial study trial involved only restudy. In the other condition (test-restudy), each trial after the initial study trial involved a cued recall test followed by restudy. Participants generated and reported mediators on the study and restudy trials. There were three recall conditions on the final memory test 1 week later: (1) cue only; (2) cue + the mediator generated during learning; (3) cue + prompt to try to generate the mediator.

The findings were straightforward (see Figure 6.14(a)):

(1) Memory performance in the cue only condition replicated the basic testing effect.
(2) Performance in the cue + mediator condition showed that test-restudy participants generated more effective mediators than restudy-only participants.
(3) Test-restudy participants performed much better than restudy-only ones in the cue + prompt condition: they remembered the mediators much better. Retrieving mediators was important for the test-restudy participants – their performance was poor when they failed to recall mediators.

Pyc and Rawson (2012) developed the mediator effectiveness hypothesis further. Participants were more likely to change their mediators during test-restudy practice than restudy-only practice. Of most importance, participants engaged in test-restudy practice were more likely to change their mediators following retrieval failure than retrieval success. Thus, retrieval practice allows people to evaluate the effectiveness of their mediators and to replace ineffective ones with effective ones.

We turn now to the bifurcation model, the main theoretical approach predicting reversals of the testing effect. Support was reported by Pastötter and Bäuml (2016).


Figure 6.14 (a) Final recall for restudy-only and test-restudy group participants provided at test with cues (C), cues + the mediators generated during learning (CM) or cues plus prompts to recall their mediators (CMR). (b) Recall performance in the CMR group as a function of whether the mediators were or were not retrieved. From Pyc and Rawson (2010). © American Association for Advancement of Science. Reprinted with permission of AAAS.


Participants had retrieval/testing or restudy practice for paired associates during Session 1. In Session 2 (48 hours later), Test 1 was immediately followed by feedback (re-presentation of the word pairs) and 10 minutes later by Test 2.

There was a testing effect on Test 1 but a reversed testing effect on Test 2 (see Figure 6.15). According to the bifurcation model, non-recalled items on Test 1 should be weaker if previously subject to retrieval practice rather than restudy. Thus, they should benefit less from feedback. That is precisely what happened (see Figure 6.15).

Figure 6.15 Mean recall percentage in Session 2 on Test 1 (followed by feedback) and Test 2 10 minutes later as a function of retrieval practice (in blue) or restudy practice (in green) in Session 1. From Pastötter & Bäuml (2016).

Most research on the testing effect has involved the use of identical materials during both initial and final retrieval tests. For many purposes, however, we want retrieval to produce more general and flexible learning that transfers to related (but non-tested) information. Pan and Rickard (2018) found in a meta-analysis that retrieval practice on average has a moderately beneficial effect on transfer of learning. This was especially the case when retrieval practice involved elaborative feedback (e.g., extended and detailed feedback) rather than when only basic feedback (i.e., the correct answer) was provided.
Evaluation

The testing effect is strong and has been obtained with many different types of learning materials. Testing during learning has the advantage that it can be used almost regardless of the nature of the to-be-learned material. Of importance, retrieval practice often produces learning that generalises or transfers to related (but non-tested) information. Testing has beneficial effects because it produces a more elaborate memory trace (elaborative retrieval hypothesis) or a second memory trace (dual-memory theory). However, testing can be ineffective if the studied material is not retrieved and there is no feedback (the bifurcation model).

What are the limitations of theory and research in this area?

(1) There are several ways retrieval practice might produce more elaborate memory traces (e.g., additional processing of external context; the production of more effective internal mediators). The precise form of such elaborate memory traces is hard to predict.
(2) The dual-memory theory provides a powerful explanation of the testing effect. However, more research is required to demonstrate the conditions in which testing leads to the formation of a second memory trace differing from the memory trace formed during initial study.
(3) The bifurcation model has received empirical support. However, it does not specify the underlying processes or mechanisms responsible for the reversed testing effect.


(4) The fact that the testing effect has been found with numerous types of learning material and testing conditions suggests that many different processes can produce that effect. Thus, currently prominent theories are probably applicable to only some findings.

IMPLICIT LEARNING

KEY TERM
Implicit learning: Learning complex information without conscious awareness of what has been learned.

Earlier in the chapter we discussed learning through retrieval and learning from the levels-of-processing perspective. In both cases, the emphasis was on explicit learning: it generally makes substantial demands on attention and working memory and learners are aware of what they are learning.

Can we learn something without an awareness of what we have learned? It sounds improbable. Even if we learned something without realising, it seems unlikely we would make much use of it. In fact, there is much evidence for implicit learning: "learning that occurs without full conscious awareness of the regularities contained in the learning material itself and/or that learning has occurred" (Sævland & Norman, 2016, p. 1). As we will see, it is often assumed implicit learning differs from explicit learning in being less reliant on attention and working memory.

We can also distinguish between implicit learning and implicit memory (memory not involving conscious recollection; discussed in Chapter 7). There can be implicit memory for information acquired through explicit learning if learners lose awareness of that information over time. There can also be explicit memory for information acquired through implicit learning if learners are provided with informative contextual cues when trying to remember that information. However, implicit learning is typically followed by implicit memory whereas explicit learning is followed by explicit memory.

There is an important difference between research on implicit learning and implicit memory. Research on implicit learning mostly involves focusing on performance changes occurring over a lengthy sequence of learning trials. In contrast, research on implicit memory mostly involves one or a few learning trials and the emphasis is on the effects of various factors (e.g., retention interval; retrieval cues) on memory performance. In addition, research on implicit learning often uses fairly complex, novel tasks whereas much research on implicit memory uses simple, familiar stimulus materials.

Reber (1993) made five assumptions concerning major differences between implicit and explicit learning (none established definitively):

(1) Age independence: implicit learning is little influenced by age or developmental level.
(2) IQ independence: performance on implicit tasks is relatively unaffected by IQ.
(3) Robustness: implicit systems are relatively unaffected by disorders (e.g., amnesia) affecting explicit systems.
(4) Low variability: there are smaller individual differences in implicit learning than explicit learning.
(5) Commonality of process: implicit systems are common to most species.


Here we will briefly consider the first two assumptions (the third assumption is discussed later in the chapter). With respect to the first assumption, some studies have reported comparable implicit learning in older and young adults. However, implicit learning is mostly significantly impaired in older adults. How can we explain this deficit? Older adults generally have reduced volume of the frontal cortex and the striatum, an area strongly associated with implicit learning (King et al., 2013a).

With respect to the second assumption, Christou et al. (2016) found on a visuo-motor task that the positive effects of high working memory capacity on task performance were due to explicit but not implicit learning. When the visuo-motor task was changed to reduce the possibility of explicit learning, high working memory capacity was unrelated to performance. Overall, intelligence is associated more strongly with explicit than with implicit learning. However, the association between intelligence and implicit learning appears greater than predicted by Reber (1993).

IN THE REAL WORLD: SKILLED TYPISTS AND IMPLICIT LEARNING

Millions of individuals have highly developed typing skills (e.g., the typical American student who touch types produces 70 words a minute) (Logan & Crump, 2009). Nevertheless, many expert typists find it hard to say exactly where the letters are on the keyboard. For example, the first author of this book has typed 8 million words for publication but has limited conscious awareness of the locations of most letters! This suggests expert typing relies heavily on implicit learning and memory. However, typing initially involves mostly explicit learning as typists learn to associate finger movements with specific letter keys.

Snyder et al. (2014) studied college students averaging 11.4 years of typing practice. In the first experiment, typists saw a blank keyboard and were instructed to write the letters in their correct locations (see Figure 6.16). On average, they accurately located only 14.9 (57.3%) of the letters. If you are a skilled typist, try this task before checking your answers (shown in Figure 6.22).

Accurate identification of letters' keyboard locations could occur because typists engage in simulated typing. In their second experiment, Snyder et al. (2014) found the ability to identify the keyboard locations of letters was reduced when simulated typing was prevented. Thus, explicit memory for letter locations is lower than 57%.

Figure 6.16 Schematic representation of a traditional keyboard. From Snyder et al. (2014). © 2011 Psychonomic Society. Reprinted with permission from Springer.


In a final experiment, Snyder et al. (2014) gave typists two hours' training on the Dvorak keyboard, on which the letter locations differ from the traditional QWERTY keyboard. The ability to locate letters on the Dvorak and QWERTY keyboards was comparable. Thus, typists have no more explicit knowledge of letter locations on a keyboard after 11 years than after 2 hours!

What is the nature of experienced typists' implicit learning? Logan (2018) addressed this issue. Much of this learning involves forming associations between individual letters and finger movements. In addition, however, typists learn to treat each word as a single chunk or unit. As a result, they type words much faster than non-words containing the same number of letters. Thus, implicit learning occurs at both the word and letter levels (Logan, 2018).

If experts rely on implicit learning and memory, we might predict performance impairments if they focused consciously on their actions. There is much support for this prediction. For example, Flegal and Anderson (2008) gave skilled golfers a putting task before and after they described their actions in detail. Their putting performance was markedly worse after describing their actions because conscious processes disrupted implicit ones.

Assessing implicit learning

You might think it is easy to decide whether implicit learning has occurred – we simply ask participants after performing a task to indicate their conscious awareness of their learning. Implicit learning is shown if there is no such conscious awareness. Alas, individuals sometimes fail to report fully their conscious awareness of their learning (Shanks, 2010). For example, there is the "retrospective problem" (Shanks & St. John, 1994) – participants may be consciously aware of what they are learning at the time but have forgotten it when questioned subsequently.

Shanks and St. John (1994) proposed two criteria (incompletely implemented in most research) for implicit learning to be demonstrated:

(1) Information criterion: the information participants are asked to provide on the awareness test must be the information responsible for the improved performance.
(2) Sensitivity criterion: "We must . . . show our test of awareness is sensitive to all of the relevant knowledge" (Shanks & St. John, 1994, p. 374). We may underestimate participants' consciously accessible knowledge if we use an insensitive awareness test.

When implicit learning studies fail to obtain significant evidence of explicit learning, researchers often (mistakenly) conclude there was no explicit learning. Consider research on contextual cueing: participants search for targets in visual displays and targets are detected increasingly rapidly (especially with repeated rather than random displays). Subsequently, participants see the repeating patterns and new random ones and indicate whether they have previously seen each one. Typically, participants fail to identify the repeating patterns significantly more often than the random ones. Such non-significant findings are taken to imply all task learning is implicit.

Vadillo et al. (2016) argued many of the above non-significant findings occurred because insufficiently large samples were used. In their review of 73 studies, 78.5% of awareness tests produced non-significant findings.
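Vadillo et al.'s statistical argument can be illustrated with a short simulation. The numbers below (sample size, true above-chance recognition rate, number of studies) are hypothetical, chosen only to show how a small real effect can leave most individual awareness tests non-significant while most studies still point above chance:

```python
import random

def run_awareness_test(n=20, true_rate=0.55):
    """One small-sample awareness test of a weak true above-chance effect."""
    hits = sum(random.random() < true_rate for _ in range(n))
    p_hat = hits / n
    se = (0.25 / n) ** 0.5            # standard error under the chance (0.5) null
    z = (p_hat - 0.5) / se
    return p_hat > 0.5, z > 1.96      # direction, and individual significance

results = [run_awareness_test() for _ in range(73)]
above_chance = sum(direction for direction, _ in results)
significant = sum(sig for _, sig in results)
print(f"{above_chance}/73 studies above chance; {significant}/73 individually significant")
```

Aggregated across studies (e.g., by a sign test on the direction of each result), such a weak effect is detected easily even though almost every single study is underpowered.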


KEY TERMS
Process-dissociation procedure: On learning tasks, participants try to guess the next stimulus (inclusion condition) or avoid guessing the next stimulus accurately (exclusion condition); the difference between the two conditions indicates the amount of explicit learning.
Serial reaction time task: Participants on this task respond as rapidly as possible to stimuli typically presented in a repeating sequence; it is used to assess implicit learning.

Nevertheless, participants in 67% of the studies performed above chance (a highly significant finding). Thus, some explicit learning is involved in contextual cueing even though the opposite is often claimed.

Finally, we consider the process-dissociation procedure. Suppose participants perform a task involving a repeating sequence of stimuli. They either guess the next stimulus (inclusion condition) or try to avoid guessing the next stimulus accurately (exclusion condition). If learning is wholly implicit, performance should be comparable in both conditions because participants would have no conscious access to relevant information. If it is partly or wholly explicit, performance should be better in the inclusion condition.

The process-dissociation procedure is based on the assumption that the influence of implicit and explicit processes is unaffected by instructions (inclusion vs exclusion). However, Barth et al. (2019) found explicit knowledge was less likely to influence performance in the exclusion than the inclusion condition. Such findings make it hard to interpret findings obtained using the process-dissociation procedure.
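The arithmetic behind the procedure is usually formalised with Jacoby's (1991) process-dissociation equations, in which controlled (explicit) and automatic (implicit) influences combine independently: inclusion = C + A(1 − C) and exclusion = A(1 − C). The sketch below solves these equations for hypothetical inclusion and exclusion scores; note that the Barth et al. (2019) finding just described challenges the invariance assumption that licenses this arithmetic.

```python
def process_dissociation(inclusion, exclusion):
    """Estimate controlled/explicit (C) and automatic/implicit (A) components
    from inclusion and exclusion scores via Jacoby's (1991) equations:
        inclusion = C + A * (1 - C)
        exclusion = A * (1 - C)
    """
    C = inclusion - exclusion                   # explicit component
    A = exclusion / (1 - C) if C < 1 else 0.0   # implicit component
    return C, A

# Hypothetical scores: substantial explicit knowledge boosts inclusion
# performance and is (mostly) withheld under exclusion instructions.
C, A = process_dissociation(inclusion=0.80, exclusion=0.18)
print(f"explicit C = {C:.2f}, implicit A = {A:.2f}")  # explicit C = 0.62, implicit A = 0.47
```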

Findings

The serial reaction time task has often been used to study implicit learning. On each trial, a stimulus appears at one of several locations on a computer screen and participants respond using the response key corresponding to its location. There is typically a complex, repeating sequence over trials but participants are not told this. Towards the end of the experiment, there is often a block of trials conforming to a novel sequence but the participants are not informed. Participants speed up over trials on the serial reaction time task but respond much more slowly during the novel sequence (Shanks, 2010). When questioned at the end of the experiment, participants usually claim no conscious awareness of a repeating sequence or pattern.

However, participants sometimes have partial awareness of what they have learned. Wilkinson and Shanks (2004) gave participants 1,500 trials (15 blocks) or 4,500 trials (45 blocks) on the task and obtained strong sequence learning. This was followed by a test of explicit learning based on the process-dissociation procedure. Participants' predictions were significantly better in the inclusion than exclusion condition (see Figure 6.17), indicating some conscious or explicit knowledge was acquired. In a similar study, Gaillard et al. (2009) obtained comparable findings and discovered conscious knowledge increased with practice.

Haider et al. (2011) argued the best way to assess whether learning is explicit or implicit is to use several measures of conscious awareness. They used a version of the serial reaction time task in which a colour word (the target) was written in ink of the same colour (congruent trials) or a different colour (incongruent trials). Participants responded to the colour word rather than the ink. There were six different coloured squares below the target word and participants pressed the coloured square corresponding to the colour word. The correct coloured square followed a regular sequence (1-6-4-2-3-5) but participants were not told this.


Haider et al. (2011) found 34% of participants showed a sudden drop in reaction times at some point. They hypothesised these RT-drop participants were consciously aware of the regular sequence (explicit learning). The remaining 66% failed to show a sudden drop (the no-RT-drop participants) and were hypothesised to have engaged only in implicit learning (see Figure 6.18).

Haider et al. (2011) used the process-dissociation procedure to test the above hypotheses. The RT-drop participants performed well: 80% correct on inclusion trials vs 18% correct on exclusion trials, suggesting considerable explicit learning. In contrast, the no-RT-drop participants had comparably low performance on inclusion and exclusion trials, indicating an absence of explicit learning. Finally, all participants described the training sequence (explicit task). Almost all (91%) of the RT-drop participants did this perfectly compared to 0% of the no-RT-drop participants. Thus, all the various findings supported Haider et al.'s hypotheses.


Figure 6.17 Mean number of completions (guessed locations) corresponding to the trained sequence (own) or the untrained sequence (other) in inclusion and exclusion conditions as a function of number of trials (15 vs 45 blocks). From Wilkinson and Shanks (2004). © 2004 American Psychological Association. Reproduced with permission.

Figure 6.18 Response times for participants showing a sudden drop in RTs (right-hand side) or not showing such a drop (left-hand side). The former group showed much greater learning than the latter group (especially on incongruent trials on which the colour word was in a different coloured ink). From Haider et al. (2011). Reprinted with permission from Elsevier.


If implicit learning does not require cognitively demanding processes (e.g., attention), people should be able to perform two implicit learning tasks simultaneously without interference. As predicted, Jiménez and Vázquez (2011) reported no interference when participants performed the serial reaction time task and a second implicit learning task.

Many tasks involve a combination of implicit and explicit learning. Taylor et al. (2014) used a visuo-motor adaptation task on which participants learned to point at a target that rotated 45 degrees counterclockwise. Participants initially indicated their aiming direction and then made a rapid reaching movement. The former provided a measure of explicit learning whereas the latter provided a measure of implicit learning. Thus, an advantage of this experimental approach is that it provides separate measures of explicit and implicit learning.

Huberdeau et al. (2015) reviewed findings using the above visuo-motor adaptation task and drew two main conclusions. First, improved performance over trials depended on both implicit and explicit learning. Second, there was a progressive increase in implicit learning with practice, whereas most explicit learning occurred early in practice.

Cognitive neuroscience

If implicit and explicit learning are genuinely different, they should be associated with different brain areas. Implicit learning has been linked to the striatum, which is part of the basal ganglia (see Figure 6.19). For example, Reiss et al. (2005) found on the serial reaction time task that participants showing implicit learning had greater activation in the striatum than those not exhibiting implicit learning. In contrast, explicit learning and memory are typically associated with activation in the medial temporal lobes including the hippocampus (see Chapter 7).

Since conscious awareness is most consistently associated with activation of the dorsolateral prefrontal cortex and the anterior cingulate (see Chapter 16), these areas should be more active during explicit than implicit learning. Relevant evidence was reported by Wessel et al. (2012) using the serial reaction time task. Some participants showed clear evidence of explicit learning during training. A brain area centred on the right prefrontal cortex became much more active around the onset of explicit learning. In similar fashion, Lawson et al. (2017) compared participants showing (or not showing) conscious awareness of a repeating pattern on the serial reaction time task. The fronto-parietal network was more activated for those showing conscious awareness.

Figure 6.19 The striatum (which includes the caudate nucleus and the putamen) is of central importance in implicit learning.


It is often hard to establish the brain regions associated with implicit and explicit learning because learners often use both kinds of learning. Destrebecqz et al. (2005) used the process-dissociation procedure (see Glossary) with the serial reaction time task to distinguish more clearly between the explicit and implicit components of learning. Striatum activation was associated with the implicit component whereas the prefrontal cortex and anterior cingulate were associated with the explicit component.

Penhune and Steele (2012) proposed a model of motor sequence learning (see Figure 6.20). The striatum is involved in learning stimulus–response associations and motor chunking or organisation. The cerebellum is involved in producing an internal model to aid sequence performance and error correction. Finally, the motor cortex is involved in storing the learned motor sequence. Of importance, the involvement of each brain area varies across stages of learning.

Evidence for the importance of the cerebellum in motor sequence learning was reported by Shimizu et al. (2017) using transcranial direct current stimulation (tDCS; see Glossary) applied to the cerebellum. This stimulation influenced implicit learning (enhancing or impairing performance) as predicted theoretically.

KEY TERM
Striatum: It forms part of the basal ganglia and is located in the upper part of the brainstem and the inferior part of the cerebral hemispheres.


Figure 6.20 A model of motor sequence learning. The top panel shows the brain areas (PMC or M1 = primary motor cortex) and associated mechanisms involved in motor sequence learning. The bottom panel shows the changing involvement of different processing components (chunking, synchronisation, sequence ordering, error correction) in overall performance. Each component is colour-coded to its associated brain region. From Penhune and Steele (2012). Reprinted with permission of Elsevier.


In spite of the above findings, there are many inconsistencies and complexities in the research literature (Reber, 2013). For example, Gheysen et al. (2011) found the striatum contributed to explicit learning of motor sequences as well as implicit learning, and the hippocampus is sometimes involved in implicit learning (Henke, 2010).

Why are the findings inconsistent? First, there are numerous forms of implicit learning. As Reber (2013, p. 2029) argued, "We should expect to find implicit learning . . . whenever perception and/or actions are repeated so that processing comes to reflect the statistical structure of experience." As a consequence, it is probable that implicit learning can involve several different brain networks.

Second, we can regard "the cerebellum, basal ganglia, and cortex as an integrated system" (Caligiore et al., 2017, p. 204). This system plays an important role in implicit and explicit learning.

Third, as we have seen, there are large individual differences in learning strategies and the balance between implicit and explicit learning. These individual differences introduce complexity into the overall findings.

Fourth, there are often changes in the involvement of implicit and explicit processes during learning. For example, Beukema and Verstynen (2018) focused on changes in the involvement of different brain regions during the acquisition of sequential motor skills (e.g., the skills acquired by typists). Explicit processes dependent on the medial temporal lobe (shown in magenta in Figure 6.21) were especially important early in learning whereas implicit processes dependent on the basal ganglia (shown in blue) became increasingly important later in learning.

Figure 6.21 Sequential motor skill learning initially depends on the medial temporal lobe (MTL) including the hippocampus (shown in magenta) but subsequently depends more on the basal ganglia (BG) including the striatum (shown in blue). From Beukema and Verstynen (2018).


Brain-damaged patients

Amnesic patients with damage to the medial temporal lobes often have intact performance on implicit-memory tests but are severely impaired on explicit-memory tests (see Chapter 7). If separate learning systems underlie implicit and explicit learning, we might expect amnesic patients to have intact implicit learning but impaired explicit learning. That pattern of findings has been reported several times. However, amnesic patients are often slower than healthy controls on implicit-learning tasks (Oudman et al., 2015).

Earlier we discussed the hypothesis that the basal ganglia (especially the striatum) are of major importance in implicit learning. Patients with Parkinson's disease (a progressive neurological disorder) have damage to this region. As predicted, Clark et al. (2014) found in a meta-analytic review that patients with Parkinson's disease typically exhibit impaired implicit learning on the serial reaction time task (see Chapter 7). However, Wilkinson et al. (2009) found Parkinson's patients also showed impaired explicit learning on that task.

In a review, Marinelli et al. (2017) found that Parkinson's patients showed the greatest impairment in motor learning when the task required conscious processing resources (e.g., attention; cognitive strategies). Much additional research indicates Parkinson's patients have impaired conscious processing (see Chapter 7). Siegert et al. (2008) found in a meta-analytic review that such patients exhibited consistently poorer performance than healthy controls on working memory tasks. Roussel et al. (2017) found 80% of Parkinson's patients have dysexecutive syndrome, which involves general impairments in cognitive processing.

In sum, findings from Parkinson's patients provide only limited information concerning the distinction between implicit and explicit learning.

KEY TERM Parkinson’s disease A progressive disorder involving damage to the basal ganglia (including the striatum); the symptoms include muscle rigidity, limb tremor and mask-like facial expression.

Evaluation

Research on implicit learning has several strengths (see also Chapter 7). First, the distinction between implicit and explicit learning has received considerable support from behavioural and neuroimaging studies on healthy individuals and from research on brain-damaged patients.

Figure 6.22 Percentages of experienced typists given an unfilled schematic keyboard (see Figure 6.16) who correctly located (top number), omitted (middle number) or misplaced (bottom number) each letter with respect to the standard keyboard. From Snyder et al. (2014). © 2011 Psychonomic Society. Reprinted with permission from Springer.


KEY TERM
Savings method: A measure of forgetting introduced by Ebbinghaus in which the number of trials for relearning is compared against the number for original learning.

Second, the basal ganglia (including the striatum) tend to be associated with implicit learning whereas the prefrontal cortex, anterior cingulate and medial temporal lobes are associated with explicit learning. There is accumulating evidence that complex brain networks are involved in implicit learning (e.g., Penhune & Steele, 2012).

Third, given the deficiencies in assessing conscious awareness with any single measure, researchers are increasingly using several measures. Thankfully, different measures often provide comparable estimates of the extent of conscious awareness (e.g., Haider et al., 2011).

Fourth, researchers increasingly reject the erroneous assumption that finding some evidence of explicit learning implies no implicit learning occurred. In fact, learning typically involves implicit and explicit aspects, and the extent to which learners are consciously aware of what they are learning depends on individual differences and the stage of learning (e.g., Wessel et al., 2012).

What are the limitations of research on implicit learning?

(1) There is often a complex mixture of implicit and explicit learning, making it hard to determine the extent of implicit learning.
(2) The processes underlying implicit and explicit learning interact in ways that remain unclear.
(3) In order to show the existence of implicit learning we need to demonstrate that learning has occurred in the absence of conscious awareness. This is hard to do – we may fail to assess fully participants' conscious awareness (Shanks, 2017).
(4) The definition of implicit learning as learning occurring without conscious awareness is vague and underspecified, and so is applicable to numerous forms of learning having little in common with each other. It is probable that no current theory can account for the diverse forms of implicit learning.

FORGETTING FROM LONG-TERM MEMORY

Hermann Ebbinghaus (1885/1913) studied forgetting from long-term memory in detail, using himself as the only participant (not recommended!). He initially learned lists of nonsense syllables lacking meaning and then relearned each list between 21 minutes and 31 days later. His basic measure of forgetting was the savings method – the reduction in the number of trials during relearning compared to original learning.

Ebbinghaus found forgetting was very rapid over the first hour after learning but then slowed considerably (see Figure 6.23). Rubin and Wenzel (1996) found the same pattern when analysing numerous forgetting functions and argued a logarithmic function describes forgetting over time. In contrast, Averell and Heathcote (2011) argued for a power function (both the savings measure and these two candidate functions are illustrated in the sketch following the list below).

Figure 6.23 Forgetting over time as indexed by reduced savings. Data from Ebbinghaus (1885/1913).

It is often assumed (mistakenly) that forgetting should always be avoided. Nørby (2015) identified three major functions served by forgetting:


(1) It can enhance psychological well-being by reducing access to painful memories.
(2) It is useful to forget outdated information (e.g., where your friends used to live) so it does not interfere with current information (e.g., where your friends live now). Richards and Frankland (2017) developed this argument: they argued a major purpose of memory is to enhance decision-making, and this purpose is facilitated when we forget outdated information.
(3) When trying to remember what we have read or heard, it is typically most useful to forget specific details and focus on the overall gist or message (see Box and Chapter 10).
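As promised above, here is a brief illustrative sketch of the savings measure and the two candidate forgetting functions. The savings formula follows Ebbinghaus's definition; the curve parameters a and b are hypothetical values chosen only to contrast the logarithmic and power forms.

```python
import math

def savings(original_trials, relearning_trials):
    """Ebbinghaus's savings measure: proportional reduction in relearning effort."""
    return (original_trials - relearning_trials) / original_trials

print(savings(20, 8))  # 0.6 -> 60% savings after the retention interval

# Two candidate descriptions of the forgetting curve (hypothetical a, b):
def log_forgetting(t, a=0.9, b=0.1):     # logarithmic form (Rubin & Wenzel, 1996)
    return a - b * math.log(t)

def power_forgetting(t, a=0.9, b=0.25):  # power form (Averell & Heathcote, 2011)
    return a * t ** (-b)

for t in (1, 10, 100, 1000):             # retention interval (arbitrary units)
    print(t, round(log_forgetting(t), 3), round(power_forgetting(t), 3))
```

Both functions reproduce the qualitative Ebbinghaus pattern (rapid early forgetting that then slows); distinguishing them empirically requires fitting retention data across many intervals.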

IN THE REAL WORLD: IS PERFECT MEMORY USEFUL?

What would it be like to have a perfect memory? Jorge Luis Borges (1964) answered this question in a story called "Funes the memorious". After falling from a horse, Funes remembers everything that happens to him in full detail. This had several negative consequences. When he recalled the events of any given day, it took him an entire day to do so! He found it very hard to think because his mind was full of incredibly detailed information. Here is an example:

Not only was it difficult for him to comprehend that the generic symbol dog embraces so many unlike individuals of diverse size and form; it bothered him that the dog at three fourteen (seen from the side) should have the same name as the dog at three fifteen (seen from the front). (p. 153)


KEY TERM
Synaesthesia: The tendency for one sense modality to evoke another.

The closest real-life equivalent of Funes was a Russian called Solomon Shereshevskii. When he worked as a journalist, his editor noticed he could repeat everything said to him verbatim. The editor sent Shereshevskii (S) to see the psychologist Luria. Luria found S rapidly learned complex material (e.g., lists of over 100 digits) which he remembered perfectly (even in reverse order) several years later. According to Luria (1968), "There was no limit either to the capacity of S's memory or to the durability of the traces he retained."

What was S's secret? He had exceptional imagery and an amazing capacity for synaesthesia (the tendency for processing in one modality to evoke other sense modalities). For example, when hearing a tone, he said: "It looks like fireworks tinged with a pink-red hue."

Do you envy S's memory powers? Ironically, his memory was so good it disrupted his everyday life. For example, this was his experience when hearing a prose passage: "Each word calls up images, they collide with one another, and the result is chaos." His mind came to resemble "a junk heap of impressions". His acute awareness of details meant he sometimes failed to recognise someone he knew if, for example, their facial colouring had altered because they had been on holiday. These memory limitations made it hard for him to live a normal life and he eventually ended up in an asylum.

Most forgetting studies focus on declarative or explicit memory involving conscious recollection (see Chapter 7). Forgetting is often slower in implicit than explicit memory. For example, Mitchell (2006) asked participants to identify pictures from fragments, having seen some of them in an experiment 17 years previously. Performance was better with the previously seen pictures, providing evidence for very-long-term implicit memory. However, there was little explicit memory for the previous experiment. A 36-year-old male participant confessed, "I'm sorry – I don't really remember this experiment at all."

Below we discuss major theories of forgetting. These theories are not mutually exclusive – they all identify factors jointly responsible for forgetting.

Decay

Perhaps the simplest explanation for forgetting of long-term memories is decay, which involves "forgetting due to a gradual loss of the substrate of memory" (Hardt et al., 2013, p. 111). More specifically, forgetting often occurs because of decay processes occurring within memory traces. In spite of its plausibility, decay has largely been ignored as an explanation of forgetting. Hardt et al. argued a decay process (operating mostly during sleep) removes numerous trivial memories we form every day. This decay process is especially active in the hippocampus (part of the medial temporal lobe involved in acquiring new memories; see Chapter 7).

Forgetting can be due to decay or interference (discussed shortly). Sadeh et al. (2016) assumed detailed memories (i.e., containing contextual information) are sufficiently complex to be relatively immune to interference from other memories.


As a result, most forgetting of such memories should be due to decay. In contrast, weak memories (i.e., lacking contextual information) are very susceptible to interference and so forgetting of such memories should be primarily due to interference rather than decay. Sadeh et al.'s findings supported these assumptions. Thus, the role played by decay in forgetting depends on the nature of the underlying memory traces.

Interference theory

Interference theory was the dominant approach to forgetting during much of the twentieth century. According to this theory, long-term memory is impaired by two forms of interference: (1) proactive interference – disruption of memory by previous learning; and (2) retroactive interference – disruption of memory for previous learning by other learning or processing during the retention interval. Research using methods such as those shown in Figure 6.24 indicates proactive and retroactive interference are both maximal when two different responses are associated with the same stimulus.
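The standard paired-associate designs summarised in Figure 6.24 can also be laid out schematically. The sketch below is a conventional reconstruction using the classic A–B/A–C notation rather than the layout of any particular study:

```python
# Conventional A-B / A-C paired-associate designs for isolating each type of
# interference: the control group simply skips the interfering list.
designs = {
    "proactive": {
        "experimental": ["learn A-B", "learn A-C", "test A-C (recall C given A)"],
        "control":      ["rest",      "learn A-C", "test A-C (recall C given A)"],
    },
    "retroactive": {
        "experimental": ["learn A-B", "learn A-C", "test A-B (recall B given A)"],
        "control":      ["learn A-B", "rest",      "test A-B (recall B given A)"],
    },
}

for kind, groups in designs.items():
    print(kind)
    for group, phases in groups.items():
        print(f"  {group:12s}: " + " -> ".join(phases))
```

In both designs the experimental and control groups differ only in whether an interfering list shares stimulus terms (A) with the tested list, which is why interference is maximal when two different responses are attached to the same stimulus.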

KEY TERMS
Proactive interference: Disruption of memory by previous learning (often of similar material).
Retroactive interference: Disruption of memory for previously learned information by other learning or processing occurring during the retention interval.

Proactive interference

Proactive interference typically involves competition between the correct response and an incorrect one. There is greater competition (and thus more interference) when the incorrect response is associated with the same stimulus as the correct response. Jacoby et al. (2001) found proactive interference was due much more to the strength of the incorrect response than the weakness of the correct response. Thus, it is hard to exclude incorrect responses from the retrieval process.

More evidence for the importance of retrieval processes was reported by Bäuml and Kliegl (2013). They tested the hypothesis that proactive interference is often found because rememberers' memory search is too broad, including material previously learned but currently irrelevant.

Figure 6.24 Methods of testing for proactive and retroactive interference.


In the remember (proactive interference) condition, three word lists were presented followed by free recall of the last one. In the forget condition, the same lists were presented but participants were told after the first two lists to forget them. Finally, there was a control (no proactive interference) condition where only one list was learned and tested.

Participants in the control condition recalled 68% of the words compared to only 41% in the proactive interference condition. Crucially, participants in the forget condition recalled 68% of the words despite having learned two previous lists. The instruction to forget the first two lists allowed participants to limit their retrieval efforts to the third list. This interpretation was strengthened by the finding that retrieval speed was comparable in the forget and control conditions (see Figure 6.25).

Figure 6.25 Percentage of items recalled over time for the conditions: no proactive interference (PI), remember (proactive interference) and forget (forget previous lists). From Bäuml & Kliegl (2013). Reprinted with permission of Elsevier.

Kliegl et al. (2015) found in a similar study that impaired encoding (see Glossary) contributes to proactive interference. Encoding was assessed using electroencephalography (EEG; see Glossary). The EEG indicated there was reduced attention during encoding of a word list preceded by other word lists (proactive interference condition). As in the study by Bäuml and Kliegl (2013), there was also evidence that proactive interference impaired retrieval.

Suppose participants learn word pairs on the first list (e.g., Cat–Dirt) and more word pairs on the second list (e.g., Cat–Tree). They are then given the first words (e.g., Cat) and must recall the paired word from the second list (see Figure 6.24). Jacoby et al. (2015) argued proactive interference (e.g., recalling Dirt instead of Tree) often occurs because participants fail to recognise changes in the word pairings between lists. As predicted, when they instructed some participants to detect changed pairs, there was proactive facilitation rather than interference. Thus, proactive interference can be reduced (or even reversed) if we recollect the changes between information learned originally and subsequently.

Retroactive interference

Anecdotal evidence that retroactive interference can be important in everyday life comes from travellers claiming exposure to a foreign language reduces their ability to recall words in their own language. Misra et al. (2012) studied bilinguals whose native language was Chinese and second language was English. They named pictures in Chinese more slowly after previously naming the same pictures in English. The evidence from event-related potentials suggested participants were inhibiting second-language names when naming pictures in Chinese.


As discussed earlier, Jacoby et al. (2015) found evidence for proactive facilitation rather than interference when participants explicitly focused on changes between the first and second lists (e.g., Cat–Dirt and Cat–Tree). Jacoby et al. also found that instructing participants to focus on changes between lists produced retroactive facilitation rather than interference. Focusing on changes made it easier for participants to discriminate accurately between list 1 responses (e.g., Dirt) and list 2 responses (e.g., Tree).

Retroactive interference is generally greatest when the new learning resembles previous learning. However, Dewar et al. (2007) obtained evidence of retroactive interference for a word list when participants performed an unrelated task (e.g., detecting tones) between learning and memory test. Fatania and Mercer (2017) found children were more susceptible than adults to non-specific retroactive interference, perhaps because they used fewer effective strategies (e.g., rehearsal) to minimise such interference.

In sum, retroactive interference can occur in two ways: (1) learning material similar to the original learning material; and (2) distraction involving expenditure of mental effort during the retention interval (non-specific retroactive interference). This second cause of retroactive interference is probably most common in everyday life.

Retrieval problems play a major role in producing retroactive interference. Lustig et al. (2004) found that much retroactive interference occurs because people find it hard to avoid retrieving information from the wrong list. How can we reduce retrieval problems? Unsworth et al. (2013) obtained substantial retroactive interference when two word lists were presented prior to recall of the first list. When focused retrieval was made easier (the words in each list belonged to two separate categories such as animals and trees), there was no retroactive interference. Ecker et al. (2015) also tested recall of the first list following presentation of two word lists. When the time interval between lists was long rather than short, recall performance was better. Focusing retrieval on first-list words was easier when the two lists were more separated in time and thus more discriminable.

Evaluation

There is convincing evidence for both proactive and retroactive interference, and progress has been made in identifying the underlying processes. Proactive and retroactive interference depend in part on problems with focusing retrieval exclusively on to-be-remembered information. Proactive interference also depends on impaired encoding of information. Both types of interference can be reduced by active strategies (e.g., focusing on changes between the two lists).

What are the limitations of theory and research in this area? First, interference theory explains why forgetting occurs but does not explain why the forgetting rate decreases over time. Second, we need clarification of the roles of impaired encoding and impaired retrieval in producing interference effects. For example, there may be interaction effects, with impaired encoding reducing the efficiency of retrieval. Third, the precise mechanisms responsible for the reduced interference effects with various strategies have not been identified.


KEY TERMS
Repression
Motivated forgetting of traumatic or other threatening events (especially from childhood).
Recovered memories
Childhood traumatic memories forgotten for several years and then remembered in adult life.


Motivated forgetting

Interest in motivated forgetting was triggered by the bearded Austrian psychologist Sigmund Freud (1856–1939). His approach was narrowly focused on repressed traumatic and other distressing memories. More recently, a broader approach to motivated forgetting has been adopted. Much information in long-term memory is outdated and useless for present purposes (e.g., where you have previously parked your car). Thus, motivated or intentional forgetting can be adaptive.

Repression

Freud claimed threatening or traumatic memories often cannot gain access to conscious awareness: this serves to reduce anxiety. He used the term repression to refer to this phenomenon. He claimed childhood traumatic memories forgotten for many years are sometimes remembered in adult life. Freud found these recovered memories were often recalled during therapy. However, many experts (e.g., Loftus & Davis, 2006) argue most recovered memories are false memories referring to imaginary events.

Relevant evidence concerning the truth of recovered memories was reported by Lief and Fetkewicz (1995). Of adult patients who admitted reporting false recovered memories, 80% had therapists who had made direct suggestions they had been subject to childhood sexual abuse. These findings suggest recovered memories recalled inside therapy are more likely to be false than those recalled outside.

Geraerts et al. (2007) obtained support for the above suggestion in a study on three adult groups who had suffered childhood sexual abuse:

(1) Suggestive therapy group: their recovered memories were recalled initially inside therapy.
(2) Spontaneous recovery group: their recovered memories were recalled initially outside therapy.
(3) Continuous memory group: they had continuous memories of abuse from childhood onwards.

Geraerts et al. (2007) argued the genuineness of the memories produced could be assessed approximately by using corroborating evidence (e.g., the abuser had confessed). Such evidence was available for 45% of the continuous memory group and 37% of the outside therapy group but for 0% of the inside therapy group. These findings suggest recovered memories recalled outside therapy are much more likely to be genuine than those recalled inside therapy.

Geraerts (2012) reviewed research comparing women whose recovered memories were recalled spontaneously or in therapy. Of importance, those with spontaneous recovered memories showed more ability to suppress unwanted memories and were more likely to forget they remembered


something previously. Spontaneously recovered memories are often triggered by relevant retrieval cues (e.g., returning to the scene of the abuse).

It seems surprising that women recovering memories outside therapy failed for many years to remember childhood sexual abuse. However, it is surprising only if the memories were traumatic (as Freud assumed). In fact, only 8% of women with recovered memories regarded the relevant events as traumatic or sexual when they occurred (Clancy & McNally, 2005/2006). The great majority described their memories as confusing or uncomfortable – it seems reasonable that confusing or uncomfortable memories could be suppressed or simply ignored or forgotten.

In sum, many assumptions about recovered memories are false. As McNally and Geraerts (2009, p. 132) concluded, “A genuine recovered CSA [childhood sexual abuse] memory does not require repression, trauma, or even complete forgetting.”

KEY TERM
Directed forgetting
Reduced long-term memory caused by instructions to forget information that had been presented for learning.

Directed forgetting

Directed forgetting is a phenomenon involving impaired long-term memory triggered by instructions to forget information previously presented for learning. It is often studied using the item method: several words are presented, each followed immediately by an instruction to remember or forget it. After the words have been presented, participants are tested for recall or recognition memory of all the words. Memory performance is worse for the to-be-forgotten words than the to-be-remembered ones.

What causes directed forgetting? The instructions cause learners to direct their rehearsal processes to to-be-remembered items at the expense of to-be-forgotten ones. Inhibitory processes are also involved: successful forgetting is associated with activation in areas within the right frontal cortex involved in inhibition (Rizio & Dennis, 2013).

Directed forgetting is often unsuccessful. Rizio and Dennis (2017) found 60% of items associated with forget instructions (Forget items) were successfully recognised compared to 73% for items associated with remember instructions (Remember items). They then considered brain activation for successfully recognised items associated with a feeling of remembering. There was greater activation in prefrontal areas associated with effortful processing for recognised Forget items than recognised Remember items. This enhanced effort was required because participants engaged in inhibitory processing of Forget items at encoding even if the items were subsequently recognised.

Think/No-Think paradigm: suppression

Anderson and Green (2001) developed the Think/No-Think paradigm to assess whether individuals can actively suppress memories. Participants learn a list of cue–target word pairs (e.g., Ordeal–Roach; Steam–Train). Then they receive the cues studied earlier (e.g., Ordeal; Steam) and try to recall the associated words (e.g., Roach; Train) (respond condition) or prevent them coming to mind (suppress condition). Some cues are not presented at this stage (baseline condition).

Finally, there are two testing conditions. In the same-probe test condition, the original cues are presented (e.g., Ordeal) and participants recall the corresponding target words (e.g., Roach). In the independent-probe test condition, participants are presented with a novel category cue (e.g., Roach might be cued with Insect–r). If people can suppress unwanted memories, recall should be lower in the suppress than the respond condition. Recall should also be lower in the suppress condition than the baseline condition.
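The three-phase structure of the paradigm is easier to see laid out as a procedure. The sketch below is a minimal illustration only; the materials and condition assignment are invented, not Anderson and Green's actual protocol:

import random

# Minimal sketch of the Think/No-Think procedure (illustrative only).
pairs = {"Ordeal": "Roach", "Steam": "Train", "Jaw": "Gum", "Quill": "Ink"}

# Phase 1: participants learn every cue-target pair (assumed complete here).

# Phase 2: each cue is assigned to one condition.
cues = list(pairs)
random.shuffle(cues)
condition = dict(zip(cues, ["respond", "suppress", "baseline", "baseline"]))

for cue, cond in condition.items():
    if cond == "respond":
        print(f"{cue}: try to recall '{pairs[cue]}'")
    elif cond == "suppress":
        print(f"{cue}: see the cue but keep '{pairs[cue]}' out of mind")
    else:
        print(f"{cue}: not presented in this phase")

# Phase 3: final memory test.
#   Same-probe test: the original cue is presented (e.g., Ordeal -> ?).
#   Independent-probe test: a novel category cue is presented
#   (e.g., Insect-r -> Roach).
# Successful suppression predicts:
#   recall(suppress) < recall(respond), and
#   recall(suppress) < recall(baseline).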


Anderson and Huddleston (2012) carried out a meta-analysis of 47 experiments and found strong support for both predictions (see Figure 6.26). However, suppression attempts were often unsuccessful: in the suppress condition (same-probe test), 82% of items were recalled.

Figure 6.26 Percentage of words correctly recalled across 32 articles in the respond, baseline and suppress conditions (in that order, reading from left to right) with same-probe and independent-probe testing conditions. From Anderson and Huddleston (2012). Reproduced with permission of Springer Science+Business Media.

What strategies do individuals use to produce successful suppression of unwanted memories? Direct suppression (focusing on the cue word and blocking out the associated target word) is an important strategy. Thought substitution (associating a different non-target word with each cue word) is also very common. Bergström et al. (2009) found these strategies were comparably effective in reducing recall in the suppress condition.

Anderson et al. (2016b) pointed out the Think/No-Think paradigm is unrealistic in that we rarely make deliberate efforts to retrieve suppressed memories in everyday life. They argued it would be more realistic to assess the involuntary or spontaneous retrieval of suppressed memories. They found suppression was even more effective at reducing involuntary retrieval of such memories than at reducing voluntary retrieval.

How do suppress instructions cause forgetting? Anderson (e.g., Anderson & Huddleston, 2012) argues inhibitory control is important – the learned response to the cue word is inhibited. More specifically, he assumes inhibitory control involves the dorsolateral prefrontal cortex and other frontal areas. Prefrontal activation leads to reduced activation in the hippocampus (of central importance in learning and memory).

There is much support for the above hypothesis. First, there is typically greater dorsolateral prefrontal activation during suppression attempts than retrieval but reduced hippocampal activation (Anderson et al., 2016b). Second, studies focusing on connectivity between the dorsolateral prefrontal cortex and hippocampus indicated the former influences the latter (Anderson et al., 2016b). Third, individuals whose left and right hemisphere frontal areas involved in inhibitory control are most closely coordinated exhibit superior memory suppression (Smith et al., 2018).


Evaluation

Most individuals can actively suppress unwanted memories, making them less likely to be recalled deliberately or involuntarily. Progress has been made in identifying the underlying mechanisms. Of most importance, inhibitory control mechanisms associated with the prefrontal cortex (especially the dorsolateral prefrontal cortex) often reduce hippocampal activation (Anderson et al., 2016b).

What are the limitations of theory and research in this area? First, more research is required to clarify the reasons why suppression attempts are often unsuccessful. Second, the reduced recall typically obtained in the suppress condition is not always due exclusively to inhibitory processes. Some individuals use thought substitution, a strategy which reduces recall by producing interference or competition with the correct words (Bergström et al., 2009). However, del Prete et al. (2015) argued (with supporting evidence) that inhibitory processes play a part in explaining the successful use of thought substitution.

KEY TERM
Encoding specificity principle
The notion that retrieval depends on the overlap between the information available at retrieval and the information in the memory trace.

Cue-dependent forgetting

We often attribute forgetting to the weakness of relevant memory traces. However, forgetting often occurs because we lack the appropriate retrieval cues (cue-dependent forgetting). For example, suppose you have forgotten the name of an acquaintance. If presented with four names, however, you might well recognise the correct one.

Tulving (1979) argued that forgetting typically occurs when there is a poor match or fit between memory-trace information and information available at retrieval. This notion was expressed in his encoding specificity principle: “The probability of successful retrieval of the target item is a monotonically increasing function of informational overlap between the information present at retrieval and the information stored in memory” (p. 478). (If you are bewildered, note that a “monotonically increasing function” is one that generally rises and does not decrease at any point.)

The encoding specificity principle resembles the notion of transfer-appropriate processing (Morris et al., 1977; discussed earlier, see p. 263). The main difference is that the latter focuses more directly on the processes involved in memory. Tulving (1979) assumed that when we store information about an event, we also store information about its context. According to the encoding specificity principle, memory is better when the retrieval context is the same as that at learning. Note that context can be external (the environment in which learning and retrieval occur) or internal (e.g., mood state).

Eysenck (1979) argued that long-term memory does not depend only on the match between information available at retrieval and stored information. The extent to which the retrieval information allows us to discriminate between the correct memory trace and incorrect ones also matters (discussed further below, see p. 298).

Endel Tulving. Courtesy of Anders Gade.
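Tulving's principle can be written compactly. The following formalisation is our own gloss, not Tulving's notation: if C denotes the information present in the retrieval cue and T the information stored in the memory trace, then

P(\text{retrieval}) = f\big(\text{overlap}(C, T)\big), \qquad x_1 < x_2 \;\Rightarrow\; f(x_1) \le f(x_2).

That is, the probability of successful retrieval never decreases as the informational overlap between cue and trace increases. Note the principle deliberately says nothing about the shape of f beyond its monotonicity.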



Findings

Recognition memory is typically much better than recall (e.g., we can recognise names we cannot recall). However, it follows from the encoding specificity principle that recall will be better than recognition memory when information in the recall cue overlaps more than that in the recognition cue with memory-trace information. This surprising finding has been reported many times. For example, Muter (1978) found people were better at recalling famous names (e.g., author of the Sherlock Holmes stories: Sir Arthur Conan ___) than selecting the same names on a recognition test (e.g., DOYLE).

Much research indicates the importance of context in determining forgetting. On the assumption that information about mood state (internal context) is often stored in the memory trace, there should be less forgetting when the mood state at learning and retrieval is the same rather than different. This phenomenon (mood-state-dependent memory) has often been reported (see Chapter 15).

Godden and Baddeley (1975) manipulated external context. Divers learned words on a beach or 10 feet underwater and then recalled the words in the same or the other environment. Recall was much better in the same environment. However, Godden and Baddeley (1980) found no effect of context in a very similar experiment testing recognition memory rather than recall. This probably happened because the presence of the learned items on the recognition test provided powerful cues outweighing any impact of context.

Bramão and Johansson (2017) found that having the same picture context at learning and retrieval enhanced memory for word pairs provided that each word pair was associated with a different picture context. However, having the same picture context at learning and retrieval impaired memory when each word pair was associated with the same picture context. In this condition, the picture context did not provide useful information specific to each of the word pairs being tested.

The encoding specificity principle can be expressed in terms of brain activity: “Memory success varies as a function of neural encoding patterns being reinstated at retrieval” (Staudigl et al., 2015, p. 5373). Several studies have supported the notion that neural reinstatement is important for memory success. For example, Wing et al. (2015) presented scenes paired with matching verbal labels at encoding and asked participants to recall the scenes in detail when presented with the labels at retrieval. Recall performance was better when brain activity at encoding and retrieval was similar in the occipito-temporal cortex, which is involved in visual processing.

Limitations on the predictive power of neural reinstatement were shown by Mallow et al. (2015) in a study on trained memory experts learning the locations of 40 digits presented in a matrix. They turned the numbers into concrete objects, which were then mentally inserted into a memorised route. On average, they recalled 86% of the digits in the correct order. However, none of the main brain areas active during encoding was activated during recall: thus, there was remarkably little neural reinstatement. This happened because the processes occurring during encoding were very different from (and much more complex than) those occurring at retrieval.
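In studies of this kind, "neural reinstatement" is usually quantified as encoding–retrieval pattern similarity: the activity pattern recorded while an item is studied is correlated with the pattern recorded while it is retrieved. A minimal sketch of that computation (with simulated data; real analyses involve many preprocessing steps described in the studies themselves):

import numpy as np

# Illustrative encoding-retrieval similarity (ERS) computation.
# Each row is one item's activity pattern across voxels/sensors
# (simulated here; real studies use preprocessed fMRI or EEG patterns).
rng = np.random.default_rng(0)
n_items, n_voxels = 20, 100

encoding = rng.normal(size=(n_items, n_voxels))
# Retrieval patterns = noisy reinstatement of the encoding patterns.
retrieval = 0.6 * encoding + 0.4 * rng.normal(size=(n_items, n_voxels))

# ERS for each item: Pearson correlation of its two patterns.
ers = np.array([np.corrcoef(encoding[i], retrieval[i])[0, 1]
                for i in range(n_items)])

# The reinstatement claim: ERS should be higher for remembered than
# for forgotten items.
print("mean encoding-retrieval similarity:", round(float(ers.mean()), 2))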


Suppose you learn paired associates including park–grove and are later given the cue word park and asked to supply the target or response word (i.e., grove). The response words to the other paired associates are either associated with park (e.g., tree; bench; playground) or not associated. In the latter case, the cue is uniquely associated with the target word and so your task should be easier. There is high overload when a cue is associated with several response words and low overload when it is only associated with one response word. The target word is more distinctive when there is low overload (distinctiveness was discussed earlier in the chapter).

Goh and Lu (2012) tested the above predictions. Encoding-retrieval overlap was manipulated by using three item types. There was maximal overlap when the same cue was presented at retrieval and learning (e.g., park–grove followed by park–???); this was an intra-list cue. There was moderate overlap when the cue was a strong associate of the target word (e.g., airplane–bird followed by feather–???). Finally, there was little overlap when the cue was a weak associate of the target word (e.g., roof–tin followed by armour–???).

As predicted from the encoding specificity principle, encoding-retrieval overlap was important (see Figure 6.27). However, cue overload was also important – memory performance was much better when each cue was uniquely associated with a single response word. According to the encoding specificity principle, memory performance should be best when encoding-retrieval overlap is highest (i.e., with intra-list cues). However, that was not the case with high overload.
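One simple way to see why overload matters is the ratio rule used in global matching models of memory (this is our illustration, not Goh and Lu's own model): the probability of recalling a target given a cue depends not just on the cue–target association but on that association relative to everything else the cue is attached to,

P(\text{target} \mid \text{cue}) \approx \frac{s(\text{cue}, \text{target})}{\sum_{j} s(\text{cue}, \text{item}_j)}

where s(\cdot,\cdot) is associative strength. With low overload the denominator contains a single term, so the ratio is close to 1. If the same cue is also associated with, say, three equally strong competitors, an identical cue–target strength yields a ratio of only about 0.25. Increasing encoding-retrieval overlap raises the numerator, but a heavily overloaded cue can still perform poorly.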

Evaluation

Tulving's approach based on the encoding specificity principle has several strengths. The overlap between memory-trace information and that available in retrieval cues often determines retrieval success. The principle has also received some support from neuroimaging studies and research on mood-state-dependent memory (see Chapter 15). The notion that contextual information (external and internal) strongly influences memory performance has proved correct.

What are the limitations with Tulving's approach? First, he exaggerated the importance of encoding-retrieval overlap as the major factor determining remembering and forgetting. Remembering typically involves rejecting incorrect items as well as selecting correct ones. For this purpose, a cue's ability to discriminate among memory traces is important (Bramão & Johansson, 2017; Eysenck, 1979; Goh & Lu, 2012).


Figure 6.27 Proportion of words recalled in high- and low-overload conditions with intra-list cues, strong extra-list cues and weak extra-list cues. From Goh and Lu (2012). © 2011 Psychonomic Society, Inc. Reprinted with the permission of Springer.


KEY TERMS Consolidation A basic process within the brain involved in establishing long-term memories; this process lasts several hours or more and newly formed memories are fragile. Retrograde amnesia Impaired ability of amnesic patients to remember information and events from the time period prior to the onset of amnesia.

Second, neural reinstatement of encoding brain activity at retrieval is sometimes far less important than implied by the encoding specificity principle. This is especially the case when the processes at retrieval are very different from those used at encoding (e.g., Mallow et al., 2015).

Third, Tulving's assumption that retrieval-cue information is compared directly with memory-trace information is oversimplified. For example, you would probably use complex problem-solving strategies to answer the question, “What did you do six days ago?”. Remembering is a more dynamic, reconstructive process than implied by Tulving (Nairne, 2015a).

Fourth, as Nairne (2015a, p. 128) pointed out, “Each of us regularly encounters events that ‘match’ prior episodes in our lives . . . but few of these events yield instances of conscious recollection.” Thus, we experience less conscious recollection than implied by the encoding specificity principle.

Fifth, it is not very clear from the encoding specificity principle why context effects are often greater on recall than recognition memory (e.g., Godden & Baddeley, 1975, 1980).

Sixth, memory allegedly depends on “informational overlap” between memory trace and retrieval environment, but this is rarely assessed. Inferring the amount of informational overlap from memory performance is circular reasoning.

Consolidation and reconsolidation

The theories discussed so far identify factors that cause forgetting, but do not indicate clearly why the rate of forgetting decreases over time. The answer may lie in consolidation. According to this theory, consolidation “refers to the process by which a temporary, labile memory is transformed into a more stable, long-lasting form” (Squire et al., 2015, p. 1).

According to the standard theory, episodic memories are initially dependent on the hippocampus. However, during the process of consolidation, these memories are stored within cortical networks. This theory is oversimplified: the process of consolidation involves bidirectional interactions between the hippocampus and the cortex (Albo & Gräff, 2018).

The key assumption of consolidation theory is that recently formed memories are still being consolidated and so are especially vulnerable to interference and forgetting. Thus, “New memories are clear but fragile and old ones are faded but robust” (Wixted, 2004, p. 265).
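A worked example (with illustrative numbers, not data from the studies discussed) shows what a decelerating forgetting curve looks like. Suppose retention follows a power function

R(t) = 0.9\, t^{-0.3}

where t is days since learning. Then R(1) \approx 0.90 and R(2) \approx 0.73, a loss of about 17 percentage points over the first day; but R(30) \approx 0.32 and R(31) \approx 0.32, a loss of well under 1 percentage point over the same 24 hours. The instantaneous forgetting rate |dR/dt| = 0.27\, t^{-1.3} falls steadily with the age of the memory – exactly the pattern consolidation theory attributes to older memories having become more stable.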

Findings

Much research supports consolidation theory. First, the decreased rate of forgetting typically found over time can be explained by assuming recent memories are more vulnerable than older ones due to an ongoing consolidation process.

Second, there is research on retrograde amnesia, which involves impaired memory for events occurring before amnesia onset. As predicted by consolidation theory, patients with damage to the hippocampus often show greatest forgetting for memories formed shortly before amnesia onset and least for more remote memories (e.g., Manns et al., 2003). However, the findings are somewhat mixed (see Chapter 7).


Squire et al. (1975) assessed recognition memory before and after patients were given electroconvulsive therapy. Electroconvulsive therapy reduced their memory for programmes up to 3 years beforehand from 65% to 42% but had no effect on memories acquired 4 to 17 years earlier.

Third, individuals who drink excessively sometimes experience “blackouts” (an almost total loss of memory for events occurring while drunk). These blackouts probably indicate a failure to consolidate memories formed while intoxicated. As predicted, Moulton et al. (2005) found long-term memory was impaired in individuals who drank alcohol shortly before learning. However, alcohol consumption shortly after learning led to improved memory. Alcohol may inhibit the subsequent formation of new memories that would otherwise interfere with the consolidation of memories formed just before alcohol consumption.

Fourth, consolidation theory predicts newly formed memories are more susceptible to retroactive interference than older ones. There is some support for this prediction when the interfering material is dissimilar to that in the first learning task (Wixted, 2004).

Fifth, consolidation processes during sleep can enhance long-term memory (Paller, 2017). Consider a technique known as targeted memory reactivation: sleeping participants are exposed to auditory or olfactory cues (the latter relate to the sense of smell) present in the context where learning took place. This enhances memory consolidation by reactivating brain networks (including the hippocampus) involved in encoding new information and increases long-term memory (Schouten et al., 2017).

KEY TERM
Reconsolidation
A new process of consolidation occurring when a previously formed memory trace is reactivated; it allows that memory trace to be updated.

Reconsolidation

Consolidation theory assumes memory traces are “fixated” because of a consolidation process. However, accumulating evidence indicates that this is oversimplified. The current view is that consolidation involves progressive transformation of memory traces rather than simple fixation (Elsey et al., 2018). Of most importance, reactivation of previously consolidated memory traces puts them back into a fragile state in which they can be modified (Elsey et al., 2018). Reactivation can lead to reconsolidation (a new consolidation process).

Findings

Reconsolidation is very useful for updating our knowledge when previous learning has become irrelevant. However, it can also impair memory for the information learned originally. This is how it happens. We learn some information at Time 1. At Time 2, we learn additional information. If the memory traces based on the information learned at Time 1 are reactivated at Time 2, they immediately become fragile. As a result, some information learned at Time 2 will mistakenly become incorporated into the memory traces of the Time 1 information and thus cause misremembering.
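The logic of this updating account can be captured in a few lines. The sketch below is purely conceptual – the Trace class and its behaviour are invented for illustration, not a model from the literature:

# Conceptual sketch of reconsolidation-based updating (invented model).

class Trace:
    def __init__(self, content):
        self.content = content   # what the memory says happened
        self.labile = False      # consolidated traces start out stable

    def reactivate(self):
        # Retrieval returns the trace to a fragile, modifiable state.
        self.labile = True

    def encounter(self, new_info):
        # New input only rewrites the trace while it is labile;
        # a stable trace is left intact.
        if self.labile:
            self.content = new_info
        self.labile = False      # reconsolidation re-stabilises the trace

memory = Trace("terrorist used a hypodermic syringe")   # Time 1 learning
memory.reactivate()                                     # recall the details
memory.encounter("terrorist used a stun gun")           # Time 2 misinformation
print(memory.content)   # -> "terrorist used a stun gun": misremembering

stable = Trace("terrorist used a hypodermic syringe")
stable.encounter("terrorist used a stun gun")           # no prior reactivation
print(stable.content)   # the original memory survives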


Here is a concrete example. Chan and LaPaglia (2013) had participants watch a movie about a fictional terrorist attack (original learning). Subsequently, some participants recalled 24 specific details from the movie (e.g., a terrorist using a hypodermic syringe) to produce reconsolidation (reactivation), whereas others performed an irrelevant distractor task (no reactivation). After that, the participants encountered misinformation (e.g., the terrorist used a stun gun) or neutral information (relearning). Finally, there was a recognition-memory test for the information in the movie.

What did Chan and LaPaglia (2013) find? Misinformation during the relearning phase led to substantial forgetting of information from the movie in the reactivation/reconsolidation condition but not the no-reactivation condition. Reactivating memory traces from the movie triggered reconsolidation, making those memory traces vulnerable to disruption from misinformation. In contrast, memory traces not subjected to reconsolidation were not disrupted.

Scully et al. (2017) reported a meta-analytic review based on 34 experiments. As predicted, memory reactivation made memories susceptible to behavioural interference, leading to impaired memory performance for the original learning event. These findings presumably reflect a reconsolidation process. However, the mean effect size was small and some studies (e.g., Hardwicke et al., 2016) failed to obtain significant effects.

Evaluation

Consolidation theory explains why the rate of forgetting decreases over time. It also successfully predicts that retrograde amnesia is greater for recently formed memories and that retroactive interference effects are greatest when the interfering information is presented shortly after learning. Consolidation processes during sleep are important in promoting long-term memory and progress has been made in understanding the underlying processes (e.g., Vahdat et al., 2017).

Reconsolidation theory helps to explain how memories are updated and no other theory can explain the range of phenomena associated with reconsolidation (Elsey et al., 2018). It is a useful corrective to the excessive emphasis of consolidation theory on the permanent storage of memory traces. Reconsolidation may prove very useful in clinical contexts. For example, patients with post-traumatic stress disorder (PTSD) typically experience flashbacks (vivid re-experiencing of trauma-related events). There is preliminary evidence that reconsolidation can be used successfully in the treatment of PTSD (Elsey et al., 2018).

What are the limitations of this theoretical approach?

(1) Forgetting does not depend solely on consolidation but also depends on factors (e.g., encoding-retrieval overlap) not considered within the theory.
(2) Consolidation theory does not explain why proactive and retroactive interference are greatest when two different responses are associated with the same stimulus.
(3) Much remains to be done to bridge the gap between consolidation theory (with its focus on physical processes within the brain) and approaches to forgetting that emphasise cognitive processes.
(4) Consolidation processes are very complex and only partially understood. For example, it has often been assumed that cortical networks become increasingly important during consolidation. However, consolidation is also associated with a reorganisation within the hippocampus (Dandolo & Schwabe, 2018).


(5) How memory retrieval makes consolidated memories vulnerable and susceptible to reconsolidation remains unclear (Bermúdez-Rattoni & McGaugh, 2017).
(6) It has not always been possible to replicate reconsolidation effects. For example, Hardwicke et al. (2016) conducted seven studies but found no evidence of reconsolidation.
(7) Impaired memory performance for reactivated memory traces is typically explained as indicating that reconsolidation has disrupted storage of the original memory traces. However, it may also reflect problems with memory retrieval (Hardwicke et al., 2016).

CHAPTER SUMMARY

• Short-term vs long-term memory. The multi-store model assumes there are separate sensory, short-term and long-term stores. Much evidence (e.g., from amnesic patients) provides general support for the model, but it is greatly oversimplified. According to the unitary-store model, short-term memory is the temporarily activated part of long-term memory. That is partially correct. However, the crucial term “activation” is not precisely defined. In addition, research on amnesic patients and neuroimaging studies suggest the differences between short-term and long-term memory are greater than assumed by the unitary-store model.



• Working memory. Baddeley's original working memory model consisted of three components: an attention-like central executive, a phonological loop holding speech-based information, and a visuo-spatial sketchpad specialised for visual and spatial processing. However, there are doubts as to whether the visuo-spatial sketchpad is as separate from other cognitive processes and systems as assumed theoretically. The importance of the central executive can be seen in brain-damaged patients whose central executive functioning is impaired (dysexecutive syndrome). The notions of a central executive and dysexecutive syndrome are oversimplified because they do not distinguish different executive functions. More recently, Baddeley added an episodic buffer that stores integrated information in multidimensional representations.



• Working memory: executive functions and individual differences. Individuals high in working memory capacity have greater attentional control than low-capacity individuals, and so are more resistant to external and internal distracting information. There is a lack of conceptual clarity concerning the crucial differences between high- and low-capacity individuals, and potential costs associated with high capacity have rarely been investigated. According to the unity/diversity framework, research on executive functions indicates the existence of a common factor resembling concentration and two specific factors (shifting and updating). Support for this framework has been obtained from the psychometric, neuroimaging and genetic approaches. However, research on brain-damaged patients provides only partial support for the theoretical framework.



• Levels of processing. Craik and Lockhart (1972) focused on learning processes in their levels-of-processing theory. They identified depth of processing, elaboration of processing and distinctiveness of processing as key determinants of long-term memory. Insufficient attention was paid to the relationship between learning processes and those at retrieval and to the role of distinctive processing in enhancing long-term memory. The theory is not explanatory, and the reasons why depth of processing influences explicit memory much more than implicit memory remain unclear.



• Learning through retrieval. Long-term memory is typically much better when much of the learning period is devoted to retrieval practice rather than study, and the beneficial effects of retrieval practice extend to relevant but non-tested information. The testing effect is greater when it is hard to retrieve the to-be-remembered information. Difficult retrieval probably enhances the generation and retrieval of effective mediators. There is a reversal of the testing effect when numerous items are not retrieved during testing practice; this reversal is explained by the bifurcation model.



• Implicit learning. Behavioural findings support the distinction between implicit and explicit learning even though most measures of implicit learning are relatively insensitive. The brain areas activated during implicit learning (e.g., striatum) often differ from those activated during explicit learning (e.g., prefrontal cortex). However, complexities arise because there are numerous forms of implicit learning, and learning is often a mixture of implicit and explicit. Amnesic patients provide some support for the notion of implicit learning because they generally have less impairment of implicit than explicit learning. Parkinson's patients with damage to the basal ganglia show the predicted impairment of implicit learning. However, they generally also show impaired explicit learning and so provide only limited information concerning the distinction between implicit and explicit learning.



• Forgetting from long-term memory. Some forgetting from long-term memory is due to a decay process operating mostly during sleep. Strong proactive and retroactive interference effects have been found inside and outside the laboratory. People use active control processes to minimise proactive interference. Recovered memories of childhood abuse are more likely to be genuine when recalled outside (rather than inside) therapy. Memories can be deliberately suppressed, with inhibitory control processes within the prefrontal cortex producing reduced hippocampal activation. Forgetting depends in part on encoding-retrieval overlap (encoding specificity principle). However, retrieval is often a more complex and dynamic process than implied by this principle. Consolidation theory explains the form of the forgetting curve but de-emphasises the role of cognitive processes. Reconsolidation theory explains how memories are updated and provides a useful corrective to consolidation theory's excessive emphasis on permanent storage. However, the complex processes involved in consolidation and reconsolidation are poorly understood.



FURTHER READING

Baddeley, A.D., Eysenck, M.W. & Anderson, M.C. (2020). Memory (3rd edn). Abingdon, Oxon.: Psychology Press. The main topics covered in this chapter are discussed in this textbook (for example, Chapters 8–10 are on theories of forgetting).

Eysenck, M.W. & Groome, D. (eds) (2020). Forgetting: Explaining Memory Failure. London: Sage. This edited book focuses on causes of forgetting in numerous laboratory and real-life situations. Chapter 1 by David Groome and Michael Eysenck provides an overview of factors causing forgetting and a discussion of the potential benefits of forgetting.

Friedman, N.P. & Miyake, A. (2017). Unity and diversity of executive functions: Individual differences as a window on cognitive structure. Cortex, 86, 186–204. Naomi Friedman and Akira Miyake provide an excellent review of our current understanding of the major executive functions.

Karpicke, J.D. (2017). Retrieval-based learning: A decade of progress. In J. Wixted (ed.), Learning and Memory: A Comprehensive Reference (2nd edn; pp. 487–514). Amsterdam: Elsevier. Jeffrey Karpicke provides an up-to-date account of the testing effect and other forms of retrieval-based learning.

Morey, C.C. (2018). The case against specialised visual-spatial short-term memory. Psychological Bulletin, 144, 849–883. Candice Morey discusses a considerable range of evidence apparently inconsistent with Baddeley's working memory model (especially the visuo-spatial sketchpad).

Norris, D. (2017). Short-term memory and long-term memory are still different. Psychological Bulletin, 143, 992–1009. Dennis Norris discusses much evidence supporting a clear-cut separation between short-term and long-term memory.

Oberauer, K., Lewandowsky, S., Awh, E., Brown, G.D.A., Conway, A. & Cowan, N. (2018). Benchmarks for models of short-term and working memory. Psychological Bulletin, 144, 885–958. This article provides an excellent account of the key findings relating to short-term and working memory that would need to be explained by any comprehensive theory.

Shanks, D.R. (2017). Regressive research: The pitfalls of post hoc data selection in the study of unconscious mental processes. Psychonomic Bulletin & Review, 24, 752–775. David Shanks discusses problems involved in attempting to demonstrate the existence of implicit learning.


Chapter 7

Long-term memory systems

INTRODUCTION

We have an amazing variety of information stored in long-term memory (e.g., details of our last summer holiday; Paris is the capital of France; how to ride a bicycle). Much of this information is stored in schemas or organised packets of knowledge used extensively during language comprehension (see Chapter 10). This remarkable variety is inconsistent with Atkinson and Shiffrin's (1968) notion of a single long-term memory store (see Chapter 6).

More recently, there has been an emphasis on memory systems (note the plural!). Each memory system is distinct, having its own specialised brain areas and being involved in certain forms of learning and memory. Schacter and Tulving (1994) identified four memory systems: episodic memory; semantic memory; the perceptual representation system; and procedural memory. Since then, there has been a lively debate concerning the number and nature of long-term memory systems.

Amnesia

Case study: Amnesia and long-term memory

KEY TERMS
Amnesia
A condition caused by brain damage in which there is severe impairment of long-term memory (mostly declarative memory).
Korsakoff's syndrome
Amnesia (impaired long-term memory) caused by chronic alcoholism.

Suggestive evidence for several long-term memory systems comes from brain-damaged patients with amnesia. If you are a movie fan, you may have mistaken ideas about the nature of amnesia (Baxendale, 2004). In the movies, serious head injuries typically cause characters to forget the past while still being fully able to engage in new learning. In the real world, however, new learning is typically greatly impaired as well. Bizarrely, many movies suggest the best cure for amnesia caused by severe head injury is to suffer another blow to the head! Approximately 40% of Americans believe a second blow to the head can restore memory in patients whose amnesia was caused by a previous blow (Spiers, 2016).

Patients become amnesic for various reasons. Closed head injury is the most common cause. However, patients with closed head injury often have several other cognitive impairments, making it hard to interpret their memory deficits. As a result, much research has focused on patients whose amnesia is due to chronic alcohol abuse (Korsakoff's syndrome).


IN THE REAL WORLD: THE FAMOUS CASE OF HM

HM (Henry Gustav Molaison) was the most-studied amnesic patient of all time. When he was 27, his epileptic condition was treated by surgery involving removal of his medial temporal lobes including the hippocampus. This affected his memory more dramatically than his general cognitive functioning (e.g., IQ). Corkin (1984, p. 255) reported many years later that HM “does not know where he lives, who cares for him, or where he ate his last meal . . . in 1982 he did not recognise a picture of himself”.

Henry Molaison, the most famous amnesic patient of all time. Research on him transformed our knowledge of the workings of long-term memory.

Research on HM (starting with Scoville and Milner, 1957) transformed our understanding of long-term memory in several ways (see Eichenbaum, 2015):

(1) Scoville and Milner's article was “the origin of modern neuroscience research on memory” (Eichenbaum, 2015, p. 71).
(2) HM showed reasonable learning and long-term retention on a mirror-tracing task (drawing objects seen only in reflection) (Corkin, 1968). He also showed learning on the pursuit rotor (manual tracking of a moving target), suggesting there is more than one long-term memory system.
(3) HM had essentially intact short-term memory, supporting the important distinction between short-term and long-term memory (see Chapter 6).
(4) HM had generally good memory for events occurring a long time before his operation. This suggests memories are not stored permanently in the hippocampus.

Research on HM led to an exaggerated emphasis on the role of the hippocampus in memory (Aggleton, 2013). His memory problems were greater than those experienced by the great majority of amnesic patients with hippocampal damage. This probably occurred mainly because surgery removed other areas (e.g., the parahippocampal region) and possibly because the anti-epileptic drugs used by HM damaged brain cells relevant to memory (Aggleton, 2013).

The notion that HM's brain damage exclusively affected his long-term memory for memories formed after his operation is oversimplified (Eichenbaum, 2015). Evidence suggests HM had various deficits in his perceptual and cognitive capacities. It also indicates he had impaired memory for public and personal events occurring prior to his operation. Thus, HM's impairments were more widespread than generally assumed.

In sum, we need to beware of “the myth of HM” (Aly & Ranganath, 2018, p. 1), which consists of two mistaken assumptions. First, while the hippocampus and medial temporal lobe are important in episodic memory (memory for personal events), episodic memory depends on a network that includes several other brain regions. For example, Vidal-Piñeiro et al. (2018) found that long-lasting episodic memories were associated with greater activation at encoding in inferior lateral parietal regions as well as the hippocampus. Second, the role of the hippocampus is not limited to memory. It also includes “other functions, such as perception, working memory, and implicit memory [memory not involving conscious recollection]” (Aly & Ranganath, 2018, p. 1). This issue is discussed later (see pp. 332–336).


Korsakoff patients are said to suffer from the “amnesic syndrome”:

KEY TERM
Anterograde amnesia
Reduced capacity for new learning (and subsequent remembering) after the onset of amnesia.

● anterograde amnesia: a marked impairment in the ability to learn and remember information encountered after the onset of amnesia;
● retrograde amnesia: problems in remembering events prior to amnesia onset (see Chapter 6);
● only slightly impaired short-term memory on measures such as digit span (repeating back a random string of digits);
● some remaining learning ability (e.g., motor skills).

The relationship between anterograde and retrograde amnesia is typically strong. Smith et al. (2013) obtained a correlation of +.81 between the two forms of amnesia in patients with damage to the medial temporal lobes. However, new learning is more easily disrupted by limited brain damage within the medial temporal lobes than is memory for previously acquired information. This probably occurs because there has typically been consolidation (see Glossary) of previously acquired information prior to amnesia onset. Further evidence that the brain areas (and processes) underlying the two forms of amnesia differ was provided by Buckley and Mitchell (2016): damage to the retrosplenial cortex (connected to the hippocampus) caused retrograde amnesia but not anterograde amnesia.

There are problems with using Korsakoff patients. First, amnesia typically has a gradual onset caused by an increasing deficiency of the vitamin thiamine. Thus, it is often unclear whether certain past events occurred before or after amnesia onset.

Second, brain damage in Korsakoff patients typically involves the medial temporal lobes (especially the hippocampus; see Figure 7.1). However, there is often damage to the frontal lobes as well, producing various cognitive deficits (e.g., impaired cognitive control). This complicates interpreting findings from these patients.

Third, the precise area of brain damage (and thus the pattern of memory impairment) varies across patients. For example, some Korsakoff patients exhibit confusion, lethargy and inattention.

Figure 7.1 Damage to brain areas within and close to the medial temporal lobes (indicated by asterisks) producing amnesia. Republished with permission of Routledge Publishing Inc.




Fourth, research on Korsakoff patients does not provide direct evidence concerning the impact of brain damage on long-term memory. Brain plasticity and the learning of compensatory strategies mean patients can gradually alleviate some memory problems (Fama et al., 2012).

In sum, the study of amnesic patients has triggered several theoretical developments. For example, the distinction between declarative and non-declarative memory (see below) was originally proposed in part because of findings from amnesic patients.

Declarative vs non-declarative memory

Historically, the most important distinction between different types of long-term memory was between declarative memory and non-declarative memory (Squire & Dede, 2015). Declarative memory involves conscious recollection of events and facts – it often refers to memories that can be “declared” or described but also includes memories that cannot be described verbally. Declarative memory is sometimes referred to as explicit memory and involves knowing that something is the case. The two main forms of declarative memory are episodic and semantic memory. Episodic memory is concerned with personal experiences of events that occurred in a given place at a specific time. Semantic memory consists of general knowledge about the world, concepts, language and so on.

In contrast, non-declarative memory does not involve conscious recollection. We typically obtain evidence of non-declarative memory by observing changes in behaviour. For example, consider someone learning to ride a bicycle. Their cycling ability improves over time even though they cannot consciously recollect what they have learned. Non-declarative memory is sometimes known as implicit memory.

There are various forms of non-declarative or implicit memory. One is memory for skills (e.g., piano playing; bicycle riding). Such memory involves knowing how to perform certain actions and is known as procedural memory. Another form of non-declarative memory is priming (also known as repetition priming): it involves facilitated processing of a stimulus presented recently (Squire & Dede, 2015, p. 7). For example, it is easier to identify a picture as a cat if a similar picture of a cat has been presented previously. The earlier picture is a prime facilitating processing when the second cat picture is presented.

Amnesic patients find it much harder to form and remember declarative than non-declarative memories. For example, HM (discussed above) had extremely poor declarative memory for personal events occurring after his operation and for faces of those who became famous in recent decades. In stark contrast, he had reasonable learning ability and memory for non-declarative tasks (e.g., mirror tracing; the pursuit rotor; perceptual identification aided by priming).

This chapter contains detailed discussion of declarative and non-declarative memory. Figure 7.2 presents the hugely influential traditional theoretical account, which strongly influenced most of the research discussed in this chapter. However, it is oversimplified. At the end of this chapter, we discuss its limitations and possible new theoretical developments in the section entitled “Beyond memory systems and declarative vs non-declarative memory” (pp. 332–340).




KEY TERMS
Declarative memory
A form of long-term memory that involves knowing something is the case; it involves conscious recollection and includes memory for facts (semantic memory) and events (episodic memory); sometimes known as explicit memory.
Non-declarative memory
Forms of long-term memory that influence behaviour but do not involve conscious recollection (e.g., priming; procedural memory); also known as implicit memory.
Procedural memory
Memory concerned with knowing how; it includes the knowledge required to perform skilled actions.
Priming
Facilitating the processing of (and response to) a target stimulus by presenting a stimulus related to it shortly beforehand.
Repetition priming
The finding that processing of a stimulus is facilitated if it has been processed previously.



Figure 7.2 The traditional theoretical account based on dividing long-term memory into two broad classes: declarative and non-declarative. Declarative memory is divided into episodic and semantic memory, whereas non-declarative memory is divided into procedural memory, priming, simple classical conditioning, and habituation and sensitisation. The assumption that there are several forms of long-term memory is accompanied by the further assumption that different brain regions are associated with each one. From Henke (2010). Reprinted with permission from Nature Publishing Group.


DECLARATIVE MEMORY

KEY TERMS
Episodic memory
A form of long-term memory concerned with personal experiences or episodes occurring in a given place at a specific time.
Semantic memory
A form of long-term memory consisting of general knowledge about the world, concepts, language and so on.

Declarative or explicit memory encompasses numerous different kinds of memories. For example, we remember what we had for breakfast this morning and that “le petit déjeuner” is French for “breakfast”. Tulving (1972) argued the crucial distinction within declarative memory was between what he termed “episodic memory” and “semantic memory” (see Eysenck & Groome, 2015b).

What is episodic memory? According to Tulving (2002, p. 5), “It makes possible mental time travel through subjective time from the present to the past, thus allowing one to re-experience . . . one's own previous experiences.” Nairne (2015b) identified the three “Ws” of episodic memory: remembering a specific event (what) at a given time (when) in a particular place (where).

What is semantic memory? It is “an individual's store of knowledge about the world. The content of semantic memory is abstracted from actual experience and is therefore said to be conceptual, that is, generalised and without reference to any specific experience” (Binder & Desai, 2011, p. 527).

What is the relationship between episodic memory and autobiographical memory (discussed in Chapter 8)? Both are concerned with personal past experiences. However, much information in episodic memory is trivial


and is remembered only briefly. In contrast, autobiographical memory typically stores information for long periods of time about personally significant events and experiences.

What is the relationship between episodic and semantic memory? According to Tulving (2002), episodic memory developed out of semantic memory during the course of evolution. It also develops later in childhood than semantic memory.

Episodic vs semantic memory

If episodic and semantic memory form separate memory systems, they should differ in several ways. Consider the ability of amnesic patients to acquire new episodic and semantic memories. Spiers et al. (2001) reviewed 147 cases of amnesia involving damage to the hippocampus or fornix. Episodic memory was impaired in all cases, whereas many patients had relatively small impairments of semantic memory.

The above difference in the impact of hippocampal brain damage suggests episodic and semantic memory are distinctly different. However, the greater vulnerability of episodic memories than semantic ones may occur mainly because episodic memories are formed from a single experience whereas semantic memories often combine several learning opportunities. We would have stronger evidence if we discovered brain-damaged patients with very poor episodic memory but essentially intact semantic memory.

Elward and Vargha-Khadem (2018) reviewed research on patients with developmental amnesia (amnesia due to hippocampal damage at a young age). These patients “typically show relatively preserved semantic memory and factual knowledge about the natural world despite severe impairments in episodic memory” (p. 23). Vargha-Khadem et al. (1997) studied two patients (Beth and Jon) with developmental amnesia. Both had very poor episodic memory for the day's activities and television programmes, but their semantic memory (language development; literacy; factual knowledge) was within the normal range.

However, Jon had various problems with semantic memory (Gardiner et al., 2008). His rate of learning was slower than that of healthy controls when provided with facts concerning geographical, historical and other kinds of knowledge. Similarly slow learning in semantic memory has been found in most patients with developmental amnesia (Elward & Vargha-Khadem, 2018).

The findings from patients with developmental amnesia are surprising given the typical finding that individuals with an intact hippocampus depend on it for semantic memory acquisition (Baddeley et al., 2020). Why, then, is their semantic memory reasonably intact? Two answers have been proposed. First, developmental amnesics typically devote more time than healthy individuals to repeated study of factual information. This may produce durable long-term semantic memories via a process of consolidation (see Glossary and Chapter 6).

Second, episodic memory may depend on the hippocampus whereas semantic memory depends on the underlying entorhinal, perirhinal and parahippocampal cortices. Note that the brain damage suffered by Jon and Beth centred on the hippocampus. Bindschaedler et al. (2011) studied a boy (VI)


with severe hippocampal damage but relatively preserved perirhinal and entorhinal cortex. His performance on semantic memory tasks (e.g., vocabulary) improved at the normal rate even though his performance was very poor on episodic memory tasks. Many amnesics may have severe problems with episodic and semantic memory because the hippocampus and underlying cortices are both damaged. This is very likely given the two areas are adjacent.

Curot et al. (2017) applied electrical brain stimulation to memory-related brain areas to elicit reminiscences. Semantic memories were mostly elicited by stimulation of the rhinal cortex (including the entorhinal and perirhinal cortices). In contrast, episodic memories were only elicited by stimulation of the hippocampal region.

Blumenthal et al. (2017) studied a female amnesic (HC) with severe hippocampal damage but intact perirhinal and entorhinal cortices. She was given the semantic memory task of generating intrinsic features of objects (e.g., shape; colour) and extrinsic features (e.g., how the object is used). HC performed comparably to controls with intrinsic features but significantly worse than controls with extrinsic features. Thus, the hippocampus is important for learning some aspects of semantic memory.

How can we explain Blumenthal et al.'s (2017) findings? The hippocampus is involved in learning associations between objects and contexts in episodic memory (see the final section of the chapter, pp. 332–340). In a similar fashion, generating extrinsic features of objects requires learning associations between objects and their uses.

Retrograde amnesia

We turn now to amnesic patients' problems with remembering information learned prior to the onset of amnesia: retrograde amnesia (see Glossary and Chapter 6). Many amnesic patients have much greater retrograde amnesia for episodic than semantic memories. Consider the amnesic patient KC. According to Tulving (2002, p. 13), "He cannot recollect any personally experienced events . . ., whereas his semantic knowledge [e.g. general world knowledge] acquired before the critical accident is still reasonably intact."

There is much support for the notion that remote semantic memories formed prior to the onset of amnesia are essentially intact (see Chapter 6). For example, amnesic patients often perform comparably to healthy controls on semantic memory tasks (e.g., vocabulary knowledge; object naming). However, Klooster and Duff (2015) argued such findings may reflect the use of insensitive measures. In their study, Klooster and Duff gave amnesic patients the semantic memory task of listing features of common objects. On average, amnesic patients listed only 50% as many features as healthy controls.

Retrograde amnesia for episodic memories in amnesic patients often spans several years and has a temporal gradient, with older memories showing less impairment (Bayley et al., 2006). In contrast, retrograde amnesia for semantic memories is generally small except for knowledge acquired shortly before amnesia onset (Manns et al., 2003).

In sum, retrograde amnesia is typically greater for episodic than semantic memories. However, semantic memories can be subject to retrograde amnesia when assessed using sensitive measures.


Semantic dementia

Patients with semantic dementia have severe loss of concept knowledge from semantic memory. However, their episodic memory and most cognitive functions (e.g., attention; non-verbal problem solving) are reasonably intact initially. Semantic dementia always involves degeneration of the anterior temporal lobes. Areas such as the perirhinal and entorhinal cortex are probably involved in the formation of semantic memories, whereas the anterior temporal lobes are where such memories are stored semi-permanently.

Patients with semantic dementia have great problems accessing information about concepts stored in semantic memory (Lambon Ralph et al., 2017). However, their performance on many episodic memory tasks is good (e.g., they have an intact ability to reproduce complex visual designs: Irish et al., 2016). They also perform comparably to healthy controls in remembering what tasks they performed 24 hours earlier and where those tasks were performed (Adlam et al., 2009). Landin-Romero et al. (2016) reviewed relevant research: the good episodic memory of semantic dementia patients probably occurs because they make effective use of frontal and parietal regions within the brain.

In sum, we have an apparent double dissociation (see Glossary). Amnesic patients have very poor episodic memory but often reasonably intact semantic memory. In contrast, patients with semantic dementia have very poor semantic memory but reasonably intact episodic memory. However, the double dissociation is only approximate and it is hard to interpret the somewhat complex findings.

KEY TERMS

Semantic dementia: A condition involving damage to the anterior temporal lobes producing widespread loss of information about the meanings of words and concepts; however, episodic memory and executive functioning are reasonably intact initially.

Personal semantics: Aspects of one's personal or autobiographical memory that combine elements of episodic memory and semantic memory.

Interdependence of episodic and semantic memory

We have seen that the assumption of separate episodic and semantic memory systems is oversimplified. Here we focus on the interdependence of episodic and semantic memory.

In a study by Renoult et al. (2016), participants answered questions belonging to four categories: (1) unique events (e.g., "Did you drink coffee this morning?"); (2) general factual knowledge (e.g., "Do many people drink coffee?"); (3) autobiographical facts (e.g., "Do you drink coffee every day?"); and (4) repeated personal events (e.g., "Have you drunk coffee while shopping?"). Category 1 involves episodic memory and category 2 involves semantic memory. Categories 3 and 4 involve personal semantic memory (a combination of episodic and semantic memory).

Renoult et al. (2016) recorded event-related potentials (ERPs; see Glossary) during retrieval for all four question categories. There were clear-cut ERP differences between categories 1 and 2. Of most importance, ERP patterns for category 3 and 4 questions were intermediate between those for categories 1 and 2, suggesting they required retrieval from both episodic and semantic memory. Tanguay et al. (2018) reported similar findings. They interpreted the various findings with reference to personal semantics: aspects of autobiographical memory resembling semantic memory in being factual but also resembling episodic memory in being "idiosyncratically personal" (p. 65).


KEY TERM

Semanticisation: The phenomenon of episodic memories changing into semantic memories over time.

Greenberg et al. (2009) showed that episodic and semantic memory can be interdependent. Amnesic patients and healthy controls generated as many members as possible from various categories. Some categories (e.g., kitchen utensils) were selected so that performance would benefit from using episodic memory, whereas other categories (e.g., things typically red) seemed less likely to involve episodic memory. Amnesic patients performed worse than controls, especially with categories potentially benefitting from episodic memory. With those categories, controls were much more likely than patients to use episodic memory as an efficient organisational strategy to generate category members.

Semanticisation of episodic memory

Robin and Moscovitch (2017) argued that memories which are initially episodic are transformed into semantic memories over time. For example, the first time you went to a seaside resort, you formed episodic memories of your experiences there. As an adult, while you still remember visiting that seaside resort as a child, you have probably forgotten the personal and contextual information originally associated with your childhood memories. Thus, what was an episodic memory has become a semantic memory. This change involves semanticisation of episodic memory and suggests episodic and semantic memories are related.

Robin and Moscovitch (2017) argued that the process of semanticisation often involves a memory transformation from an initially detail-rich episodic representation to a gist-like or schematic representation involving semantic memory. They provided a theoretical framework within which to understand these processes (see Figure 7.3).

There is much support for this theoretical approach (discussed later). For example, Gilboa and Marlatte (2017) found in a meta-analytic review that the ventromedial prefrontal cortex is typically involved in schema processing within semantic memory. Sekeres et al. (2016) tested memory for movie clips: there was much more forgetting of peripheral detail over time (episodic memory) than of the gist (semantic memory). St-Laurent et al. (2016) found amnesic patients with hippocampal damage had reduced processing of episodic perceptual details.

Robin and Moscovitch (2017) discussed research focusing on changes in brain activation during recall as time since learning increased. As predicted, there was reduced anterior hippocampal activation but increased activation in the ventromedial prefrontal cortex. These findings reflected increased use of gist or schematic information compensating for reduced availability of details.

Overall evaluation

There is some support for separate episodic and semantic memory systems in the double dissociation involving amnesia and semantic dementia: the former is associated with greater impairment of episodic than semantic memory whereas the latter is associated with the opposite pattern. However, there are complications in interpreting these findings and the double dissociation is only approximate. In addition, episodic and semantic memory are often interdependent at learning and during retrieval, making it hard to disentangle their respective contributions.



Figure 7.3 Episodic memories (involving perceptual representations and specific details) depend on the posterior hippocampus (pHPC); semantic memories (involving schemas) depend on the ventromedial prefrontal cortex (vmPFC); and gist memories (combining episodic and semantic memory) depend on the anterior hippocampus (aHPC). There are interactions between these forms of memory caused by processes such as construction and elaboration. From Robin and Moscovitch (2017). Reprinted with permission of Elsevier.


EPISODIC MEMORY

How can we assess someone's episodic memory following learning (e.g., of a list of to-be-remembered items)? Recognition and recall are the two main types of episodic memory test. Recognition-memory tests generally involve presenting various items with participants deciding whether each one was presented previously (often 50% were presented previously and 50% were not). As we will see, more complex forms of recognition-memory test have also been used.

There are three types of recall test: free recall, serial recall and cued recall. Free recall involves producing previously presented items in any order in the absence of specific cues. Serial recall involves producing previously presented items in the order they were presented. Cued recall involves producing previously presented items in response to relevant cues. For example, "cat–table" might be presented at learning and the cue "cat–???" at test.

KEY TERMS

Free recall: A test of episodic memory in which previously presented to-be-remembered items are recalled in any order.

Serial recall: A test of episodic memory in which previously presented to-be-remembered items must be recalled in the order of presentation.

Cued recall: A test of episodic memory in which previously presented to-be-remembered items are recalled in response to relevant cues.
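To make the difference between these test types concrete, here is a minimal scoring sketch in Python. The word list, cue–target pairs and responses are invented for illustration; they are not materials from any study discussed in this chapter.

# Illustrative scoring of the three recall-test types described above.
# All items and responses are hypothetical examples.

studied = ["cat", "table", "river", "lamp"]      # items presented at learning
pairs = {"cat": "table", "river": "lamp"}        # cue-target pairs for cued recall

def score_free_recall(responses):
    # Free recall: any order counts, so score the overlap with the studied list.
    return sum(item in studied for item in set(responses))

def score_serial_recall(responses):
    # Serial recall: an item is correct only if produced in its studied position.
    return sum(r == s for r, s in zip(responses, studied))

def score_cued_recall(responses):
    # Cued recall: each cue (e.g., "cat-???") must elicit its studied associate.
    return sum(responses.get(cue) == target for cue, target in pairs.items())

print(score_free_recall(["lamp", "cat", "dog"]))             # 2
print(score_serial_recall(["cat", "river", "table"]))        # 1
print(score_cued_recall({"cat": "table", "river": "boat"}))  # 1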


Recognition memory: familiarity and recollection

Recognition memory can involve recollection or familiarity. Recollection involves recognition based on conscious retrieval of contextual information, whereas such conscious retrieval is lacking in familiarity-based recognition (Brandt et al., 2016). Here is a concrete example. Several years ago, the first author walked past a man in Wimbledon, and was immediately confident he recognised him. However, he simply could not think where he had previously seen the man. After some thought (this is the kind of thing academic psychologists think about!), he realised the man was a ticket-office clerk at Wimbledon railway station. Initial recognition based on familiarity was replaced by recognition based on recollection.

The remember/know procedure (Migo et al., 2012) has often been used to assess familiarity and recollection. List learning is followed by a test where participants indicate whether each item is "Old" or "New". Items identified as "Old" are followed by a know or remember judgement. Typical instructions require participants to respond know if they recognise the list words "but these words fail to evoke any specific conscious recollection from the study list" (Rajaram, 1993, p. 102). They should respond remember if "the 'remembered' word brings back to mind a particular association, image, or something more personal from the time of study" (Rajaram, 1993, p. 102).

Dunn (2008) proposed a single-process account: strong memory traces give rise to recollection judgements whereas weak memory traces give rise to familiarity judgements. As we will see, however, most evidence supports a dual- or two-process account, namely that recollection and familiarity involve different processes.

Brain mechanisms

Diana et al. (2007) provided an influential theoretical account of the key brain areas involved in recognition memory in their binding-of-item-and-context model (see Figure 7.4):

(1) The perirhinal cortex receives information about specific items ("what" information needed for familiarity judgements).
(2) The parahippocampal cortex receives information about context ("where" information useful for recollection judgements).
(3) The hippocampus receives what and where information (both of great importance to episodic memory) and binds them to form item–context associations permitting recollection.
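This division of labour can be sketched as a simple data structure. The sketch below is a loose illustration of the model's logic, not Diana et al.'s implementation; the stored traces are invented, echoing the Wimbledon example above.

# Item-only traces support familiarity; bound item-context traces support
# recollection. The stored contents are hypothetical illustrations.

perirhinal_items = {"ticket clerk"}                   # "what" traces
hippocampal_bindings = {("ticket clerk", "station")}  # bound item-context traces

def recognise(item, context=None):
    if context and (item, context) in hippocampal_bindings:
        return "recollection: item recognised together with its context"
    if item in perirhinal_items:
        return "familiarity: item seems known, but no context is retrieved"
    return "new: no memory trace"

print(recognise("ticket clerk"))             # familiarity only
print(recognise("ticket clerk", "station"))  # recollection
print(recognise("stranger"))                 # new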

Findings

Functional neuroimaging studies support the above model. In a meta-analytic review, Diana et al. (2007) found recollection was associated with more activation in the parahippocampal cortex and the hippocampus than in the perirhinal cortex. In contrast, familiarity was associated with more activation in the perirhinal cortex than in the parahippocampal cortex or hippocampus.


Figure 7.4 (a) Locations of the hippocampus (red), the perirhinal cortex (blue) and the parahippocampal cortex (green); (b) the binding-of-item-and-context model. From Diana et al. (2007). Reprinted with permission of Oxford University Press.

Neuroimaging evidence is correlational and so cannot show the hippocampus is more essential to recollection than familiarity. In principle, more direct evidence could be obtained from brain-damaged patients. Bowles et al. (2010) studied amnesic patients with severe hippocampal damage. As predicted, these patients had significantly impaired recollection but not familiarity. However, other research has typically found amnesic patients with medial temporal lobe damage have a minor impairment in familiarity but a much larger one in recollection (Skinner & Fernandes, 2007).

According to the model, patients with damage to the perirhinal cortex should have largely intact recollection but impaired familiarity. Bowles et al. (2011) tested this prediction with a female patient, NB. As predicted, her recollection performance was consistently intact. However, she had impaired familiarity for verbal materials. Brandt et al. (2016) studied a female patient, MR, with selective damage to the entorhinal cortex (adjacent to the perirhinal cortex and previously linked to familiarity). As predicted, MR had impaired familiarity for words but intact recollection.


According to the original model, the parahippocampal cortex is limited to processing spatial context (i.e., "where" information). This is too limited. Diana (2017) used a non-spatial context: words were accompanied by contextual questions (e.g., "Is this word common or uncommon?"). There was greater parahippocampal activation for words associated with correct (rather than incorrect) context memory. Since the context (i.e., contextual questions) was non-spatial, the role of the parahippocampal cortex in episodic memory extends beyond spatial information.

Dual-process models assume the hippocampus is required to process relationships between items and to bind items to contexts but is not required to process items in isolation. There are two potential problems with these assumptions (Bird, 2017). First, the term "item" is often imprecisely defined. Second, these models often de-emphasise the importance of the learning material (e.g., faces; names; pictures).

Smith et al. (2014) compared immediate memory performance in healthy controls and amnesic patients with hippocampal damage. Fifty famous faces were presented followed by a recognition-memory test. The amnesic patients performed comparably to controls for famous faces not identified as famous but were significantly impaired for famous faces identified as famous. A plausible interpretation is that unfamiliar faces (i.e., unknown famous faces) are processed as isolated items and so do not require hippocampal processing. In contrast, known famous faces benefit from additional contextual processing dependent on the hippocampus.

Bird (2017, p. 161) concluded his research review as follows: "There are no clear-cut examples of materials other than [unfamiliar] faces that can be recognised using extrahippocampal [outside the hippocampus] familiarity processes." This is because most "items" are not processed in isolation but require the integrative processing provided by the hippocampus.

Scalici et al. (2017) reviewed research on the involvement of the prefrontal cortex in familiarity and recollection. There was greater familiarity-based than recollection-based activity in the ventromedial and dorsomedial prefrontal cortex and lateral BA10 (at the front of the prefrontal cortex), whereas the opposite was the case in medial BA10 (see Figure 7.5). These findings suggest familiarity and recollection involve different processes.

Evaluation

Recognition memory depends on rather separate processes of familiarity and recollection, as indicated by neuroimaging studies. However, the most convincing findings come from studying brain-damaged patients. A double dissociation has been obtained: some patients have reasonably intact familiarity but impaired recollection whereas a few patients exhibit the opposite pattern.

What are the limitations of theory and research in this area?

(1) The typical emphasis on recollection based on conscious awareness of contextual details is oversimplified because we can also have conscious awareness of having previously seen the target items themselves. Brainerd et al. (2014) found a model assuming two types of recollection predicted behavioural data better than models assuming only one type of recollection.


Figure 7.5 Left lateral (A), medial (B) and anterior (C) views of prefrontal areas having greater activation to familiarity-based than recollection-based processes (in red) and areas showing the opposite pattern (in blue). From Scalici et al. (2017). Reprinted with permission of Elsevier.

Figure 7.6 Sample pictures on the recognition-memory test. The one on the left is high-contrast and easy to process whereas the one on the right is low-contrast and hard to process. From Geurten & Willems (2017). Reprinted with permission of Elsevier.

(2) Diana et al.'s (2007) model does not identify the processes underlying familiarity judgements. However, it is often assumed that items on a recognition-memory test that are easy to process are judged to be familiar. Geurten and Willems (2017) tested this assumption using unfamiliar pictures. On the recognition-memory test, some pictures were presented with reduced contrast to reduce processing fluency (see Figure 7.6). As predicted, recognition-memory performance was better with high-contrast than with low-contrast test pictures (70% vs 59%, respectively).


(3) More brain mechanisms are involved in recognition memory than assumed by Diana et al. (2007).
(4) The notion of an "item" requires more precise definition (Bird, 2017).

Recall memory

Here we briefly consider similarities and differences between recall (especially free recall: see Glossary) and recognition memory. Mickes et al. (2013) reported important similarities using the remember/know procedure with free recall. Participants received a word list and for each word answered one question (e.g., "Is this item animate?"; "Is this item bigger than a shoebox?"). They then recalled the words, made a remember or know judgement for each recalled word and indicated which question had been associated with each word (contextual information). Participants were more accurate at remembering which question was associated with recalled words when the words received remember (rather than know) judgements. This is very similar to recognition memory, where participants access more contextual information for remember words than know ones.

Kragel and Polyn (2016) compared patterns of brain activation during recognition-memory and free-recall tasks. Brain areas activated during familiarity processes in recognition memory were also activated during free recall. There was also weaker evidence that brain areas activated during recollective processes in recognition were activated in free recall.

As we have seen, amnesic patients exhibit very poor recognition memory (especially recognition associated with recollection). Amnesic patients also typically have very poor free recall (e.g., Brooks & Baddeley, 1976). Some aspects of recognition memory depend on structures other than the hippocampus itself (Diana et al., 2007). In contrast, it has typically been assumed the hippocampus is crucial for recall memory. Patai et al. (2015) supported these assumptions in patients with relatively selective hippocampal damage. The extent of hippocampal damage in these patients was negatively correlated with their recall performance but uncorrelated with their recognition-memory performance.

There are several similarities between the processes involved in recall and recognition. However, the to-be-remembered information is physically present on recognition tests but not recall tests. As a result, processing demands should generally be less with recognition. Chan et al. (2017) obtained findings consistent with this analysis in patients with damage to the frontal lobes impairing higher-level cognitive processes. Individual differences in intelligence were strongly related to performance on recall tests but not recognition-memory tests. Thus, recall performance depends much more on higher-level cognitive processes.

Is episodic memory constructive?

We use episodic memory to remember experienced past events. Most people believe the episodic memory system resembles a video recorder providing us with accurate and detailed information about past events (Simons & Chabris, 2011). In fact, "Episodic memory is . . . a fundamentally constructive, rather than reproductive process that is prone to various kinds of errors and illusions" (Schacter & Addis, 2007, p. 773).


For example, the constructive nature of episodic memory leads to distorted remembering of stories (Chapter 10) and to eyewitnesses producing inaccurate memories of crimes (Chapter 8).

Why is episodic memory so error-prone? First, it would require massive processing to produce a semi-permanent record of all our experiences. Second, we typically want to access the gist or essence of our past experiences, omitting trivial details. Third, we often enrich our episodic memories when discussing our experiences with friends even when this produces memory errors (Dudai & Edelson, 2016; see Chapter 8).

What are the functions of episodic memory (other than to remember past events)? First, we use episodic memory to imagine possible future scenarios and to plan the future (Madore et al., 2016). Imagining the future (episodic simulation) is greatly facilitated by episodic memory's flexible and constructive nature. According to Addis (2018), remembered and imagined events are both "simulations of experience from the same pool of experiential details" (p. 69). However, Schacter and Addis (2007) assumed in their constructive episodic simulation hypothesis that episodic simulation is more demanding than episodic memory retrieval because control processes are required to combine details from multiple episodes.

Second, Madore et al. (2019) found episodic memory influences divergent creative thinking (thinking of unusual and creative uses for common objects). Creative thinking was associated with enhanced connectivity between brain areas linked to episodic processing and brain areas associated with cognitive control.

Findings

The tendency to recall the gist of our previous experiences increases throughout childhood (Brainerd et al., 2008). More surprisingly, children's greater focus on remembering gist as they become older often increases memory errors. Brainerd and Mojardin (1998) asked children to listen to sentences such as "The tea is hotter than the cocoa". Subsequently, they decided whether test sentences had been presented previously in precisely that form. Sentences having the same meaning as an original sentence but different wording (e.g., "The cocoa is cooler than the tea") were more likely to be falsely recognised by older children.

We turn now to the hypothesis (Schacter & Addis, 2007; Addis, 2018) that imagining future events involves very similar processes to those involved in remembering past episodic events. On that hypothesis, brain areas important for episodic memory (e.g., the hippocampus) should also be activated when imagining future events. Benoit and Schacter (2015) reported supportive evidence. There were two key findings:

(1) Several brain regions were activated both while imagining future events (episodic simulation) and during episodic-memory recollection (see Figure 7.7A). The overlapping areas included "the hippocampus and parahippocampal cortex within the medial temporal lobes" (Benoit & Schacter, 2015, p. 450).


(2) As predicted, several brain areas were more strongly activated during episodic simulation than episodic memory retrieval (see Figure 7.7B). These included clusters in the dorsolateral prefrontal cortex and posterior inferior parietal lobes and clusters in the right medial temporal lobe (including the hippocampus) (Benoit & Schacter, 2015, p. 453). Some of these areas are involved in cognitive control: the borders of the fronto-parietal control network (see Chapter 6) are indicated by white dashed lines.

Figure 7.7 (A) Areas activated for both episodic simulation and episodic memory; (B) areas activated more for episodic simulation than episodic memory. From Benoit and Schacter (2015). Reprinted with permission of Elsevier.

Imagining future events is generally associated with hippocampal activation. We would have more direct evidence that the hippocampus is necessarily involved if amnesic patients with hippocampal damage had an impaired ability to imagine future events. Hassabis et al. (2007) found amnesics' imaginary experiences consisted of isolated fragments lacking the richness and spatial coherence of healthy controls' experiences. The amnesic patient KC, with extensive brain damage (including to the hippocampus), could not recall a single episodic memory from the past or imagine a possible future event (Schacter & Madore, 2016).

Robin (2018) argued that spatial context is of major importance for both episodic memory and imagining future events. For example, Robin et al. (2016) asked participants to read brief narratives and imagine them in detail. Even when no spatial context was specified in the narrative, participants nevertheless generated an appropriate spatial context while imagining on 78% of trials.

The similarities between recall of past personal events and imagining future personal events have typically been attributed to episodic processes common to both tasks. However, some similarities may also reflect non-episodic processes. For example, amnesics' impaired past recall and future imagining may reflect an impaired ability to construct detailed narratives. Schacter and Madore (2016) provided convincing evidence that episodic processes are involved in recalling past events and imagining future ones. Participants received training in recollecting details of a recent experience. If recall of past events and imagining of future events both rely on episodic memory, this induction should benefit performance by increasing participants' production of episodic details in recall and imagination. That is what was found.




Evaluation

It is assumed episodic memory relies heavily on constructive processes. This assumption is supported by research on eyewitness memory (Chapter 8) and language comprehension (Chapter 10). The additional assumption that constructive processes used in episodic memory retrieval of past events are also involved in imagining future events is an exciting development supported by much relevant evidence. Episodic memory is also involved in divergent creative thinking.

What are the main limitations of research in this area? First, several brain areas associated with recalling past personal events and imagining future events have been identified, but their specific contributions remain somewhat unclear. Second, finding a given area is involved in recalling the past and imagining the future does not necessarily mean it is associated with the same cognitive processes in both cases. Third, there is greater uncertainty about future events than past ones. This may explain why imagined future events are less vivid than recalled past events but more abstract and dependent on semantic memory (MacLeod, 2016).

KEY TERM

Concepts: Mental representations of categories of objects or items.

SEMANTIC MEMORY

Our organised general knowledge about the world is stored in semantic memory. Such knowledge is extremely varied (e.g., information about the French language; the rules of hockey; the names of capital cities). Much of this information consists of concepts: mental representations relating to objects, people, facts and words (Lambon Ralph et al., 2017). These representations are multimodal (i.e., they incorporate information from several sense modalities).

How is conceptual information in semantic memory organised? We start this section by addressing this issue. First, we consider the notion that concepts are organised into hierarchies. Second, we discuss an alternative view, according to which semantic memory is organised on the basis of the semantic distance or semantic relatedness between concepts. After that, we focus on the nature of concepts and on how concepts are used. Finally, we consider larger information structures known as schemas.

Organisation: hierarchies of concepts

Suppose you are shown a photograph of a chair and asked what it is. You might say it is an item of furniture, a chair or an easy chair. This suggests concepts are organised into hierarchies. Rosch et al. (1976) identified three levels within such hierarchies: superordinate categories (e.g., items of furniture) at the top, basic level categories (e.g., chair) in the middle and subordinate categories (e.g., easy chair) at the bottom.

Which level do we use most often? Sometimes we talk about superordinate categories (e.g., "That furniture is expensive") or subordinate categories (e.g., "I love my iPhone"). However, we typically deal with objects at the intermediate or basic level.

Rosch et al. (1976) asked people to list concept attributes at each level in the hierarchy. Very few attributes were listed for superordinate categories because they are relatively abstract.


Many attributes were listed for categories at the other two levels. However, very similar attributes were listed for different categories at the lowest level. Thus, basic level categories generally have the best balance between informativeness and distinctiveness: informativeness is low at the highest level of the hierarchy and distinctiveness is low at the lowest level (a toy calculation of this trade-off is sketched below). In similar fashion, Rigoli et al. (2017) argued (with supporting evidence) that categorising objects at the basic level generally allows us to select the most appropriate action with respect to that object while minimising processing costs.

Bauer and Just (2017) found the processing of basic level concepts involved many more brain regions than the processing of subordinate concepts. More specifically, brain areas associated with sensorimotor and language processing were activated with basic level concepts, whereas processing was focused on perceptual areas with subordinate concepts.

Basic level categories have other special properties. First, they represent the most general level at which individuals use similar motor movements when interacting with category members (e.g., we sit on most chairs in similar ways). Second, basic level categories were used 99% of the time when people named pictures of objects (Rosch et al., 1976).

However, we do not always prefer basic level categories. For example, we expect experts to use subordinate categories. We would be surprised if a botanist simply described all the different kinds of plants in a garden as plants! We also often use subordinate categories with atypical category members. For example, people categorise penguins faster as penguins than as birds (Jolicoeur et al., 1984).
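Here is the toy calculation of the informativeness/distinctiveness trade-off mentioned above. The attribute lists are invented illustrations, not Rosch et al.'s (1976) materials; only the general logic (more attributes = more informative; fewer shared attributes = more distinctive) follows the text.

# Toy attribute lists for one superordinate category, two basic-level siblings
# and two subordinate siblings (all hypothetical values).
attributes = {
    "furniture":  {"man-made"},                                    # superordinate
    "chair":      {"man-made", "legs", "seat", "back"},            # basic
    "table":      {"man-made", "legs", "flat top"},                # basic sibling
    "easy chair": {"man-made", "legs", "seat", "back", "padded"},  # subordinate
    "desk chair": {"man-made", "legs", "seat", "back", "wheels"},  # subordinate sibling
}

def informativeness(category):
    # More listed attributes = more informative.
    return len(attributes[category])

def distinctiveness(category, sibling):
    # Proportion of a category's attributes NOT shared with a sibling category.
    own = attributes[category]
    return len(own - attributes[sibling]) / len(own)

print(informativeness("furniture"))                 # 1   -> low informativeness
print(distinctiveness("chair", "table"))            # 0.5 -> basic level fairly distinctive
print(distinctiveness("easy chair", "desk chair"))  # 0.2 -> subordinates barely differ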

Findings

Tanaka and Taylor (1991) studied category naming in bird-watchers and dog experts who were shown pictures of birds and dogs. Both groups used subordinate names much more often in their expert domain than in their novice domain.

Even though people generally prefer basic level categories, this does not necessarily mean they categorise fastest at that level. Prass et al. (2013) presented photographs of objects very briefly and asked participants to categorise them at the superordinate level (animal or vehicle), the basic level (e.g., cat or dog) or the subordinate level (e.g., Siamese cat vs Persian cat). Performance was most accurate and fastest at the superordinate level (see Figure 7.8). In similar fashion, Besson et al. (2017) found categorisation of faces was fastest at the superordinate level.

Why does categorisation often occur faster at the superordinate level than the basic level? Close and Pothos (2012) argued that categorisation at the basic level is generally more informative and so requires more detailed processing. Rogers and Patterson (2007) supported this viewpoint. They studied patients with semantic dementia, a condition involving impairment of semantic memory (discussed earlier in this chapter, p. 303; see Glossary). Patients with severe semantic dementia performed better at the superordinate level than the basic level because less processing was required.


Figure 7.8 (a) Accuracy and (b) speed of object categorisation at the superordinate, basic and subordinate levels. From Prass et al. (2013). Reprinted with permission.

Organisation: semantic distance

The assumption that concepts in semantic memory are organised hierarchically is too inflexible and exaggerates how neatly information in semantic memory is organised. Collins and Loftus (1975) proposed an approach based on the more flexible assumption that semantic memory is organised in terms of the semantic distance between concepts.

Semantic distance between concepts has been measured in many ways (Kenett et al., 2017). Kenett et al. used data from 60 individuals instructed to produce as many associations as possible in 60 seconds to 800 Hebrew cue words, and assessed semantic distance in terms of path length: "the shortest number of steps connecting any two cue words" (p. 1473). A worked toy example of this measure is sketched below.

Kenett et al. (2017) asked participants to judge whether word pairs were semantically related. These judgements were well predicted by path length: 91% of directly linked (one-step) word pairs were judged to be semantically related, compared to 69% of two-step word pairs and 64% of three-step word pairs.

Of importance, Kenett et al. (2017) found semantic distance predicted performance on various episodic memory tasks (e.g., free recall). In an experiment on cued recall, participants were presented with word pairs. This was followed by presenting the first word of each pair and instructing them to recall the associated word. Performance was much higher on directly linked (one-step) word pairs than on three-step word pairs: 30% vs 11%, respectively.

Semantic distance also predicts aspects of language production. For example, Rose et al. (2019) had participants name target pictures (e.g., eagle) in the presence of distractor pictures that were semantically close (e.g., owl) or semantically distant (e.g., gorilla). There was an interference effect: naming times were longer when distractors were semantically close.

What is the underlying mechanism responsible for the above findings? According to Collins and Loftus's (1975) influential spreading-activation theory, the appropriate node in semantic memory is activated when we see, hear or think about a concept. Activation then spreads rapidly to other concepts, with greater activation for concepts closely related semantically than for those weakly related. Such an account readily explains Rose et al.'s (2019) findings.
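Here is the promised sketch of the path-length measure. The association network below is a made-up English toy example, not Kenett et al.'s (2017) Hebrew data; only the shortest-path logic reflects their measure.

# Semantic distance as path length: the shortest number of steps connecting
# two cue words in a free-association network (breadth-first search).
from collections import deque

# Undirected toy network: an edge means one word was produced as an
# associate of the other (hypothetical links).
network = {
    "cat":    {"dog", "milk"},
    "dog":    {"cat", "leash"},
    "milk":   {"cat", "coffee"},
    "leash":  {"dog"},
    "coffee": {"milk", "cup"},
    "cup":    {"coffee"},
}

def path_length(start, goal):
    """Shortest number of steps connecting two cue words."""
    if start == goal:
        return 0
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        word, dist = frontier.popleft()
        for neighbour in network[word]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return None  # no connecting path

print(path_length("cat", "dog"))     # 1 step: directly linked
print(path_length("cat", "coffee"))  # 2 steps: cat -> milk -> coffee
print(path_length("dog", "cup"))     # 4 steps: dog -> cat -> milk -> coffee -> cup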


Spreading-activation theory is also applicable to semantic priming (see Glossary and Chapter 9). For example, dog is recognised as a word faster when the preceding prime is cat than when it is car (Heyman et al., 2018). This can be explained by assuming that presentation of cat activates the dog concept and so facilitates recognising it as a word. A toy simulation at the end of this section illustrates this account.

In sum, the semantic distance between concepts within semantic memory is important in explaining findings in episodic memory research (e.g., free recall; cued recall) as well as findings relating to language processing. However, this approach is based on the incorrect assumption that each concept has a single fixed representation in semantic memory. Our processing of any given concept is influenced by context (see next section). For example, think about the meaning of piano. You probably did not focus on the fact that pianos are heavy. However, you would do so if you read the sentence "Fred struggled to lift the piano". Thus, the meaning of any concept (and its relation to other concepts) varies as a function of the circumstances in which it is encountered.
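Here is the toy spreading-activation simulation promised above. The associative strengths and the decay parameter are invented illustrations; Collins and Loftus (1975) did not specify these particular values.

# One step of spreading activation: a prime node sends activation to its
# neighbours in proportion to (hypothetical) associative strength, so "dog"
# is pre-activated more by the prime "cat" than by "car".
links = {
    "cat": {"dog": 0.7, "milk": 0.5, "whiskers": 0.6},
    "car": {"road": 0.8, "wheel": 0.7, "dog": 0.05},
}

def spread(prime, decay=0.5):
    """Activation reaching each neighbour of the prime after one step."""
    return {node: strength * decay for node, strength in links[prime].items()}

for prime in ("cat", "car"):
    activation = spread(prime).get("dog", 0.0)
    print(f"activation of 'dog' after prime '{prime}': {activation:.3f}")
# 'dog' receives more activation from 'cat' (0.350) than from 'car' (0.025),
# so it should be recognised faster after 'cat': semantic priming.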

Using concepts: Barsalou's approach

What do the mental representations of concepts look like? The "traditional" view involved the following assumptions about concept representations:

● They are abstract and so detached from input (sensory) and output (motor) processes.
● They are stable: the same concept representation is used on different occasions.
● Different individuals have similar representations of any given concept.

In sum, it was assumed concept representations "have the flavour of detached encyclopaedia descriptions in a database of categorical knowledge about the world" (Barsalou, 2012, p. 247). This approach forms part of the sandwich model (Barsalou, 2016b): cognition (including concept processing) is "sandwiched" between perception and action and can be studied without considering them. How, then, could we use such concept representations to perceive the visual world or decide how to behave in a given situation (Barsalou, 2016a)?

Barsalou (2012) argued all the above theoretical assumptions are incorrect. We process concepts in numerous different settings and that processing is influenced by the current setting or context. More generally, any concept's representation varies flexibly across situations depending on the individual's current goals and the precise situation.

Consider the concept of a bicycle. A traditional abstract representation would resemble the Chambers Dictionary definition, a "vehicle with two wheels one directly in front of the other, driven by pedals". According to Barsalou (2009), the individual's current goals determine which features are activated. For example, the saddle's height is important if you want to ride a bicycle, whereas information about the tyres is activated if you have a puncture.

According to Barsalou's theoretical approach (e.g., 2012, 2016a,b), conceptual processing is anchored in a given context or situation and involves the perceptual and motor or action systems.


His approach is described as grounded cognition: cognition (including concept processing) is largely grounded (or based) on the perceptual and motor systems.

Findings

Evidence that conceptual processing can involve the perceptual system was reported by Wu and Barsalou (2009). Participants wrote down properties for nouns or noun phrases. Those given the word lawn focused on external properties (e.g., plant; blades) whereas those given rolled-up lawn focused more on internal properties (e.g., dirt; soil). Thus, object qualities not visible if you were actually looking at the object itself are harder to think of than visible ones.

We might expect Barsalou's grounded cognition approach to be less applicable to abstract concepts (e.g., truth; freedom) than concrete ones (objects we can see or hear). However, Barsalou et al. (2018) argued that abstract concepts are typically processed within a relatively concrete context. In fact, abstract-concept processing sometimes involves perceptual information, but much less often than concrete-concept processing (Borghi et al., 2018).

Hauk et al. (2004) reported suggestive evidence that the motor system is often involved when we access concept information. When participants read words such as "lick", "pick" and "kick", these verbs activated parts of the motor strip overlapping with areas activated when people make the relevant tongue, finger and foot movements. These findings do not show the motor system is necessary for concept processing: perhaps activation in areas within the motor strip occurs only after concept activation.

Miller et al. (2018) asked participants to make hand or foot responses after reading hand-associated words (e.g., knead; wipe) or foot-associated words (e.g., kick; sprint). Responses were faster when the word was compatible with the limb making the response (e.g., hand response to a hand-associated word) than when word and limb were incompatible. These findings apparently support Barsalou's approach, according to which "The understanding of action verbs requires activation of the motor areas used to carry out the named action" (Miller et al., 2018, p. 335).

Miller et al. (2018) also tested the above prediction using event-related potentials (see Glossary) to assess limb-relevant brain activity. However, presentation of hand- and foot-associated words was not followed rapidly by limb-relevant brain activity. Thus, the reaction time findings discussed above were based on processing verb meanings and did not directly involve motor processing.

How can we explain the differences between the findings of Hauk et al. (2004) and Miller et al. (2018)? Miller et al. used a speeded task that allowed insufficient time for motor imagery (and activation of relevant motor areas) to occur, whereas this was not the case in the study by Hauk et al.

According to Barsalou, patients with severe motor system damage should have difficulty in processing action-related words (e.g., names of tools).


Dreyer et al. (2015) studied HS, a patient with damage to sensorimotor brain systems close to the hand area. He had specific problems in recognising nouns relating to tools rather than those referring to food or animals.

In a review, Vannuscorps et al. (2016) found some studies reported findings consistent with Dreyer et al.'s (2015) research. In other studies, however, patients with damage to sensorimotor systems had no deficit in conceptual processing of actions or manipulable objects. Vannuscorps et al. concluded that many patients with deficits in processing concepts relating to actions and tools have extensive damage to brain areas additional to sensorimotor areas. The findings from such patients have limited relevance to Barsalou's (2016b) theory.

Vannuscorps et al. (2016) studied a patient, JR, with brain damage primarily affecting the action production system. JR's picture-naming ability was assessed repeatedly over a 3-year period. Even though JR's disease was progressive, his naming performance with action-related concepts (e.g., hammer; shovel) remained intact. Thus, processing of action-related concepts does not necessarily require the involvement of the motor system.

Evaluation Barsalou’s general theoretical approach has several strengths. First, our everyday use of concept knowledge often involves the perceptual and motor systems. Second, concept processing is generally flexible: it is influenced by the present context and the individual’s goals. Third, it is easier to see how concept representations facilitate perception and action within Barsalou’s approach than the “traditional” approach. What are the limitations of Barsalou’s approach? First, Barsalou argues it is generally necessary to use perceptual and/or motor processes to understand concept meanings fully. However, motor processes may often not be necessary (Miller et al., 2018; Vannuscorps et al., 2016). Second, Barsalou exaggerates variations in concept processing across time and contexts. The traditional view that concepts possess a stable, abstract core has not been disproved (Borghesani & Piazza, 2017). In fact, concepts have a stable core and concept processing is often context-­ dependent (discussed below). Third, much concept knowledge does not consist simply of perceptual and motor features. Borghesani and Piazza (2017, p. 8) provide the following example: “Tomatoes are native to South and Central America.” Fourth, we recognise the similarities between concepts not sharing perceptual or motor features. For example, we categorise watermelon and blackberry as fruit even though they are very different visually and we eat them using different motor actions.

Using concepts: hub-and-spoke model

We have seen concept processing often involves the perceptual and motor systems. However, it is improbable that nothing else is involved. First, we would not have coherent concepts if concept processing varied considerably across situations. Second, as mentioned above, we can detect similarities in concepts differing greatly in perceptual terms.


Figure 7.9 The hub-and-spoke model. (a) the hub within the anterior temporal lobe (ATL) has bidirectional connections to the spokes (praxis refers to object manipulability; it is action-related); (b) the locations of the hub and spokes are shown, same colour coding as in (a). From Lambon Ralph et al. (2017).

Such considerations led Patterson et al. (2007) to propose their hub-and-spoke model (see Figure 7.9). The "spokes" consist of several modality-specific regions involving sensory and motor processing. Each concept also has a "hub": a modality-independent unified representation efficiently integrating our conceptual knowledge. It is assumed hubs are located within the anterior temporal lobes. As discussed earlier, patients with semantic dementia invariably have damage to the anterior temporal lobes and extensive loss of conceptual knowledge is their main problem.

In the original model, it was assumed the two anterior temporal lobes (left and right hemisphere) formed a unified system. This is approximately correct: there is substantial activation in both anterior temporal lobes whether concepts are presented visually or verbally. However, the left anterior temporal lobe was more involved than the right in processing verbal information whereas the opposite was the case in processing visual information (Rice et al., 2015). Lambon Ralph et al. (2017) discussed research where patients with damage to the left anterior temporal lobe had particular problems with anomia (object naming). In contrast, patients with damage to the right anterior temporal lobe had particular problems in face recognition.
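To make the hub/spoke distinction concrete, here is a schematic sketch. All feature values are invented; it simply illustrates how two concepts (watermelon and blackberry, from the evaluation of Barsalou's approach above) can share no modality-specific spoke features yet remain close in a modality-independent hub representation.

# Spokes hold modality-specific features; the hub holds an integrated,
# modality-independent representation. All values are hypothetical.
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

concepts = {
    # visual spoke [big, round, dark]; praxis spoke [cut, pick];
    # hub [fruit, sweet, edible]
    "watermelon": {"visual": [1, 1, 0], "praxis": [1, 0], "hub": [1, 1, 1]},
    "blackberry": {"visual": [0, 0, 1], "praxis": [0, 1], "hub": [1, 1, 1]},
}

w, b = concepts["watermelon"], concepts["blackberry"]
print(cosine(w["visual"], b["visual"]))  # 0.0 -> no shared visual features
print(cosine(w["praxis"], b["praxis"]))  # 0.0 -> different actions
print(cosine(w["hub"], b["hub"]))        # 1.0 -> same conceptual core ("fruit")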

Findings

We start with research on the "hub". Mayberry et al. (2011) argued semantic dementia involves a progressive loss of "hub" information producing a blurring of the boundary between category members and non-members.


KEY TERM

Category-specific deficits: Disorders caused by brain damage in which semantic memory is disrupted for certain semantic categories.

Accordingly, they predicted semantic dementia patients would have particular problems making accurate category-membership decisions with (1) atypical category members (e.g., emu is an atypical bird); and (2) pseudotypical items: non-category members resembling category members (e.g., butterfly is like a bird). Both predictions were supported with pictures and words, suggesting processing within the anterior temporal lobes is general and "hub-like" rather than modality-specific (e.g., confined to the visual modality).

Findings from patients with semantic dementia suggest the anterior temporal lobes are the main brain areas associated with "hubs". Binder et al. (2009) reviewed 120 neuroimaging studies involving semantic memory in healthy individuals and found the anterior temporal lobes were consistently activated. Pobric et al. (2010a) applied transcranial magnetic stimulation (TMS; see Glossary) to interfere with processing in the left or right anterior temporal lobe while participants processed concepts presented by verbal or pictorial stimuli. TMS disrupted concept processing comparably in both anterior temporal lobes. However, Murphy et al. (2017) discovered important differences between ventral (bottom) and anterior (front) regions of the anterior temporal lobe. Ventral regions responded to meaning and acted as a hub. However, anterior regions were responsive to differences in input modality (visual vs auditory) and thus are not "hub-like".

We turn now to research on the "spokes". Pobric et al. (2010b) applied transcranial magnetic stimulation (TMS) to interfere briefly with processing within the inferior parietal lobule (involved in processing actions we can make towards objects; the praxis spoke in Figure 7.9). TMS slowed naming times for manipulable objects but not non-manipulable ones, indicating this brain area (unlike the anterior temporal lobes) is involved in relatively specific processing.

Findings consistent with those of Pobric et al. (2010b) were reported by Ishibashi et al. (2018). They applied transcranial direct current stimulation (tDCS; see Glossary) to the inferior parietal lobule and the anterior temporal lobe. Since they used anodal tDCS, it was expected this stimulation would enhance performance on tasks requiring rapid access to semantic information concerning tool function (e.g., scissors are used for cutting) or tool manipulation (e.g., pliers are gripped by the handles). As predicted, anodal tDCS applied to the anterior temporal lobe facilitated performance on both tasks because this brain area contains much general object knowledge (see Figure 7.10). The effects of anodal tDCS applied to the inferior parietal lobule were limited to the manipulation task, as predicted, because this area processes action-related information.

Suppose we studied patients whose brain damage primarily affected one or more of the "spokes". According to the model, we should find category-specific deficits (problems with specific categories of objects). There is convincing evidence for the existence of various category-specific deficits and these deficits are mostly associated with the model's spokes (Chen et al., 2017).

However, it is often hard to interpret the findings from patients with category-specific deficits. For example, many patients find it much harder to identify pictures of living than non-living things. Several factors are involved: living things have greater contour overlap than non-living things, they are more complex structurally and they activate less motor information (Marques et al., 2013). It is difficult to disentangle the relative importance of these factors.



Figure 7.10 Performance accuracy on tool function and tool manipulation tasks with anodal transcranial direct current stimulation to the anterior temporal lobe (ATL-A) or to the inferior parietal lobule (IPL-A) and in a control condition (Sham). From Ishibashi et al. (2018).

Finally, we consider a study by Borghesani et al. (2019). Participants read words (e.g., elephant) having conceptual features (e.g., mammal) and perceptual features (e.g., big; trumpeting). There were two main findings. First, conceptual and perceptual features were processed in different brain areas. Second, initial processing of both types of features occurred approximately 200 ms after word onset. These findings support the model's assumption that there is somewhat independent processing of "hub" information (i.e., conceptual features) and "spoke" information (i.e., perceptual features). However, the findings are inconsistent with Barsalou's approach, according to which perceptual processing should precede (and influence) conceptual processing.

Evaluation

The hub-and-spoke model provides a comprehensive approach combining aspects of the traditional view of concept processing and Barsalou's approach. The notion within the model that concepts are represented by abstract core information and modality-specific information has strong support. Brain areas associated with different aspects of concept processing have been identified.

What are the model's limitations? First, it emphasises mostly the storage and processing of single concepts. However, we also need to consider relations between concepts. For example, we can distinguish between taxonomic relations based on similarity (e.g., dog–bear) and thematic relations based on proximity (e.g., dog–leash). The anterior temporal lobes are important for taxonomic semantic processing whereas the temporo-parietal cortex is important for thematic semantic processing (Mirman et al., 2017). The model has problems with the latter finding given its focus on the anterior temporal lobes.

Second, the role of the anterior temporal lobes in semantic memory is more complex than assumed theoretically. For example, Mesulam et al. (2013) found semantic dementia patients with damage primarily to the left anterior temporal lobe had much greater problems with verbal concepts than visually triggered object concepts. Thus, regions of the left anterior temporal lobe form part of a language network rather than a very general modality-independent hub.

Third, we have only a limited understanding of the division of labour between the hub and the spokes during concept processing (Lambon Ralph, 2014). For example, we do not know how the relative importance of hub and spoke processing depends on task demands. It is also unclear how information from hubs and spokes is integrated during concept processing.


KEY TERMS

Schema: An organised packet of information about the world, events or people stored in long-term memory.

Script: A form of schema containing information about a sequence of events (e.g., events during a typical restaurant meal).


Schemas vs concepts

We may have implied that semantic memory consists exclusively of concepts. In fact, there are also larger information structures called schemas: "superordinate knowledge structures that reflect abstracted commonalities across multiple experiences" (Gilboa & Marlatte, 2017, p. 618). Scripts are schemas containing information about sequences of events. For example, your restaurant script probably includes the following: being given a menu, ordering food and drink, eating and drinking and paying the bill (Bower et al., 1979). Scripts (and schemas more generally) are discussed in Chapter 10 (in relation to language comprehension and memory) and Chapter 8 (relating to failures of eyewitness memory).

Here we first consider brain areas associated with schema-related information. We then explore implications of the theoretical assumption that semantic memory contains abstract concepts corresponding to words and broader organisational structures based on schemas. On that assumption, we might expect some brain-damaged patients to have greater problems accessing concept-based information than schema-based information, whereas others would exhibit the opposite pattern. This is a double dissociation (see Glossary).

Brain networks

Schema information and processing involve several brain areas. However, the ventromedial prefrontal cortex (vmPFC) is especially important. It includes several Brodmann Areas including BA10, BA11, BA12, BA14 and BA25 (see Figure 1.5). Gilboa and Marlatte (2017) reviewed 12 fMRI experiments where participants engaged in schema processing. Much of the ventromedial prefrontal cortex was consistently activated, plus other areas including the hippocampus.

Research on brain-damaged patients also indicates the important role of the ventromedial prefrontal cortex in schema processing. Ghosh et al. (2014) gave participants a schema ("going to bed at night") and asked them to decide rapidly whether each of a series of words was closely related to it. Patients with damage to the ventromedial prefrontal cortex performed worse than healthy controls on this task, indicating impaired schema-related processing.

Warren et al. (2014) presented participants with words belonging to a single schema (e.g., winter; blizzard; cold) followed by recall. Healthy individuals often falsely recall a schema-relevant non-presented word (e.g., snow) because their processing and recall involve extensive schema processing. If patients with damage to the ventromedial prefrontal cortex engage in minimal schema processing, they should show reduced false recall. That is what Warren et al. found.

Double dissociation

As discussed earlier, brain-damaged patients with early-stage semantic dementia (see Glossary) have severe problems accessing word and object meanings. Bier et al. (2013) assessed the ability of three semantic dementia patients to use schema-relevant information by asking them what they would do if they had unknowingly invited two guests to lunch. The required script actions included dressing to go outdoors, going to the grocery store, shopping for food, preparing the meal and clearing up afterwards. One patient described all the above script actions accurately despite severe problems with accessing concept information from semantic memory. The other patients had particular problems with planning and preparing the meal. However, they remembered script actions relating to dressing and shopping. Note we might expect semantic dementia patients to experience problems with using script knowledge because they would need access to relevant concept knowledge (e.g., knowledge about food ingredients) when using script knowledge (e.g., preparing a meal).

Other patients have greater problems with accessing script information than concept meanings. Scripts typically have a goal-directed quality (e.g., using a script to achieve the goal of enjoying a restaurant meal). Since the prefrontal cortex is of major importance in goal-directed activity, we might expect patients with prefrontal damage (e.g., ventromedial prefrontal cortex) to have particular problems with script memory. Cosentino et al. (2006) studied patients having semantic dementia or fronto-temporal dementia (involving extensive damage to the prefrontal cortex and the temporal lobes) with scripts containing sequencing or script errors (e.g., dropping fish in a bucket before casting the fishing line). Patients with extensive prefrontal damage failed to detect far more sequencing or script errors than did those with semantic dementia.

Farag et al. (2010) confirmed that patients with fronto-temporal dementia are generally less sensitive than those with semantic dementia to the appropriate order of script events. They identified the areas of brain damage in their participants (see Figure 7.11). Patients (including fronto-temporal ones) insensitive to script sequencing had damage in inferior and dorsolateral prefrontal cortex. In contrast, patients (including those with semantic dementia) sensitive to script sequencing showed little evidence of prefrontal damage.


Figure 7.11 (a) Brain areas damaged in patients with fronto-temporal degeneration or progressive non-fluent aphasia. (b) Brain areas damaged in patients with semantic dementia or mild Alzheimer’s disease. From Farag et al. (2010). By permission of Oxford University Press.
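The inferential logic behind this double dissociation can be made concrete with a small sketch. The performance values below are entirely hypothetical (they are not data from Bier et al., 2013, Cosentino et al., 2006 or Farag et al., 2010); they simply illustrate the crossover pattern that licenses the inference that concept knowledge and script knowledge depend on partly separate systems.

```python
# Hypothetical proportion-correct scores illustrating a double dissociation
# between concept knowledge and script-sequencing knowledge.
performance = {
    "semantic dementia":        {"concepts": 0.45, "scripts": 0.85},
    "fronto-temporal dementia": {"concepts": 0.85, "scripts": 0.45},
    "healthy controls":         {"concepts": 0.95, "scripts": 0.95},
}

def is_double_dissociation(perf, group_a, group_b, task_x, task_y, margin=0.10):
    """True if group A is selectively impaired on task X and group B on task Y."""
    a, b = perf[group_a], perf[group_b]
    a_impaired_on_x = b[task_x] - a[task_x] > margin  # A worse than B on X
    b_impaired_on_y = a[task_y] - b[task_y] > margin  # B worse than A on Y
    return a_impaired_on_x and b_impaired_on_y

print(is_double_dissociation(performance, "semantic dementia",
                             "fronto-temporal dementia", "concepts", "scripts"))
# True: each group is impaired on the task the other group performs well.
```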

Zahn et al. (2017) also studied patients with fronto-temporal dementia with damage to the fronto-polar cortex (BA10, part of the ventromedial prefrontal cortex) and the anterior temporal lobe. They assessed patients' knowledge of social concepts (e.g., adventurous) and script knowledge (e.g., the likely long-term consequences of ignoring their employer's requests). Patients with greater damage to the fronto-polar cortex than the anterior temporal lobe showed relatively poorer script knowledge than knowledge of social concepts. In contrast, patients with the opposite pattern of brain damage had relatively poorer knowledge of social concepts.

In sum, semantic memory for concepts centres on the anterior temporal lobe. Patients with semantic dementia have damage to this area causing severely impaired concept memory. In contrast, semantic memory for scripts or schemas involves the prefrontal cortex (especially the ventromedial prefrontal cortex). However, when we use our script knowledge (e.g., preparing a meal), it is important to access relevant concept knowledge (e.g., knowledge about food ingredients). As a consequence, semantic dementia patients whose primary impairment is to concept knowledge also have great difficulties in accessing and using script knowledge.

NON-DECLARATIVE MEMORY

Non-declarative memory does not involve conscious recollection but instead reveals itself through behaviour. As mentioned earlier, priming (the facilitated processing of repeated stimuli) and procedural memory (mainly skill learning) are two major forms of non-declarative memory. Note that procedural memory is typically involved in implicit learning (discussed in Chapter 6). There are two major differences between priming (also known as repetition priming) and procedural memory:

(1) Priming often occurs rapidly whereas procedural memory or skill learning is typically slow and gradual (Knowlton & Foerde, 2008).
(2) Priming is tied fairly closely to specific stimuli whereas skill learning typically generalises to numerous stimuli. For example, it would be useless if you could hit a good backhand at tennis only when the ball approached you from a given direction at a given speed!

KEY TERMS
Perceptual priming: A form of priming in which repeated presentation of a stimulus facilitates its perceptual processing.
Conceptual priming: A form of priming in which there is facilitated processing of stimulus meaning.

The strongest evidence for distinguishing between declarative and non-declarative memory comes from amnesic patients. Such patients mostly have severely impaired declarative memory but almost intact non-declarative memory (but see next section for a more complex account). Oudman et al. (2015) reviewed research on priming and procedural memory or skill learning in amnesic patients with Korsakoff's syndrome (see Glossary). Their performance was nearly intact on tasks such as the pursuit rotor (a stylus must be kept in contact with a target on a rotating turntable) and the serial reaction time task (see Glossary).

Amnesic patients performed poorly on some non-declarative tasks reviewed by Oudman et al. (2015) for various reasons. First, some tasks require declarative as well as non-declarative memory. Second, some Korsakoff's patients have widespread brain damage (including areas involved in non-declarative memory). Third, the distinction between declarative and non-declarative memory is less clear-cut and important than traditionally assumed (see later discussion).

Repetition priming

We can distinguish between perceptual and conceptual priming. Perceptual priming occurs when repeated presentation of a stimulus leads to facilitated processing of its perceptual features. For example, it is easier to identify a degraded stimulus if it was presented shortly beforehand. Conceptual priming occurs when repeated presentation of a stimulus leads to facilitated processing of its meaning. For example, we can decide faster whether an object is living or non-living if we saw it recently.
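In practice, both forms of priming are quantified as a difference score between repeated (studied) and new (unstudied) items. The sketch below is a minimal illustration with invented numbers, not data from any study discussed here: perceptual priming as an identification-accuracy advantage for studied degraded words, and conceptual priming as a speed advantage for studied items in a living/non-living decision.

```python
def priming_score(studied, unstudied):
    """Priming = mean performance on studied items minus mean on new items."""
    return (sum(studied) / len(studied)) - (sum(unstudied) / len(unstudied))

# Perceptual priming: 1 = degraded word identified, 0 = not identified.
perceptual = priming_score(studied=[1, 1, 0, 1, 1, 1],
                           unstudied=[0, 1, 0, 1, 0, 0])

# Conceptual priming: decision times (ms) are *lower* for studied items,
# so the sign is flipped to express the benefit as a positive number.
conceptual = -priming_score(studied=[540, 560, 530, 555],
                            unstudied=[610, 595, 620, 600])

print(f"perceptual priming: +{perceptual:.2f} proportion identified")
print(f"conceptual priming: {conceptual:.1f} ms faster for studied items")
```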


There are important differences between perceptual priming and conceptual priming. Gong et al. (2016) found patients with frontal lobe damage performed poorly on conceptual priming but had intact perceptual priming. In contrast, patients with occipital lobe damage (an area associated with visual processing) had intact conceptual priming but impaired perceptual priming.

If repetition priming involves non-declarative memory, amnesic patients should show intact repetition priming. This prediction has much support. For example, Cermak et al. (1985) found amnesic patients had comparable perceptual priming to controls. However, patients sometimes exhibit a modest priming impairment. Levy et al. (2004) studied conceptual priming: deciding whether words previously studied (vs not studied) belonged to given categories. Two male amnesic patients (EP and GP) with large lesions in the medial temporal lobes had intact conceptual priming compared to healthy controls, but they performed much worse than controls on recognition memory (involving declarative memory).

Much additional research was carried out on EP, who had extensive damage to the perirhinal cortex (BA35 and BA36) plus other regions within the medial temporal lobe (Insausti et al., 2013). His long-term declarative memory was massively impaired. For example, he had very poor ability to identify names, words and faces that became familiar only after amnesia onset. However, EP's performance was intact on non-declarative tasks (e.g., perceptual priming; visuo-motor skill learning; see Figure 7.12). His performance was at chance level on recognition memory but as good as that of healthy controls on perceptual priming.

Figure 7.12 Percentages of priming effect (left-hand side) and recognition-memory performance of healthy controls (CON) and patients (EP). From Insausti et al. (2013). © National Academy of Sciences. Reproduced with permission.

Schacter and Church (1995) reported further evidence amnesic patients have intact perceptual priming. Participants initially heard words all spoken in the same voice and then identified the same words passed through an auditory filter. There was priming because identification performance was better when the words were spoken in the same voice as initially.

The notion that priming depends on memory systems different from those involved in declarative memory would be strengthened if we found patients having intact declarative memory but impaired priming. This would provide a double dissociation when considered together with amnesics having intact priming but impaired declarative memory. Gabrieli et al. (1995) studied a patient, MS, with damage to the right occipital lobe. MS had intact performance on recognition and cued recall (declarative memory) but impaired performance on perceptual priming. This latter finding is consistent with findings reported by Gong et al. (2016) in patients with occipital lobe damage (discussed earlier).


The above picture is too neat and tidy. Like Schacter and Church (1995), Schacter et al. (1995) studied perceptual priming based on auditory word identification. However, the words were initially presented in six different voices. On the word-identification test, half were presented in the same voice as initially and the other half were spoken by one of the other voices (re-paired condition). Healthy controls (but not amnesic patients) had more priming for words presented in the same voice.

How can we explain these findings? In both conditions, participants were exposed to words and voices previously heard. The only advantage in the same-voice condition was that the pairing of word and voice was the same as before. However, only those participants who had linked or associated words and voices at the original presentation would have benefited from the repeated pairings. Thus, amnesics are poor at binding together different kinds of information even on priming tasks apparently involving non-declarative memory (see later discussion, pp. 333–336).

Related findings were obtained by Race et al. (2019). Amnesic patients had intact repetition priming when the task involved relatively simple associative learning. However, their repetition priming was impaired when the task involved more complex and abstract associative learning. Race et al. concluded: "These results highlight the multiple, distinct cognitive and neural mechanisms that support repetition priming" (p. 102).

KEY TERMS
Repetition suppression: The finding that stimulus repetition often leads to reduced brain activity (typically with enhanced performance via priming).
Repetition enhancement: The finding that stimulus repetition sometimes leads to increased brain activity.

Priming processes

What processes are involved in priming? A popular view is based on perceptual fluency: repeated presentation of a stimulus means it can be processed more efficiently using fewer resources. This view is supported by the frequent finding that brain activity decreases with stimulus repetition: this is repetition suppression. However, this finding on its own does not demonstrate a causal link between repetition suppression and priming. Wig et al. (2005) reported more direct evidence using transcranial magnetic stimulation (TMS) to disrupt processing. TMS abolished repetition suppression and conceptual priming, suggesting that repetition suppression was necessary for conceptual priming.

Stimulus repetition is sometimes associated with repetition enhancement involving increased brain activity with stimulus repetition. de Gardelle et al. (2013) presented repeated faces and found evidence of both repetition suppression and repetition enhancement. What determines whether there is repetition suppression or enhancement? Ferrari et al. (2017b) presented participants with repeated neutral and emotional scenes. Repetition suppression was found when scenes were repeated many times in rapid succession, probably reflecting increased perceptual fluency. In contrast, repetition enhancement was found when repetitions were spaced out in time. This was probably due to spontaneous retrieval of previously presented stimuli.

Kim (2017a) reported a meta-analysis of studies on repetition suppression and enhancement in repetition priming (see Figure 7.13). There were two main findings. First, repetition suppression was associated with reduced activation in the ventromedial prefrontal cortex and related areas, suggesting it reflected reduced encoding of repeated stimuli.


Figure 7.13 Brain regions showing repetition suppression (RS; orange colour) or repetition enhancement (RE; blue colour) in a meta-analysis. From Kim (2017a).

Second, repetition enhancement was associated with increased activation in dorsolateral prefrontal cortex and related areas. According to Kim (2017a, p. 1894), "The mechanism for repetition enhancement is . . . explicit retrieval during an implicit memory task." Thus, explicit or declarative memory is sometimes involved in allegedly non-declarative priming tasks.

In sum, progress has been made in understanding the processes underlying priming. Of importance is suggestive evidence that priming sometimes involves declarative as well as non-declarative memory (Kim, 2017a). The mechanisms involved in repetition suppression and priming are still not fully understood. However, these effects depend on complex interactions among the time interval between successive stimuli, the task and the allocation of attention (Kovacs & Schweinberger, 2016).

Procedural memory or skill learning

Motor skills are important in everyday life – examples include word processing, writing, playing netball and playing a musical instrument. Skill learning or procedural memory includes sequence learning, mirror tracing (tracing a figure seen in a mirror), perceptual skill learning, mirror reading (reading a text seen in a mirror) and artificial grammar learning (Foerde & Poldrack, 2009; see Chapter 6). However, although these tasks are all categorised as skill learning, they differ in terms of the precise cognitive processes involved.

Here we consider whether the above tasks involve non-declarative or procedural memory and thus involve different memory systems from those underlying episodic and semantic memory. We will consider skill learning in amnesic patients. If they have essentially intact skill learning but severely impaired declarative memory, that would provide evidence that different memory systems are involved.

Before considering the relevant evidence, we address an important general issue. It is sometimes incorrectly assumed any given task is always performed using non-declarative or declarative memory. Consider the weather-prediction task where participants use various cues to predict whether the weather will be sunny or rainy. Reber et al. (1996) found amnesics learned this task as rapidly as healthy controls, suggesting it involves procedural (non-declarative) memory. However, Rustemeier et al. (2013) found 61% of participants used a non-declarative strategy throughout learning but 12% used a declarative strategy throughout. In addition, 27% shifted from an early declarative to a later non-declarative strategy.
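The structure of the weather-prediction task, and why it can be tackled with either kind of strategy, can be sketched as follows. The cue-outcome probabilities are illustrative rather than the exact values used by Reber et al. (1996), and the two "strategies" are idealised end-states rather than learning models: one integrates evidence across all presented cues (the gradual, procedural-style solution), the other applies a single explicitly memorised rule (a declarative-style solution).

```python
import random

random.seed(1)

# Illustrative probabilities that each cue, when present, signals "sunny".
CUE_WEIGHTS = {"cue1": 0.8, "cue2": 0.6, "cue3": 0.4, "cue4": 0.2}

def make_trial():
    """Present 1-3 cues; the outcome is drawn from the cues' combined evidence."""
    cues = random.sample(list(CUE_WEIGHTS), k=random.randint(1, 3))
    p_sunny = sum(CUE_WEIGHTS[c] for c in cues) / len(cues)
    outcome = "sunny" if random.random() < p_sunny else "rainy"
    return cues, outcome

def multi_cue_strategy(cues):
    """Integrate across all presented cues (procedural-style end-state)."""
    p = sum(CUE_WEIGHTS[c] for c in cues) / len(cues)
    return "sunny" if p > 0.5 else "rainy"

def single_cue_strategy(cues):
    """Apply one explicit rule ('cue1 means sunny'); ignore the other cues."""
    return "sunny" if "cue1" in cues else "rainy"

# Both strategies beat chance, which is why overall accuracy alone cannot
# reveal which strategy a given participant is using.
for strategy in (multi_cue_strategy, single_cue_strategy):
    correct = sum(strategy(c) == o for c, o in (make_trial() for _ in range(5000)))
    print(strategy.__name__, correct / 5000)
```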

Findings

Amnesics often have essentially intact skill learning on numerous skill-learning tasks. For example, using the pursuit rotor (manual tracking of a moving target), Tranel et al. (1994) found that 28 amnesic patients had intact learning. Even a patient (Boswell) with unusually extensive damage to brain areas strongly associated with declarative memory had intact learning.

Much research has used the serial reaction time task (see Glossary). As discussed in Chapter 6, amnesics' performance on this task is typically reasonably intact. It is somewhat hard to interpret the findings because performance on this task by healthy controls often involves some consciously accessible knowledge (Gaillard et al., 2009).

Spiers et al. (2001) considered the non-declarative memory performance of 147 amnesic patients. All showed intact performance on tasks involving priming and learning skills or habits. However, as mentioned earlier, some studies have shown modest impairment in amnesic patients (Oudman et al., 2015). In addition, amnesics' procedural memory has important limitations: "[Amnesic patients] typically do not remember how or where information was obtained, nor can they flexibly use the acquired information. The knowledge therefore lacks a . . . context" (Clark & Maguire, 2016, p. 68).

Most tasks assessing skill learning in amnesics require learning far removed from everyday life. However, Cavaco et al. (2004) used five skill-learning tasks (e.g., a weaving task) involving real-world skills. Amnesic patients showed comparable learning to healthy controls despite significantly impaired declarative memory for the same tasks. Anderson et al. (2007) studied the motor skill of car driving in two severely amnesic patients. Their steering, speed control, safety errors and driving with distraction were intact.

Finally, we discuss patients with Parkinson's disease (see Glossary). These patients have damage to the striatum (see Glossary), which is of greater importance to non-declarative learning than declarative learning. As predicted, Parkinson's patients typically have severely impaired non-declarative learning and memory (see Chapter 6). For example, Kemeny et al. (2018) found on the serial reaction time task that Parkinson's patients showed practically no evidence of learning (see Figure 7.14). However, Parkinson's patients sometimes have relatively intact episodic memory. For example, Pirogovsky-Turk et al. (2015) found normal performance by Parkinson's patients on measures of free recall, cued recall and recognition memory. These findings strengthen the case for a distinction between declarative and non-declarative memory.
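On the serial reaction time task, sequence learning is commonly indexed by the slowing that appears when the repeating sequence is temporarily replaced by random trials late in practice. The sketch below scores that index with invented reaction times (the values are hypothetical, not those of Kemeny et al., 2018):

```python
# Mean RTs (ms) per block: blocks 1-9 follow a repeating sequence, block 10
# is random, block 11 reinstates the sequence. All values are invented.
control_rts   = [980, 930, 890, 860, 840, 825, 815, 810, 805, 905, 815]
parkinson_rts = [1050, 1040, 1035, 1030, 1032, 1028, 1030, 1025, 1027, 1035, 1030]

def sequence_learning_index(block_rts, random_block=9):
    """RT cost of the random block relative to its sequenced neighbours.
    A large positive value indicates sequence-specific learning."""
    neighbours = (block_rts[random_block - 1] + block_rts[random_block + 1]) / 2
    return block_rts[random_block] - neighbours

print("controls:   ", sequence_learning_index(control_rts), "ms")     # clear cost
print("Parkinson's:", sequence_learning_index(parkinson_rts), "ms")   # little cost
```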


Figure 7.14 Mean reaction times (ms) across blocks on the serial reaction time task for Parkinson's disease patients (PD) and healthy controls (HC). From Kemeny et al. (2018).

Other research complicates the picture. First, Parkinson's patients (especially as the disease progresses) often have damage to brain areas associated with episodic memory. Das et al. (2019) found impairments in recognition memory (a form of episodic memory) among Parkinson's patients were related to damage within the hippocampus (of central importance in episodic memory). Many Parkinson's patients also have problems with attention and executive functions (Roussel et al., 2017). Bezdicek et al. (2019) found impaired episodic memory in Parkinson's patients was related to reduced functioning of brain areas associated with attention and executive functions as well as reduced hippocampal functioning.

Second, there are individual differences in the strategies used on many tasks (e.g., the weather-prediction task discussed earlier). Kemeny et al. (2018) found Parkinson's patients and healthy controls had comparable performance on the weather-prediction task. However, most Parkinson's patients used a much simpler strategy than healthy controls. Thus, the patients' processing was affected by the disease although this was not apparent from their overall performance.

Interacting systems

A central theme of this chapter is that traditional theoretical views are oversimplified (see next section, pp. 332–340). For example, skill learning often involves brain circuitry including the hippocampus (traditionally associated exclusively with episodic memory). Döhring et al. (2017) studied patients with transient global amnesia who had dysfunction of the hippocampus lasting for several hours. This caused profound deficits in declarative memory but also reduced learning on a motor learning task involving finger sequence tapping. Thus, optimal motor learning can require interactions of the procedural and declarative memory systems.

Albouy et al. (2013) discussed research on motor sequence learning (skill learning). The hippocampus (centrally involved in the formation of declarative memories) played a major role in the acquisition and storage of procedural memories and there were numerous interactions between hippocampal-cortical and striato-cortical systems. Doyon et al. (2018) reviewed changes during motor sequence learning. Early learning mainly involved striatal regions in conjunction with prefrontal and premotor cortical regions, with the contribution of the striatum and motor cortical regions increasing progressively during later learning. These findings suggest procedural learning is dominant later in learning but that declarative memory plays a part early in learning. Similar findings are discussed by Beukema and Verstynen (2018) (see p. 276).

How different are priming and skill learning?

Priming and skill learning are both forms of non-declarative memory. However, as Squire and Dede (2015, p. 2) pointed out, "Non-declarative memory is an umbrella term referring to multiple forms of memory." Thus, we might expect to find differences between priming and skill learning. As mentioned earlier, priming generally occurs more rapidly and the learning associated with priming is typically less flexible. If priming and skill learning involve different processes, we would not necessarily expect individuals good at skill learning to also be good at priming. Schwartz and Hashtroudi (1991) found no correlation between performance on a priming task (word identification) and a skill-learning task (inverted text reading).

Findings based on neuroimaging or on brain-damaged patients might clarify the relationship between priming and skill learning. Squire and Dede (2015) argued the striatum is especially important in skill learning whereas the neocortex (including the prefrontal cortex) is of major importance in priming. Some evidence (including research discussed above) supports Squire and Dede's (2015) viewpoint. However, other research is less supportive. Osman et al. (2008) found Parkinson's patients had intact procedural learning when learning about and controlling a complex system (e.g., a water-tank system). This suggests the striatum is not needed for all forms of skill learning. Gong et al. (2016; discussed earlier, p. 326) found patients with frontal damage nevertheless had intact perceptual priming.

The wide range of tasks used to assess priming and skill learning means numerous brain regions are sometimes activated on both kinds of tasks. We start with skill learning. Penhune and Steele (2012; see Chapter 6) proposed a theory assuming skill learning involves several brain areas including the primary motor cortex, cerebellum and striatum. So far as priming is concerned, Segaert et al. (2013) reviewed 29 neuroimaging studies and concluded that "Repetition enhancement effects have been found all over the brain" (p. 60).

Evaluation

Much evidence suggests priming and skill learning are forms of non-declarative memory involving different processes and brain areas from those involved in declarative memory. There is limited evidence of a double dissociation: amnesic patients often exhibit reasonably intact priming and skill learning but severely impaired declarative memory. In contrast, Parkinson's patients (especially in the early stages of the disease) sometimes have intact declarative memory but impaired procedural memory.

What are the main limitations of research in this area?

(1) There is considerable flexibility in the processes used on many memory tasks. As a result, it is often an oversimplification to describe a task as involving only "non-declarative memory".
(2) Numerous tasks have been used to assess priming and skill learning. More attention needs to be paid to differences among tasks in the precise cognitive processes involved.
(3) There should be more emphasis on brain networks rather than specific brain areas. For example, motor sequence learning involves a striato-cortical system rather than simply the striatum. In addition, this system interacts with a hippocampal-cortical system (Albouy et al., 2013).
(4) The findings from Parkinson's patients are mixed and inconsistent. Why is this? As the disease progresses, brain damage in such patients typically moves beyond brain areas involved in non-declarative memory (e.g., the striatum) to areas involved in declarative memory (e.g., the hippocampus and prefrontal areas).

BEYOND MEMORY SYSTEMS AND DECLARATIVE VS NON-DECLARATIVE MEMORY

Until relatively recently, most memory researchers argued the distinction between declarative/explicit and non-declarative/implicit memory was of major theoretical importance. According to this traditional approach, a crucial difference between memory systems is whether they support conscious access to stored information (see Figure 7.2). It was also often assumed that only memory systems involving conscious access depend heavily on the medial temporal lobe (especially the hippocampus). The traditional approach has proved extremely successful – consider all the accurate predictions it made with respect to the research discussed earlier. However, its major assumptions are oversimplified and more complex theories are required.

Explicit vs implicit memory

If the major dividing line in long-term memory is between declarative (explicit) and non-declarative (implicit) memory, it is important to devise tasks involving only one type of memory. This sounds easy: declarative memory is involved when participants are instructed to remember previously presented information but not otherwise. Reality is more complex.

Consider the word-completion task. Participants are presented with a word list. Subsequently, they perform an apparently unrelated task: word fragments (e.g., STR _____ ) are presented and they produce a word starting with those letters. Implicit memory is revealed by the extent to which their word completions match list words. Since the instructions make no reference to recall, this task is apparently an implicit/non-declarative task. However, participants who become aware of the connection between the word list and the word-completion task perform better than those who do not (Mace, 2003).

Hippocampal activation is generally associated with declarative memory whereas activity of the striatum is associated with non-declarative memory. However, Sadeh et al. (2011) obtained more complex findings. Effective learning on an episodic memory task was associated with interactive activity between the hippocampus and striatum. Following a familiar route also often involves complex interactions between the hippocampus and striatum, with declarative memory assisting in the guidance of ongoing actions retrieved from non-declarative memory (Goodroe et al., 2018).

The involvement of declarative/explicit memory and non-declarative/implicit memory on any given task sometimes changes during the course of learning and/or there are individual differences in use of the two forms of memory. Consider the acquisition of sequential motor skills. There is often a shift from an early reliance on explicit processes to a later reliance on implicit processes (Beukema & Verstynen, 2018; see Chapter 6). Lawson et al. (2017) reported individual differences during learning on the serial reaction time task (see Chapter 6). Some learners appeared to rely solely on implicit processes whereas others also used explicit processes.


Research activity: Word-stem completion task
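The scoring logic of the word-completion task described above can be made concrete with a short sketch. All stems, targets and responses below are hypothetical: completion priming is the rate at which stems are completed with their studied target words minus the baseline rate at which unstudied targets are produced by chance.

```python
# Hypothetical materials: every stem has a designated target completion;
# half the targets appeared in the study list, half did not (counterbalanced
# across participants in a real experiment).
studied_targets   = {"STR": "street", "PLA": "planet", "BRI": "bridge"}
unstudied_targets = {"GRA": "gravel", "CLO": "closet", "FLA": "flavour"}

# One participant's completions for each stem (invented responses).
responses = {"STR": "street", "PLA": "planet", "BRI": "brick",
             "GRA": "grape",  "CLO": "clock",  "FLA": "flavour"}

def completion_rate(targets):
    """Proportion of stems completed with their designated target word."""
    return sum(responses[stem] == word
               for stem, word in targets.items()) / len(targets)

# Priming = target completions for studied items beyond the baseline rate
# for items that were never studied.
priming = completion_rate(studied_targets) - completion_rate(unstudied_targets)
print(f"completion priming = {priming:.2f}")   # 0.33 with these responses
```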

Henke's processing-based theoretical account

Several theories differing substantially from the traditional theoretical approach have been proposed. For example, compare Henke's (2010) processing-based model (see Figure 7.15) against the traditional model (see Figure 7.2). Henke's model differs crucially in that "Consciousness of encoding and retrieval does not select for memory systems and hence does not feature in this model" (p. 528). Another striking difference relates to declarative memory. In the traditional model, all declarative memory (episodic plus semantic memory) depends on the medial temporal lobes (especially the hippocampus) and the diencephalon. In Henke's model, in contrast, episodic memory depends on the hippocampus and neocortex, semantic memory can involve brain areas outside the hippocampus, and familiarity in recognition memory depends on the parahippocampal gyrus and neocortex (and also the perirhinal cortex).

Figure 7.15 A processing-based memory model. There are three basic processing modes: (1) rapid encoding of flexible associations; (2) slow encoding of rigid associations; and (3) rapid encoding of single or unitised items formed into a single unit. The brain areas associated with each of these processing modes are indicated towards the bottom of the figure. From Henke (2010). Reproduced with permission from Nature Publishing Group.

Figure 7.15 is oversimplified. Henke (2010) argued semantic knowledge can be learned in two different ways: one way is indicated in the figure but the other way "uses the hippocampus and involves episodic memory formation" (p. 528). The assumption that semantic memory need not depend on the hippocampus helps to explain why amnesic patients' semantic memory is generally less impaired than their episodic memory (Spiers et al., 2001).

There are three basic processing modes in Henke's (2010) model:

(1) Rapid encoding of flexible associations: this involves episodic memory and depends on the hippocampus. It is also assumed semantic memory often involves the hippocampus.
(2) Slow encoding of rigid associations: this involves procedural memory, semantic memory and classical conditioning, and depends on the basal ganglia (e.g., the striatum) and cerebellum.
(3) Rapid encoding of single or unitised items (formed into a single unit): this involves priming and familiarity in recognition memory and depends on the parahippocampal gyrus.

Many predictions are common to Henke's (2010) model and the traditional model. For example, amnesic patients with hippocampal damage should have generally poor episodic memory but intact procedural memory and priming. However, the two models make different predictions:

(1) Henke's (2010) model predicts that amnesic patients with hippocampal damage should have severe impairments of episodic memory (and semantic memory) for flexible relational associations but not for single or unitised items. In contrast, according to the traditional model, amnesic patients should have impaired episodic and semantic memory for single or unitised items as well as for flexible relational associations.
(2) Henke's (2010) model predicts the hippocampus is involved in the encoding of flexible associations with unconscious and conscious learning. In contrast, the traditional model assumes the hippocampus is involved only in conscious learning.
(3) Henke's model predicts the hippocampus is not directly involved in familiarity judgements in recognition memory. In contrast, the traditional model assumes all forms of episodic memory depend on the hippocampus.

Findings

We start with the first prediction above as it applies to episodic memory. Quamme et al. (2007) studied recognition memory for word pairs (e.g., CLOUD–LAWN). In the key condition, each word pair was unitised (e.g., CLOUD–LAWN was interpreted as a lawn used for viewing clouds). Amnesic patients with hippocampal damage had a much smaller recognition-memory deficit when the word pairs were unitised than when they were not. Olson et al. (2015) presented faces with a fixed or variable viewpoint followed by a recognition-memory test. It was assumed flexible associations would be formed only in the variable-viewpoint condition. As predicted, a female amnesic patient (HC) had intact performance only in the fixed-viewpoint condition (see Figure 7.16).

Figure 7.16 Recognition memory (corrected recognition) for faces presented in a fixed or variable viewpoint and tested in a fixed or variable viewpoint; HC is a female amnesic patient. From Olson et al. (2015).

Research by Blumenthal et al. (2017; discussed earlier, p. 302) on semantic memory is also relevant to the first prediction. An amnesic patient with hippocampal damage had impaired semantic memory performance when it depended on having formed relational associations. However, her semantic memory performance was intact when relational associations were not required.

Support for the second prediction was reported by Duss et al. (2014). Unrelated word pairs (e.g., violin–lemon) were presented subliminally to amnesic patients and healthy controls. The amnesic patients had significantly poorer relational or associative encoding and retrieval than the controls. However, their encoding (and retrieval) of information about single words (e.g., angler) was comparable to controls. Only the relational task involved hippocampal activation.

Hannula and Greene (2012) discussed several studies showing associative or relational learning can occur without conscious awareness. Of most relevance here, however, is whether the hippocampus is activated during non-conscious encoding and retrieval. Henke et al. (2003) presented participants with face–occupation pairs below the level of conscious awareness. There was hippocampal activation during non-conscious encoding of the face–occupation pairs. There was also hippocampal activation during non-conscious retrieval of occupations associated with faces.

Finally, we turn to Henke's third prediction, namely, that the hippocampus is not required for familiarity judgements in recognition memory. If so, we might predict amnesic patients should have intact familiarity judgements. As predicted, amnesics have intact recognition memory (including familiarity judgements) for unfamiliar faces (Bird, 2017; discussed earlier, p. 308). However, the findings with unfamiliar faces are unusual: patients generally show only reasonably (but not totally) intact familiarity judgements for other types of material (Bird, 2017; Bowles et al., 2010; Skinner & Fernandes, 2007) (discussed earlier, pp. 307–308). These findings may not be inconsistent with Henke's (2010) model because amnesics' brain damage often extends beyond the hippocampus to areas associated with familiarity (perirhinal cortex). A male amnesic patient (KN) with hippocampal damage but no perirhinal damage had intact familiarity performance (Aggleton et al., 2005).

As shown in Figure 7.15, Henke (2010) assumed that familiarity judgements depend on activation in brain areas also involved in priming. As predicted, Thakral et al. (2016) found similar brain areas were associated with familiarity and priming, suggesting they both involve similar processes.

Evaluation

Henke's (2010) model, with its emphasis on memory processes rather than memory systems, is an advance. We have considered several examples where predictions from her model have proved superior to predictions from the traditional approach.

What are the model's limitations? First, more research and theorising are needed to clarify the role of consciousness in memory. Conscious awareness is associated with integrated processing across several brain areas (Chapter 16) and so is likely to enhance learning and memory. However, how this happens is not specified. Second, the model resembles a framework rather than a model. For example, it is assumed the acquisition of semantic memories is sometimes closely related to episodic memory. However, we cannot make precise predictions unless we know the precise conditions determining when this is the case and how processes associated with semantic and episodic memory interact. Third, the model does not consider the brain networks associated with different types of memory (see below).


Does each memory system depend on a few brain areas?

According to the traditional theoretical approach (see Figure 7.2), each memory system depends on only a few key brain areas (a similar assumption was made by Henke, 2010). Nowadays, however, it is generally assumed each type of memory involves several brain areas forming one or more networks. How can we explain this theoretical shift? Early memory research relied heavily on findings from brain-damaged patients. Such findings (while valuable) are limited: they can indicate a given brain area is of major importance, whereas neuroimaging research allows us to identify all brain areas associated with a given type of memory. Examples of the traditional approach's limitations are discussed below.

First, it was assumed that episodic memory depends primarily on the medial temporal lobe (especially the hippocampus). Neuroimaging research indicates that several other brain areas interconnected with the medial temporal lobe are also involved. In a review, Bastin et al. (2019) concluded there is a general recollection network specific to episodic memory including the inferior parietal cortex, the medial prefrontal cortex and the posterior cingulate cortex. Kim and Voss (2019) assessed brain activity during the formation of episodic memories. They discovered that activation within large brain networks predicted subsequent recognition-memory performance (see Figure 7.17). Why did activation in certain areas predict lower recognition-memory performance? The most important reason is that such activation often reflects various kinds of task-irrelevant processing.

Figure 7.17 Brain areas whose activity during episodic learning predicted increased recognition-memory performance (task-positive; in red) or decreased performance (task-negative; in blue). From Kim & Voss (2019).

Second, in the traditional approach (and Henke's, 2010, model), autobiographical memories were regarded simply as a form of episodic memory. However, the retrieval of autobiographical memories often involves more brain networks than the retrieval of simple episodic memories. As is shown in Figure 8.7, retrieval of autobiographical memories involves the fronto-parietal network, the cingulo-operculum network, the medial prefrontal cortex network and the medial temporal lobe network. Only the last of these networks is emphasised within the traditional approach (and Henke's model).

Third, more brain areas are associated with semantic memory than the medial temporal lobes emphasised in the traditional model. In a meta-analysis, Binder et al. (2009) identified a left-hemisphere network consisting of seven regions including the middle temporal gyrus, dorsomedial prefrontal cortex and ventromedial prefrontal cortex.

Fourth, it was assumed within the traditional approach that priming involves the neocortex. In fact, what is involved is more complex. Kim (2017a; discussed earlier, pp. 327–328) found in a meta-analysis that priming is associated with reduced activation in the fronto-parietal control network and the dorsal attention network but increased activation in the dorsolateral prefrontal cortex and related areas.

Are memory systems independent?

A key feature of the traditional theoretical approach (see Figure 7.2) was the assumption that each memory system operates independently. As a consequence, any given memory task should typically involve only a single memory system. This assumption is an oversimplification. As Ferbinteanu (2019, p. 74) pointed out, "The lab conditions, where experiments are carefully designed to target specific types of memories, most likely do not universally apply in natural settings where different types of memories combine in fluid and complex manners to guide behaviour."

First, consider episodic and semantic memory. Earlier we discussed cases where episodic and semantic memory were both involved. For example, people answering questions about repeated personal events (e.g., "Have you drunk coffee while shopping?") rely on both episodic and semantic memory (Renoult et al., 2016). Second, consider skill learning and memory. Traditionally, it was assumed that skill learning depends primarily on implicit processes. However, as we saw earlier, explicit processes are often involved early in learning (Beukema & Verstynen, 2018; see Chapter 6).

Component-process models

The traditional theoretical model is too neat and tidy: it assumes the nature of any given memory task rigidly determines the processes used. We need a theoretical approach assuming that memory processes are much more flexible than assumed within the traditional model (or Henke's model). Dew and Cabeza (2011) proposed such an approach (see Figure 7.18). Five brain areas were identified varying along three dimensions:

(1) cognitive process: perceptually or conceptually driven;
(2) stimulus representation: item or relational;
(3) level of intention: controlled vs automatic.

This approach is based on two major assumptions, which differ from those of previous approaches. First, there is considerable flexibility in the combination of processes (and associated brain areas) involved in the performance of any memory task. Second, "The brain regions operative during explicit or implicit memory do not divide on consciousness per se" (Dew & Cabeza, 2011, p. 185).

Figure 7.18 A three-dimensional model of memory: (1) conceptually or perceptually driven; (2) relational or item stimulus representation; (3) controlled or automatic/involuntary intention. The brain areas are the visual cortex (Vis Ctx), parahippocampal cortex (PHC), hippocampus (Hipp), rhinal cortex (RhC) and left ventrolateral prefrontal cortex (L VL PFC). From Dew and Cabeza (2011). © 2011 New York Academy of Sciences. Reprinted with permission of Wiley & Sons.

Cabeza et al. (2018) proposed a component-process model resembling that of Dew and Cabeza (2011). This model assumes that processing is very flexible and depends heavily on process-specific alliances (PSAs) or mini-networks. According to Cabeza et al., "A PSA is a small team of brain regions that rapidly assemble to mediate a cognitive process in response to task demands but quickly disassemble when the process is no longer needed . . . PSAs are flexible, temporary, and opportunistic" (p. 996).




Ferbinteanu (2019) proposed a dynamic network model based on very similar assumptions. A major motivation for this theoretical approach was neuroimaging evidence. Here is an example involving the left angular gyrus in the parietal lobe. This region is involved in both the recollection of episodic memories and numerous tasks requiring semantic processing (see Figure 7.19).

Figure 7.19 Process-specific alliances including the left angular gyrus (L-AG) are involved in recollection of episodic memories (left-hand side) and semantic processing (right-hand side). From Cabeza et al. (2018).

Moscovitch et al. (2016) pointed out that the hippocampus's connections to several other brain areas (e.g., those involved in visual perception) suggest it is not only involved in episodic memory. Consider research on boundary extension: "the . . . tendency to reconstruct a scene with a larger background than actually was presented" (Moscovitch et al., 2016, p. 121). Boundary extension is accompanied by hippocampal activation and is greatly reduced in amnesic patients with hippocampal damage.

McCormick et al. (2018) reviewed research on patients with damage to the hippocampus. Such patients mostly showed decreased future thinking and impaired scene construction, navigation and moral decision-making as well as impaired episodic memory. McCormick et al. also reviewed research on patients with damage to the ventromedial prefrontal cortex (centrally involved in schema processing in semantic memory), which is also connected to several other brain areas. Such patients had decreased future thinking and impaired scene construction, navigation and emotion regulation.

Evaluation


The component-process approach has several strengths. First, there is compelling evidence that processes associated with different memory systems combine very flexibly on numerous memory tasks. This flexibility depends on the precise task demands (e.g., processes necessary early in learning may be less so subsequently) and on individual differences in learning/memory skills and previous knowledge. In other words, we use whatever processes (and associated brain areas) are most useful for the current learning or memory task.


KEY TERM
Boundary extension: Misremembering a scene as having a larger surrounding area than was actually the case.

Second, this approach is more consistent with the neuroimaging evidence than previous approaches. It can account for the fact that many more brain areas are typically active during most memory tasks than expected from the traditional approach. Third, the component-process approach has encouraged researchers to abandon the traditional approach of studying memory as an isolated mental function. For example, processes associated with episodic memory are also involved in scene construction, aspects of decision-making, navigation, imagining the future and empathy (McCormick et al., 2018; Moscovitch et al., 2016). More generally, "The border between memory and perception/action has become more blurred" (Ferbinteanu, 2019, p. 74).

What are the limitations of the component-process approach? First, it does not provide a detailed model. This makes it hard to make specific predictions concerning the precise combination of processes individuals will use on any given memory task. Second, our ability to create process-specific alliances rapidly and efficiently undoubtedly depends on our previous experiences and various forms of learning (Ferbinteanu, 2019). However, the nature of such learning remains unclear. Third, as Moscovitch et al. (2016, p. 125) pointed out, "Given that PSAs are rapidly assembled and disassembled, they require a mechanism that can quickly control communication between distant brain regions." Moscovitch et al. argued the prefrontal cortex is centrally involved, but we have very limited evidence concerning its functioning. Fourth, process-specific alliances are typically mini-networks involving two or three brain regions. However, as we have seen, some research has suggested the involvement of larger brain networks consisting of numerous brain regions (e.g., Kim & Voss, 2019). The optimal network size for explaining learning and memory remains unclear.

CHAPTER SUMMARY

• Introduction. The notion there are several memory systems is very influential. Within that approach, the crucial distinction is between declarative memory (involving conscious recollection) and non-declarative memory (not involving conscious recollection). This distinction has received strong support from amnesic patients with severely impaired declarative memory but almost intact non-declarative memory. Declarative memory is divided into semantic and episodic/autobiographical memory, whereas non-declarative memory is divided into priming and skill learning or procedural memory.

• Declarative memory. Evidence from patients supports the distinction between episodic and semantic memory. Amnesic patients with damage to the medial temporal lobes including the hippocampus typically have more extensive impairment of episodic than semantic memory. In contrast, patients with semantic dementia (involving damage to the anterior temporal lobes) have more extensive impairment of semantic than episodic memory. However, a complicating factor is that many memory tasks involve combining episodic and semantic memory processes. Another complicating factor is semanticisation (transformation of episodic memories into semantic ones over time): perceptual details within episodic memory are lost over time and there is increased reliance on gist and schematic information within semantic memory.

• Episodic memory. Episodic memory is often assessed by recognition tests. Recognition memory can involve familiarity or recollection. Evidence supports the binding-of-item-and-context model: familiarity judgements depend on perirhinal cortex whereas recollection judgements depend on binding what and where information in the hippocampus. In similar fashion, free recall can involve familiarity or recollection with the latter being associated with better recall of contextual information. Episodic memory is basically constructive rather than reproductive, and so we remember mostly the gist of our past experiences. Constructive processes associated with episodic memory are used to imagine future events. However, imagining future events relies more heavily on semantic memory than does recalling past events. Episodic memory is also used in divergent creative thinking.

• Semantic memory. Most objects can be described at the superordinate, basic and subordinate levels. Basic level categories are typically used in everyday life. However, categorisation is often faster at the superordinate level than the basic level because less information processing is required. According to Barsalou's situated simulation theory, concept processing involves perceptual and motor information. However, it is unclear whether perceptual and motor information are both necessary and sufficient for concept understanding (e.g., patients with damage to the motor system can understand action-related words). Concepts have an abstract central core of meaning de-emphasised by Barsalou. According to the hub-and-spoke model, concepts consist of hubs (unified abstract representations) and spokes (modality-specific information). The existence of patients with category-specific deficits supports the notion of spokes. Evidence from patients with semantic dementia indicates hubs are stored in the anterior temporal lobes. It is unclear how information from hubs and spokes is combined and integrated.
  Schemas are stored in semantic memory with the ventromedial prefrontal cortex being especially involved in schema processing. Patients with damage to that brain area often have greater impairments in schema knowledge than concept knowledge. In contrast, patients with semantic dementia (damage to the anterior temporal lobes) have greater impairments in concept knowledge than schema knowledge. Thus, there is some evidence for a double dissociation.

• Non-declarative memory. Priming is tied to specific stimuli and occurs rapidly. Priming often depends on enhanced neural efficiency shown by repetition suppression of brain activity. Skill learning occurs slowly and generalises to stimuli not presented during learning. Amnesic patients (with hippocampal damage) typically have fairly intact performance on priming and skill learning but severely impaired declarative memory. In contrast, Parkinson's patients (with striatal damage) exhibit the opposite pattern. Amnesic and Parkinson's patients provide only an approximate double dissociation. Complications arise because some tasks can be performed using either declarative or non-declarative memory, because different memory systems sometimes interact during learning, and because non-declarative learning often involves networks consisting of several brain areas.

• Beyond memory systems and declarative vs non-declarative memory. The traditional emphasis on the distinction between declarative and non-declarative memory is oversimplified. It does not fully explain amnesics' memory deficits and exaggerates the relevance of whether processing is conscious or not. Henke's model (with its emphasis on processes rather than memory systems) provides an account that is superior to the traditional approach. According to the component-process model, memory involves numerous brain areas and processes used in flexible combinations rather than a much smaller number of rigid memory systems. This model has great potential. However, it is hard to make specific predictions about the combinations of processes individuals will use on any given memory task.

FURTHER READING Baddeley, A.D., Eysenck, M.W. & Anderson, M.C. (2020). Memory (3rd edn). Abingdon, Oxon.: Psychology Press. Several chapters are of direct relevance to the topics covered in this chapter. Bastin, C., Besson, G., Simon, J., Delhaye, E., Geurten, M., Willems, S., (2019). An integrative memory model of recollection and familiarity to understand memory deficits. Behavioral and Brain Sciences, 1–66 (epub: 5 February 2019). Christine Bastin and colleagues provide a comprehensive theoretical account of episodic memory. Cabeza, R., Stanley, M.L. & Moscovitch, M. (2018). Process-specific alliances (PSAs) in cognitive neuroscience. Trends in Cognitive Sciences, 22, 996–1010. Roberto Cabeza and colleagues how cognitive processes (including memory) depend on flexible interactions among brain regions. Ferbinteanu, J. (2019). Memory systems 2018 – Towards a new paradigm. Neurobiology of Learning and Memory, 157, 61–78. Janina Ferbinteanu discusses recent theoretical developments in our understanding of memory systems. Kim, H. (2017). Brain regions that show repetition suppression and enhancement: A meta-analysis of 137 neuroimaging experiments. Human Brain Mapping, 38, 1894–1913. Hongkeun Kim discusses the processes underlying repetition priming with reference to a meta-analysis of the relevant brain areas.


Lambon Ralph, M.A., Jefferies, E., Patterson, K. & Rogers, T.T. (2017). The neural and computational bases of semantic cognition. Nature Reviews Neuroscience, 18, 42–55. Our current knowledge and understanding of semantic memory are discussed in the context of the hub-and-spoke model.

Verfaellie, M. & Keane, M.M. (2017). Neuropsychological investigations of human amnesia: Insights into the role of the medial temporal lobes in cognition. Journal of the International Neuropsychological Society, 23, 732–740. Research on amnesia and memory is discussed in detail in this article.

Yee, E., Jones, M.N. & McRae, K. (2018). Semantic memory. In S.L. Thompson-Schill & J.T. Wixted (eds), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 3: Language and Thought (4th edn; pp. 319–356). New York: Wiley. This chapter provides a comprehensive account of theory and research on semantic memory.


Chapter 8

Everyday memory

INTRODUCTION

Most memory research discussed in Chapters 6 and 7 was laboratory-based but nevertheless of reasonably direct relevance to how we use memory in our everyday lives. In this chapter, we focus on topics rarely researched until approximately 50 years ago but arguably even more directly relevant to our everyday lives. Two such topics are autobiographical memory and prospective memory, both of which are strongly influenced by our everyday goals and motives. This is very clear with prospective memory (remembering to carry out intended actions): our intended actions help us achieve our current goals. For example, if you have agreed to meet a friend at 10 am, you need to remember to set off at the appropriate time to achieve that goal.

The other main topic discussed in this chapter is eyewitness testimony. Such research has obvious applied value with respect to the judicial system. However, most research on eyewitness testimony has been conducted in laboratory settings. Thus, it would be wrong to distinguish sharply between laboratory research and everyday memory or applied research.

In spite of what has been said so far, everyday memory sometimes differs from more traditional memory research in various ways. First, social factors are often important in everyday memory (e.g., a group of friends discussing an event or holiday they have shared). In contrast, participants in traditional memory research typically learn and remember information on their own. Second, participants in traditional memory experiments are generally motivated to be as accurate as possible. In contrast, everyday memory research is typically based on the notion that "Remembering is a form of purposeful action" (Neisser, 1996, p. 204). This approach involves three assumptions about everyday memory:

(1) It is purposeful (i.e., motivated).
(2) It has a personal quality about it, meaning it is influenced by the individual's personality and other characteristics.


(3) It is influenced by situational demands (e.g., the wish to impress one's audience).

The essence of Neisser's (1996) argument is this: what we remember in everyday life is determined by our personal goals, whereas what we remember in traditional memory research is mostly determined by the experimenter's demands for accuracy. Sometimes we strive for maximal memory accuracy in our everyday life (e.g., during an examination), but that is typically not our main goal.

KEY TERM Saying-is-believing effect Tailoring a message about an event to suit a given audience causes subsequent inaccuracies in memory for that event.

Findings

Evidence that the memories we report in everyday life are sometimes deliberately distorted was reported by Brown et al. (2015). They found 58% of students admitted to having "borrowed" other people's personal memories when describing experiences that had allegedly happened to them. This was often done to entertain or impress an audience.

If what you say about an event is deliberately distorted, does this change the memory itself? It often does. Dudukovic et al. (2004) asked people to recall a story accurately (as in traditional memory research) or entertainingly (as in the real world). Unsurprisingly, entertaining retellings were more emotional but contained fewer details. The participants were then instructed to recall the story accurately. Those who had previously recalled it entertainingly recalled fewer details and were less accurate than those who had previously recalled it accurately. This exemplifies the saying-is-believing effect – tailoring what one says about an event to suit a given audience causes inaccuracies in memory for that event.

Further evidence of the saying-is-believing effect was reported by Hellmann et al. (2011). Participants saw a video of a pub brawl involving two men. They then described the brawl to a student, having previously been told this student believed person A was (or was not) the culprit. The participants' retelling of the event reflected the student's biased views. On a subsequent unexpected test of free recall for the crime event, participants' recall was systematically influenced by their earlier retelling. Free recall was most distorted in those participants whose retelling of the event had been most biased.

What should be done?

Research on human memory should ideally possess ecological validity (i.e., applicability to real life; see Glossary). Ecological validity has two aspects: (1) representativeness (the naturalness of the experimental situation and task); and (2) generalisability (the extent to which a study's findings apply to the real world). It is often assumed that everyday memory research has greater ecological validity than traditional laboratory research, but this assumption is mistaken. Generalisability is more important than representativeness (Kvavilashvili & Ellis, 2004). Laboratory research is generally carried out under well-controlled conditions and very often produces findings that apply to the real world. Indeed, the fact that the level of experimental control is generally higher in laboratory research than in more naturalistic research means that the findings obtained often have greater generalisability. Laboratory research also often satisfies the criterion of representativeness because the experimental situation captures key features of the real world.

In sum, the distinction between traditional laboratory research and everyday memory research is blurred. In practice, there is much cross-fertilisation, with the insights from both kinds of memory research enhancing our understanding of human memory.

KEY TERMS Autobiographical memory Long-term memory for the events of one's own life. Mentalising The ability to perceive and interpret behaviour in terms of mental states (e.g., goals; needs).

AUTOBIOGRAPHICAL MEMORY: INTRODUCTION

We have hundreds of thousands of memories relating to an endless variety of things. However, those relating to our own experiences and those of other people important to us have special significance and form our autobiographical memory (memory for the events of one's own life).

What is the relationship between autobiographical memory and episodic memory (concerned with events at a given time in a specific place; see Chapter 7)? One important similarity is that both types of memory relate to personally experienced events. In addition, both are susceptible to proactive and retroactive interference, and in both unusual or distinctive events are especially well remembered.

There are also several differences between them. First, autobiographical memory typically relates to events of personal significance, whereas episodic memory (sometimes called "laboratory memory") often relates to trivial events (e.g., was the word chair presented in the first list?). As a consequence, autobiographical memories are typically thought about more often than episodic ones. They also tend to be more organised than episodic memories because they relate to the self.

Second, neuroimaging evidence suggests autobiographical memory is more complex and involves more brain regions than episodic memory. Andrews-Hanna et al. (2014) carried out a meta-analysis (see Glossary) of studies on autobiographical memory, episodic memory and mentalising (understanding the mental states of oneself and others) (see Figure 8.1). Episodic memory retrieval involved medial temporal regions (including the hippocampus) whereas mentalising involved dorsal medial regions (including the dorsal medial prefrontal cortex). Of most importance, the brain regions associated with autobiographical memory overlapped with those associated with episodic memory and mentalising. Thus, autobiographical memory seems to involve both episodic memory and mentalising.

Third, some people have large discrepancies between their autobiographical and episodic memory (Roediger & McDermott, 2013). For example, Patihis et al. (2013) found individuals with exceptionally good autobiographical memory had only average episodic memory performance when recalling information learned under laboratory conditions (see below).

Fourth, the role of motivation differs between autobiographical and episodic memory (Marsh & Roediger, 2012). We are much more interested in our own personal history than in episodic memories formed in the laboratory.


In addition, as mentioned earlier, we are motivated to recall autobiographical memories reflecting well on ourselves. In contrast, we are motivated to recall laboratory episodic memories accurately.

Fifth, some aspects of autobiographical memory involve semantic memory (general knowledge; see Glossary) rather than episodic memory (Prebble et al., 2013). For example, we know where and when we were born, but this is not based on episodic memory! Further evidence for the involvement of semantic memory in autobiographical memory comes from research on amnesic patients (Juskenaite et al., 2016). They have little or no episodic memory but can nevertheless recall much information about themselves (e.g., aspects of their own personality).

Figure 8.1 Brain regions activated by autobiographical, episodic retrieval and mentalising tasks including regions of episodic (green); mentalising (blue); autobiographical (red-brown); episodic + mentalising (blue/green); episodic + autobiographical (yellow); mentalising + autobiographical (purple); all 3 (white). From Andrews-Hanna et al. Reprinted with permission of Elsevier.

Eustache et al. (2016) distinguished between episodic and semantic autobiographical memory. Both forms of autobiographical memory involve personal memories, but the latter differ from the former because they lack any subjective sense of recollection. Eustache et al. reviewed neuroimaging research supporting the above distinction. Episodic autobiographical memory was associated with activation in the occipital cortex and lateral parietal cortex. In contrast, semantic autobiographical memory was associated with activation in the middle and inferior frontal cortex. Other brain areas (e.g., the lateral temporal cortex; the hippocampus) were activated by both forms of autobiographical memory. More research indicating that autobiographical memories vary in their relationship to episodic and semantic memory is discussed in Chapter 7 (e.g., research of Renoult et al., 2016; see p. 303).

What are the main functions of autobiographical memory? Bluck and Alea (2009) identified three key functions:

(1) social function: bonding with others (e.g., shared memories);
(2) directive function: using the past as a guide to the future;
(3) self-function: creating a sense of self-continuity over time.

Vranić et al. (2018) obtained support for all three functions in a questionnaire-based approach. The social and self-functions were positively correlated with each other and there was some evidence these functions were more important than the directive function. Demiray and Janssen (2013) identified an additional function: self-enhancement. Most people feel closer to their positive memories than their negative ones, and this effect is stronger among individuals having high self-esteem.

Below we discuss major topics within autobiographical memory. First, we consider unusually vivid autobiographical memories for dramatic personal or world events. Second, we focus on those periods in individuals'


lives from which disproportionately many or few autobiographical memories are retrieved. Third, we discuss major theoretical approaches. Note that research on autobiographical memories for traumatic childhood events is discussed in Chapter 7.

IN THE REAL WORLD: HIGHLY SUPERIOR AUTOBIOGRAPHICAL MEMORY (HSAM)

Many people bemoan their deficient autobiographical memories. However, a few individuals have remarkably efficient autobiographical memory. Consider Jill Price (see photo). She has an incredible ability to recall detailed information about almost every day of her life and thus possesses what is known as highly superior autobiographical memory (HSAM).

You may envy Jill Price's phenomenal autobiographical memory. However, she regards it as a disadvantage: "I call it a burden. I run my entire life through my head every day and it drives me crazy!!!" (Parker et al., 2006, p. 35). Strangely, her memory generally is very ordinary (e.g., recalling word lists). You can see Jill Price on YouTube: "The Woman Who Could Not Forget – Jill Price".

Why is her autobiographical memory so outstanding? First, she has obsessional tendencies and focuses excessively on her personal past. As she said, "This is OCD [obsessive-compulsive disorder]. I have OCD of my memories." Second, she has poor inhibitory processes and so finds it very hard to switch off her personal memories. Third, she makes time seem more concrete by representing it in spatial form (e.g., positions on a circle).

Jill Price. Dan Tuffs/Getty Images.

More recent research (e.g., LePort et al., 2012, 2016; Santangelo et al., 2018) indicates the great majority of individuals with HSAM possess similar obsessional characteristics to Jill Price. Indeed, they often have as many obsessional symptoms as patients with obsessive-compulsive disorder.

Figure 8.2 Number of internal details (those specific to an autobiographical event) recalled at various time delays (1 week, 1 month, 1 year and 10 years) by controls and individuals with highly superior autobiographical memory (HSAM). From LePort et al. (2016).


Recent research also indicates the performance of those with HSAM is only average on standard laboratory memory tasks. LePort et al. (2016) found individuals with HSAM had comparable autobiographical memory to controls one week after an event. However, they were dramatically better than controls thereafter (see Figure 8.2). These findings suggest the memory differences between the two groups depended mainly on processes occurring after acquisition (e.g., consolidation; frequent rehearsal) rather than encoding at the time of the event.

Santangelo et al. (2018) found that individuals with HSAM retrieved autobiographical memories (but not other memories) much faster than controls. During retrieval of autobiographical memories, twice as many brain areas were activated in HSAM individuals as in controls, and they had enhanced connectivity between brain areas important in memory retrieval.

Some individuals with HSAM may have brains differing from those of other people (Palombo et al., 2018). LePort et al. (2012) found that HK (a man with HSAM) had a larger right amygdala than most people and enhanced connectivity between the amygdala and hippocampus. This could be important because the amygdala is involved in emotional processing and the hippocampus is crucial to forming long-term memories. However, such brain differences may be a consequence (rather than cause) of remarkable autobiographical memory.

Flashbulb memories

Most people believe they have extremely clear and long-lasting memories for their personal experiences following important and dramatic public events (e.g., the terrorist attacks on the United States on 11 September 2001). Such memories were termed flashbulb memories by Brown and Kulik (1977). They claimed dramatic events perceived as surprising and as having real consequences for the individual (making them of relevance to autobiographical memory) activate a special neural mechanism which "prints" the details of such events permanently in memory.

Brown and Kulik (1977) argued the following information is typically included in flashbulb memories:

● informant (person who supplied the information);
● place where the news was heard;
● ongoing event;
● individual's own emotional state;
● emotional state of others;
● consequences of the event for the individual.

KEY TERMS Highly superior autobiographical memory (HSAM) Exceptional ability to recall autobiographical memories in detail, generally accompanied by only average ability to recall other memories. Flashbulb memories Vivid and detailed personal memories of dramatic events (e.g., 9/11).

Findings

Sharot et al. (2007) compared the memories of individuals close to the World Trade Centre (about 2 miles) on 9/11 with those somewhat further away (about 4½ miles) three years afterwards. The flashbulb memories of those close to the event were more vivid and detailed and involved more activation of the amygdala (strongly involved in emotion). These findings suggest it may require intense emotional experience to produce genuine flashbulb memories.


World Trade Center attacks on 9/11. Tammy KLEIN/Gamma-Rapho via Getty Images.


KEY TERM Flashbacks Intense emotional memories of traumatic events that are recalled involuntarily by patients suffering from posttraumatic stress disorder.

Support for the involvement of the amygdala was reported by Spanhel et al. (2018). Recall of flashbulb memories was much worse in patients with damage to the amygdala than in those without damage to that brain area.

Flashbulb memories not based on an intense emotional experience are often surprisingly inaccurate. For example, videotape of the first plane striking the first tower on 9/11 was not available on the day it happened. However, 73% of those questioned said they had seen it on that day (Pezdek, 2003)! Their memories were distorted because videotape of the second tower being hit was available on the day itself.

Hirst et al. (2015) studied flashbulb memories and event memories (memories for facts associated with events causing flashbulb memories) of 9/11 over a 10-year period. There was rapid forgetting for both types of memories within the first year after 9/11 but very little thereafter. Of interest, participants had very high confidence in the accuracy of their flashbulb memories despite considerable forgetting.

Rimmele et al. (2012) studied the consistency of flashbulb memories (i.e., lack of change) over time. There was high consistency between one week and three years after 9/11 for remembering the location at which participants heard about the event (83%), but lower consistency for informant (70%), ongoing activity (62%) and their own immediate reaction (34%). In spite of much inconsistency in individuals' flashbulb memories, these memories are generally associated with high confidence levels. Talarico and Rubin (2003) found flashbulb memories for 9/11 showed no more consistency over a 32-week period than did everyday memories, but the reported vividness of flashbulb memories was much greater.

Why are confidence levels so high? Day and Ross (2014) assessed flashbulb memories for Michael Jackson's death. Participants having a strong social bond with Michael Jackson had greater confidence in the accuracy of their flashbulb memories than those with a weak social bond, because they experienced Jackson's death with greater emotional intensity and also rehearsed the event more often. However, memory consistency was not influenced by social bond, emotional intensity or rehearsal.

Conclusions

Interactive exercise: Flashbulb memories

Most findings suggest flashbulb memories are not special except perhaps when their formation is associated with high emotion. Most flashbulb memories exhibit forgetting and/or distortions resembling those found with ordinary memories (Hirst & Phelps, 2016). However, such memories may be more detailed and long-lasting if the relevant event directly affected the individual's life (Sharot et al., 2007; see Chapter 15). Flashbulb memories are associated with excessively high levels of confidence in their accuracy for various reasons (e.g., the intensity of the emotional experience involved; rehearsal: Day & Ross, 2014). Excellent memory for the location at which individuals heard about the traumatic event may cause them to exaggerate the accuracy of their flashbulb memories (Rimmele et al., 2012).

Finally, there are interesting links between flashbulb memories and flashbacks ("the intrusive re-experiencing of traumatic experiences in the present": Brewin, 2015, p. 1). Healthy individuals viewing a trauma film are




most likely to experience flashbacks subsequently if the amygdala (involved in emotional processing) and areas within the occipital cortex involved in imagery are activated (James et al., 2016). With such research, it is possible to assess individuals’ immediate cognitive and emotional reactions to the traumatic event, which cannot be done when studying flashbulb memories.

MEMORIES ACROSS THE LIFETIME

Suppose we ask 70-year-olds to recall personal memories suggested by cue words (e.g., nouns referring to common objects). From which points in their lives would most memories come? Rubin et al. (1986) answered this question by combining findings from several studies. Two findings were of theoretical interest:

● Infantile amnesia (or childhood amnesia), shown by the almost total lack of memories from the first three years of life.
● Reminiscence bump, consisting of a surprisingly large number of memories coming from the years between 10 and 30 (especially between 15 and 25).

KEY TERMS Infantile amnesia The inability of adults to recall autobiographical memories from early childhood; also known as childhood amnesia. Reminiscence bump The tendency of older people to recall a disproportionate number of autobiographical memories from adolescence and early adulthood.

Infantile amnesia

Adults sometimes claim their first autobiographical memory dates back to 2 years of age or earlier, but such memories are typically fictional (Akhtar et al., 2018). Adults' genuine first memories rarely date back to earlier than 2½ or 3 years of age, and adults also show limited recall for events occurring between the ages of 3 and 6 (see Figure 8.3). How can we explain this phenomenon (infantile amnesia or childhood amnesia)? Freud famously (notoriously?) attributed it to repression, with threat-related thoughts and experiences being consigned to the unconscious (see Chapter 6). This dramatic theory does not explain why adults cannot remember positive and neutral events from early childhood.

Psychological theories

Howe and Courage (1997) argued the development of the cognitive self (self-awareness) occurs during the second year of life. This plays an important role in the end of infantile amnesia and the onset of autobiographical memory. The reason is that possession of a cognitive self provides a framework for the organisation of autobiographical memories.

The social-cultural developmental theory (e.g., Fivush, 2010) provides an alternative account, according to which language and culture are both central to autobiographical memory development. Language is important because we use it to communicate our memories. Experiences occurring before children develop language are hard to express in language later on. Evidence indicating the importance of language was reported by Jack et al. (2009): the age of first recalled memory was earlier in adolescents whose mothers had reminisced elaborately about the past with their children.

Jack and Hayne (2010) argued the common assumption of a gradual decline in infantile amnesia is incorrect. In their study, adults'


earliest memory dated from 23 months of age. However, their memories for the first 4–6 years of life were sparse. These findings suggest infantile amnesia is a two-stage process: (1) absolute amnesia for the first two years of life; and (2) relative amnesia for the remaining preschool years. How can we account for these two stages? According to Jack and Hayne (2010), absolute amnesia ends with the onset of the cognitive self (consistent with Howe and Courage’s theory). The subsequent strong tendency for information recalled about childhood events to increase as the individual’s age at the time increases probably reflects children’s rapid development of language in early life (consistent with Fivush’s theory).

Hippocampal neurogenesis

Infantile amnesia has been observed in all altricial species (those showing considerable post-birth development). Such infantile amnesia cannot be explained with reference to notions such as the cognitive self or language development. However, it can potentially be explained by processes occurring within the hippocampus (crucially involved in declarative memory, including autobiographical memory). We need to focus on hippocampal neurogenesis, a process in which new neurons are generated within the hippocampus (especially the dentate gyrus) early in development. According to Josselyn and Frankland (2012, p. 423), "High neurogenesis levels negatively regulate the ability to form enduring memories, most likely by replacing synaptic connections in pre-existing hippocampal memory circuits."

Madsen and Kim (2016) reviewed evidence indicating the importance of hippocampal neurogenesis in producing infantile amnesia. For example, long-term retrieval in mice was impaired when drugs increased hippocampal neurogenesis. In contrast, long-term retrieval was enhanced when drugs reduced hippocampal neurogenesis. Travaglia et al. (2016) found rats during the infantile amnesia period formed lasting (but relatively inaccessible) memories. However, when activity in the hippocampus was blocked prior to learning, such memories were not acquired. Finally, Travaglia et al. showed that changing patterns of activation in the hippocampus signalled the end of the infantile amnesia period.

Figure 8.3 Childhood amnesia based on data reported by Rubin and Schulkind (1997). Participants (20, 35 and 70 years of age) reported very few autobiographical memories before the age of 3 and there was later a levelling off between the ages of 7 and 10. From Josselyn and Frankland (2012). © 2012 Cold Spring Harbor Laboratory Press. Reproduced with permission of author and Cold Spring Harbor Laboratory Press.

KEY TERM Hippocampal neurogenesis The process of generating new neurons in the hippocampus during early development.

Forgetting

Defining "infantile amnesia" on the basis of adults' inability to recall autobiographical memories from the earliest years of life can lead to the erroneous


assumption that young children cannot form autobiographical memories. A simple explanation of infantile amnesia is that young children form autobiographical memories but these memories are very susceptible to forgetting. Supporting evidence was reported by Tustin and Hayne (2016). Three-year-old children learned how to operate a train and their memory for this event was tested after 1 day and 1 year. They exhibited accurate memory (including verbal autobiographical memory) on both tests and there was no effect of retention interval. However, the children's memories of the event contained only a few details, which may help to explain why most early memories cannot be recalled by adults.

Overall evaluation

Infantile amnesia depends on several factors (see Howe, 2019, for a review). Absolute amnesia can probably be explained by hippocampal neurogenesis. After that, the onset of autobiographical memory in infants probably depends on reductions in hippocampal neurogenesis plus the emergence of the cognitive self. Its subsequent expression depends heavily on social and cultural factors and children's language development, and possibly also on their development of semantic memory.

What are the limitations of research in this area? First, most research has focused on adults' inability to recall autobiographical memories from the first three years of life. It is generally unclear whether this inability is due to severely deficient initial encoding of such memories, to difficulties in retrieval, or to both. Second, most research is correlational, making it hard to establish causality (e.g., the finding that the end of infantile amnesia occurs around the time the cognitive self emerges does not prove the latter causes the former).

Reminiscence bump

Interactive exercise: Reminiscence bump

As mentioned earlier, older people asked to produce personal memories recall numerous events from adolescence and early adulthood (the reminiscence bump). Conway et al. (2005) found a reminiscence bump in older individuals in five countries (America, China, Japan, England and Bangladesh). Of interest, the Chinese (with a collectivistic culture emphasising group cohesion) were most likely to recall events with a social or group orientation. In contrast, the Americans (with an individualistic culture emphasising personal responsibility and achievement) were most likely to recall events relating to themselves.

It has typically been assumed (incorrectly) there is a single reminiscence bump. Koppel and Berntsen (2015) carried out a meta-analysis using two techniques to assess the reminiscence bump:

(1) the cue-word method, in which individuals generate memories to cue words;
(2) the important memories method, in which individuals report important personal memories.

Their key findings were as follows (see Figure 8.4). First, the midpoint of the reminiscence bump was 15.5 years using cue words but 21.5 using the important memories method.


Figure 8.4 Temporal distribution of autobiographical memories across the lifespan (percentage of memories by age in years at time of event). (a) Top panel: word-cued memories; (b) bottom panel: important memories. From Koppel & Berntsen (2015). Reprinted with permission of Elsevier.

KEY TERM Life script A schema based on cultural expectations concerning the nature and order of a typical person’s major life events.

Second, the reminiscence bump was much stronger using the important memories method.

How can we explain the reminiscence bump(s)? One influential approach is Rubin and Berntsen's (2003) theory based on the notion of a life script (cultural expectations about the major life events in most people's lives). Examples include falling in love, marriage and having children. Most such events occur between the ages of 15 and 30. According to the theory, the life script guides and organises the retrieval of autobiographical memories.

Several predictions from the life-script account have been supported. First, most life-script events are emotionally positive, and so we would expect to find a reminiscence bump only for positive events. That is precisely what Berntsen et al. (2011) found. As expected, the positive events recalled were rated as much more central to the participants' life story than the negative ones. Second, there was no reminiscence bump for positive events not forming part of the life script (Berntsen et al., 2011). Third, Scherman (2013)


found life scripts had a lifespan distribution resembling the reminiscence bump in four countries (Denmark, the USA, Turkey and the Netherlands).

Most positive events forming part of the life script involve major transitions (e.g., going to college; marriage; having children). Evidence that transitions not directly forming part of the life script are also important was reported by Enz et al. (2016). Older adults recalled autobiographical events occurring between the ages of 40 and 60. Many events recalled occurred close in time to the major transition of a residential move: a relocation bump. Thus, autobiographical memories associated with transitions (even those not part of the life script) seem to be especially easy to recall, perhaps because such memories tend to be novel and distinctive.

Why does the reminiscence bump depend on the method used? It has been argued (Koppel & Berntsen, 2016) that the crucial difference between the cue-word and important memories methods is the retrieval strategy used. Koppel and Berntsen (2016) asked students (mean age = 23) to generate the autobiographical memories they imagined a hypothetical 70-year-old would produce. With the important memories method, the timing and nature of the reminiscence bump were strikingly similar for imagined important memories and those of actual 70-year-olds. Similar (but much less striking) findings were obtained when comparing imagined and actual memories using the cue-word method. Koppel and Berntsen (2016) concluded the different reminiscence bumps produced using the two methods "are largely produced by general schematic processes operative at retrieval" (p. 97).

In sum, the reminiscence bump produced using the important memories method depends on the life script and its associated cultural expectations. In addition, it is probably relevant that major life events generally involve important transitions. In contrast, memories recalled using the cue-word method are much less influenced by the life script (Koppel & Berntsen, 2015). The finding (Koppel & Berntsen, 2016) that imagined memories differed substantially from actual ones using that method suggests specifically memory-based processes underlie the reminiscence bump associated with that method.

THEORETICAL APPROACHES TO AUTOBIOGRAPHICAL MEMORY

Many theories of autobiographical memory have been proposed over the years. Here we will focus mainly on Conway and Pleydell-Pearce's (2000) self-memory system model and its subsequent development. Then, we discuss how cognitive neuroscience has contributed to our understanding of autobiographical memory.

Research activity: Memory for personal events

Self-memory system model

Conway and Pleydell-Pearce (2000) argued we possess a self-memory system having two major components:

(1) Autobiographical memory knowledge base: this contains personal information at three levels of specificity:

● Lifetime periods: they are defined by major ongoing events and generally cover substantial periods of time (mean length between 4 and 15 years: Thomsen, 2015). Different lifetime periods often overlap in time (e.g., living with someone may overlap with having a particular job).
● General events: these include repeated events (e.g., visits to a sports club) and single events (e.g., a holiday in Botswana). General events are often related to each other and to lifetime periods.
● Event-specific knowledge: this consists of images, feelings and other details relating to general events and spanning time periods from seconds to hours. Event knowledge is usually organised in the correct temporal order.

KEY TERMS Generative retrieval Deliberate or voluntary construction of autobiographical memories based on an individual's current goals; see direct retrieval. Direct retrieval Effortless recall of autobiographical memories triggered by a specific cue (e.g., being in the same place as the original event); see generative retrieval.

(2) Working self: this is concerned with the self, what it may become and the individual's current goals. The working self's goals influence the memories stored within the autobiographical memory knowledge base and the autobiographical memories we recall. As a result, "Autobiographical memories are primarily records of success or failure in goal attainment" (Conway & Pleydell-Pearce, 2000, p. 266).

According to the theory, autobiographical memories can be accessed in two ways. First, there is generative retrieval, which involves deliberately constructing autobiographical memories by applying the working self to information in the autobiographical memory knowledge base. Second, there is direct retrieval: autobiographical memories are triggered effortlessly or "automatically" by specific cues (e.g., hearing the word Paris may trigger retrieval of a holiday there). It was predicted recalled autobiographical memories would mostly be goal-relevant regardless of retrieval mode. However, events relating to current goals are more likely to be recalled with generative retrieval (which involves top-down processes) than with direct retrieval (which typically depends on bottom-up processes triggered by environmental cues). (A toy sketch of the knowledge-base hierarchy and the two retrieval modes appears at the end of this subsection.)

Conway (2005) developed the above theory (see Figure 8.5). The knowledge structures were divided into the conceptual self and episodic memories (previously called event-specific knowledge). At the top of the hierarchy, the life story and themes have been added. The life story consists of very general factual and evaluative knowledge we possess about ourselves; themes refer to major life domains (e.g., work; relationships). Conway (2005) argued we want our autobiographical memories to exhibit coherence (consistency with our current goals and beliefs). However, we also often want them to exhibit correspondence (accuracy). Over time, coherence tends to win out over correspondence.

Conway (2009) refined the theory. He argued the working self consists of the individual's goal system (goals; plans; projects) plus their conceptual self. It determines which autobiographical memories can be accessed. In addition, it is assumed that simple episodic memories resembling each other often form complex episodic memories.

Finally, Conway et al. (2016) developed the notion of episodic memories within the self-memory system, identifying a remembering–imagining system.


Figure 8.5 The knowledge structures within autobiographical memory, as proposed by Conway (2005). Reprinted from Conway (2005), with permission of Elsevier.

In the remembering–imagining system, episodic memories formed today are most accessible, with accessibility decreasing for episodic memories further in the past or future. This system "serves the purpose of integrating past, current, and future goal-related activities" (p. 256). Participants listed all the personal events they could remember from the past 5 days and events they imagined were likely to occur over the next 5 days. The findings were as predicted (see Figure 8.6).
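The hierarchical organisation the model proposes can be pictured as a nested data structure, with the two retrieval modes operating over it in different ways. The following Python sketch is purely illustrative: the period names, events, details and matching rules are all invented, and the self-memory system is a psychological theory rather than an algorithm, so this is only one loose way of picturing it.

# Toy sketch of the three-level autobiographical knowledge base:
# lifetime periods contain general events, which contain event-specific
# knowledge. All content here is invented for illustration.
knowledge_base = {
    "Working abroad (hypothetical period)": {      # lifetime period
        "Holidays in Provence": [                   # general (repeated) event
            "smell of lavender fields",             # event-specific knowledge
            "dinner overlooking the harbour",
        ],
        "Daily commute": [
            "crowded metro platform",
        ],
    },
}

def direct_retrieval(kb, cue):
    # Bottom-up: a specific cue lands directly on stored material,
    # with no deliberate, goal-directed search.
    hits = []
    for period, events in kb.items():
        for event, details in events.items():
            if cue.lower() in (period + " " + event).lower():
                hits.extend(details)
    return hits

def generative_retrieval(kb, goal_relevant):
    # Top-down: the working self searches the hierarchy, descending from
    # lifetime periods through general events to specific details,
    # keeping only material relevant to the current goal.
    hits = []
    for period, events in kb.items():
        for event, details in events.items():
            if goal_relevant(period, event):
                hits.extend(details)
    return hits

print(direct_retrieval(knowledge_base, "Provence"))
print(generative_retrieval(knowledge_base,
                           lambda period, event: "holiday" in event.lower()))

The key contrast the sketch is meant to convey is that direct retrieval is triggered by an environmental cue, whereas generative retrieval is driven by a goal supplied by the working self.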

Findings

Research on patients with retrograde amnesia (widespread forgetting of events preceding brain injury; see Chapter 7) supports the notion there are different types of autobiographical knowledge. These patients often have greater difficulties recalling episodic memories than general events and lifetime periods (Conway & Pleydell-Pearce, 2000). For example, Rosenbaum et al. (2005) found an amnesic patient (KC) with no episodic memories could nevertheless access some general autobiographical knowledge.


Figure 8.6 The mean number of events (by day, Monday to Friday) participants could remember from the past 5 days and those they imagined were likely over the next 5 days, relative to now. From Conway et al. (2016).

How do amnesic patients (with their severely impaired episodic memory) cope when recalling autobiographical events? Lenton-Brym et al. (2017) found amnesic patients were more likely than healthy controls to recall frequently occurring events. This probably happened because it is easier to use semantic memory processes to recall general rather than unique events. McCormick et al. (2018) supported this viewpoint in a review. Amnesic patients use brain areas associated with retrieval of general or schematic information (e.g., the ventromedial prefrontal cortex; see Chapter 7) when retrieving autobiographical memories.

According to the self-memory system model, the accessibility of autobiographical memories depends on individuals' goals. Woike et al. (1999) compared individuals with an agentic personality type (motivated by independence, achievement and personal power) and those with a communal personality type (motivated by interdependence and similarity to others). When they recalled a positive personal memory, 65% of agentic individuals recalled agentic memories (e.g., involving success) whereas 90% of communal individuals recalled communal memories (e.g., involving love or friendship). With negative personal memories, 47% of agentic individuals recalled agentic memories whereas 90% of communal individuals recalled communal memories.

The model predicts faster recall of autobiographical memories with direct retrieval than with generative retrieval. This prediction has support: Barzykowski and Staugaard (2016) found direct retrieval was twice as fast as generative retrieval. According to the model, the individual's working self and goals are more involved in generative than direct retrieval. Johannessen and Berntsen (2010) supported this assumption: memories elicited by generative retrieval were more significant and relevant to the individual's personal identity than those involving direct retrieval.

Addis et al. (2012) found generative retrieval was associated with more activation in prefrontal areas involved in strategic search for autobiographical information. This finding is consistent with the plausible notion


that generative retrieval involves more top-down processing than direct retrieval.

It has typically been assumed direct retrieval is involuntary whereas generative retrieval is voluntary. This assumption is oversimplified. Barzykowski and Staugaard (2016) distinguished between retrieval effort (high with generative retrieval and low with direct retrieval) and conscious intention (voluntary vs involuntary retrieval). They identified three types of autobiographical memories: (1) involuntary memories; (2) directly retrieved voluntary memories; and (3) generatively retrieved voluntary memories.

According to the model, lifetime periods differ importantly from specific episodic memories. Various findings support this assumption (Thomsen, 2015). First, lifetime periods are regarded as more important than specific memories to an individual's identity and personality. Second, memory for lifetime periods is less affected by ageing than memory for specific events. Third, lifetime period memories are generally less vivid and emotional than specific memories and are associated with less activation in frontal areas and the medial temporal lobes (Ford et al., 2011).

Evaluation

The theoretical approach of Conway and Pleydell-Pearce (2000) and Conway (2009) provides a comprehensive account of autobiographical memory. Several of their main theoretical assumptions (e.g., the hierarchical structure of autobiographical memory; the intimate relationship between autobiographical memory and the self; the importance of goals in autobiographical memory) are well supported. There is also good support for the distinction between generative and direct retrieval.

What are the limitations of the self-memory system model? First, we need to know more about how the working self interacts with the autobiographical knowledge base to produce recall of specific autobiographical memories. Second, autobiographical memories vary in how much episodic information (e.g., contextual details) and semantic information (e.g., world knowledge) they contain; this issue is not addressed fully within the model. Third, the distinction between direct and generative retrieval is oversimplified. Fourth, the model does not fully account for the complexities of autobiographical memory revealed by cognitive neuroscience studies (discussed next).

Cognitive neuroscience

The prefrontal cortex plays a major role in autobiographical memory retrieval (especially during generative retrieval). Svoboda et al. (2006) found in a meta-analytic review that the medial and ventromedial prefrontal cortex was nearly always activated during autobiographical retrieval.

Autobiographical memories are often of personally significant events and so are associated with emotion. The amygdala, buried deep within the temporal lobe, is strongly associated with emotion. As expected, amnesic patients who also have damage to the amygdala find it harder to retrieve emotional autobiographical memories (Buchanan et al., 2006).


Figure 8.7 A model of the bidirectional relationships between neural networks involved in the construction and/or elaboration of autobiographical memories. MTL = medial temporal lobe network; medial PFC = medial prefrontal cortex. From St. Jacques et al. (2011). Reprinted with permission of Elsevier.

St. Jacques et al. (2011) found four brain networks (with strong bidirectional connections between them) were activated when individuals produced autobiographical memories to emotionally arousing words via generative retrieval (see Figure 8.7):

(1) Fronto-parietal network: it is involved in the construction of autobiographical memories, is associated with adaptive controlled processes and is probably involved in verbal retrieval.
(2) Cingulo-operculum network: it is also involved in the construction of autobiographical memories and with goal maintenance.
(3) Medial prefrontal cortex network: it is involved in the construction and subsequent elaboration of autobiographical memories and in self-referential processing.
(4) Medial temporal lobe network: it is involved in the construction and subsequent elaboration of autobiographical memories and is associated with declarative memory and conscious recollection.

Inman et al. (2018) studied dynamic changes in brain activation during two stages of generative retrieval of autobiographical memories. First, processes involved in searching for and accessing autobiographical memories involved a ventral frontal to temporal-parietal network. Second, subsequent elaborative processing of these memories involved strong connections between occipital-parietal areas and dorsal fronto-parietal regions. There was no sudden switch between the two processing stages: rather, the relative dominance of access-related and elaboration-related processing altered over time.


IN THE REAL WORLD: DEPRESSION AND AUTOBIOGRAPHICAL MEMORY

It is assumed within the self-memory system model that information stored in (and retrieved from) autobiographical memory reflects the individual's personality and sense of self. This assumption has been applied in studies on depressed individuals. Research has often involved participants recalling autobiographical memories of events lasting less than one day to word cues. Depressed individuals typically produce over-general negative memories (Fisk et al., 2019). For example, a depressed person might respond "Arguing with other people" to the cue "angry". Most evidence shows only an association or correlation between over-general memories and depression and so does not demonstrate the former partially causes the latter. Stange et al. (2013) reported more convincing evidence: the extent of over-general autobiographical memory predicted increases in depressive symptoms 8 months later in those exposed to high levels of familial emotional abuse.

Dalgleish et al. (2011) asked patients with current major depressive disorder, patients in remission from that disorder and healthy controls to list their most important lifetime periods. After that, the patients decided which positive and negative items (words or phrases) applied to each period. Four measures were identified (a toy illustration of how such measures can be computed follows below):

(1) the proportion of items that was negative;
(2) compartmentalisation (the extent to which the proportion of items that was negative varied across lifetime periods);
(3) positive redundancy (the extent to which the same positive terms were used across periods);
(4) negative redundancy (the extent to which the same negative terms were used across periods).
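To make these four measures concrete, here is a minimal illustrative sketch in Python. All period names and descriptive terms are invented, and the specific formulas (a standard deviation for compartmentalisation; a mean reuse count for redundancy) are simplified stand-ins rather than the exact indices computed by Dalgleish et al. (2011).

from statistics import pstdev

# Toy "life structure": each lifetime period is described by (term, valence)
# pairs, where valence is "pos" or "neg". All content is invented.
periods = {
    "University years": [("curious", "pos"), ("hopeful", "pos"), ("lonely", "neg")],
    "First job":        [("successful", "pos"), ("anxious", "neg"), ("lonely", "neg")],
    "Married life":     [("loved", "pos"), ("hopeful", "pos"), ("anxious", "neg")],
}

def proportion_negative(periods):
    # Measure 1: overall proportion of descriptive items that are negative.
    valences = [v for terms in periods.values() for _, v in terms]
    return sum(v == "neg" for v in valences) / len(valences)

def compartmentalisation(periods):
    # Measure 2 (simplified): spread of the per-period negative proportion.
    # Higher values mean negativity is concentrated in particular periods.
    props = [sum(v == "neg" for _, v in terms) / len(terms)
             for terms in periods.values()]
    return pstdev(props)

def redundancy(periods, valence):
    # Measures 3 and 4 (simplified): mean number of periods in which each
    # distinct term of the given valence is used (1.0 = no reuse at all).
    counts = {}
    for terms in periods.values():
        for term, v in terms:
            if v == valence:
                counts[term] = counts.get(term, 0) + 1
    return sum(counts.values()) / len(counts)

print(f"Proportion negative:  {proportion_negative(periods):.2f}")   # 0.44
print(f"Compartmentalisation: {compartmentalisation(periods):.2f}")  # 0.16
print(f"Positive redundancy:  {redundancy(periods, 'pos'):.2f}")     # 1.25
print(f"Negative redundancy:  {redundancy(periods, 'neg'):.2f}")     # 2.00

On this invented profile, negative terms are reused across periods more than positive ones, loosely resembling the low positive redundancy Dalgleish et al. (2011) reported in currently depressed patients.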

Figure 8.8 Life structure scores (proportion negative, compartmentalisation, positive redundancy, negative redundancy) for patients with major depressive disorder, patients in remission from major depressive disorder and healthy controls. From Dalgleish et al. (2011). © 2010 American Psychological Association.


The proportion of selected terms that was negative was much greater for current depressed patients than for controls (see Figure 8.8). In addition, current patients had a less integrated sense of self (i.e., greater compartmentalisation). This occurred in part because current depressed patients showed little consistency in their use of positive terms across lifetime periods (i.e., low positive redundancy). Finally, depressed patients in remission were intermediate between current patients and controls on most measures.

What do these findings mean? First, the organisation of autobiographical knowledge in currently depressed patients is relevant to their working self (Conway & Pleydell-Pearce, 2000). More generally, current patients' perceived self is revealed in the predominantly negative and non-integrated structure of their autobiographical knowledge. Second, the structure of autobiographical knowledge is more integrated and less pervasively negative in patients in remission than in current patients. Thus, recovery from major depressive disorder involves having a "healthier" perspective on one's life history. Third, patients in remission nevertheless had a more negative and less integrated view of their life history than healthy controls. These findings suggest these patients were at risk of a subsequent depressive episode.

Dalgleish and Werner-Seidler (2014) identified four cognitive biases in depression associated with autobiographical memory recall (see Figure 8.9). First, there is a strong tendency to recall negative autobiographical memories. Second, there is impoverished access to positive memories. Third, depressed individuals recall over-general negative memories. Fourth, depressed individuals have an altered relationship to their emotional memories in that they try (typically unsuccessfully) to avoid or suppress negative memories.

Figure 8.9 Four cognitive biases related to autobiographical memory recall (biased recollection of negative memories; impoverished positive memories; over-general memory; altered relationship to emotional memories) that maintain depression and increase the risk of recurrence following remission. The figure is Figure 1 in an article by Dalgleish and Werner-Seidler (2014) in Trends in Cognitive Sciences published by Cell Press.


Interventions

How can we use our knowledge of depressed individuals' biases relating to autobiographical memory to reduce their level of depression? Some answers were discussed by Dalgleish and Werner-Seidler (2014). One approach is MEmory Specificity Training (MEST), where the emphasis is on training depressed patients to generate more specific autobiographical memories (e.g., for homework, patients produce specific memories to 10 cue words). MEST reduces rumination (repeated negative self-focused thoughts and images) and cognitive avoidance. Werner-Seidler et al. (2018) found in patients with major depressive disorder that MEST increased the specificity of their autobiographical memories and reduced their depressive symptoms.

Hitchcock et al. (2016) used memory flexibility (MemFlex) training with individuals in remission from depression. This training focuses on the development of three important autobiographical memory skills:

(1) Balancing involves enabling depressed individuals to recollect positive and negative, specific and general memories, with equal ease.
(2) Elaboration focuses on allowing depressed individuals to store richer and more elaborate positive memories by focusing on emotional and situational details of such memories.
(3) Flexibility involves training individuals to control whether the memories they recall are general or specific. They also learn to identify situations where specific memories are optimal (e.g., solving a problem) and those where general memories are optimal (e.g., when considering the strength of a friendship).

Hitchcock et al. (2016) found MemFlex training increased the specificity of recalled autobiographical memories, reduced rumination and improved social problem solving.

EYEWITNESS TESTIMONY

Suppose you are the only eyewitness to a very serious crime and the person you subsequently identify as the murderer in a line-up is found guilty, although there is no other strong evidence. Is it safe to rely on eyewitness testimony? Simons and Chabris (2011) found 37% of Americans believe the testimony of a single confident eyewitness is sufficient to convict a criminal defendant. In fact, as we will see, eyewitness testimony can be very fallible.

IN THE REAL WORLD: IS EYEWITNESS CONFIDENCE TRUSTWORTHY?

In the United States, over 200 individuals convicted on the basis of mistaken eyewitness identification have been proved innocent by DNA tests. Garrett (2011) reviewed 161 such cases and discovered virtually all the mistaken eyewitnesses were certain at trial they had identified the culprit. These findings suggest we should ignore the confidence (or otherwise) eyewitnesses express in their identifications. However, that conclusion is not warranted.

One case Garrett (2011) examined was that of Ronald Cotton. In 1985, he was found guilty of raping Jennifer Thompson because of her confident eyewitness identification of him as the culprit. However, he was exonerated by DNA evidence, after having spent over 10 years in prison. Of crucial importance, when Jennifer Thompson initially identified Cotton from a photo line-up, she hesitated for almost 5 minutes before eventually saying "I think this is the guy". This case


is not unique. Garrett (2011, p. 49) found "In 57% of trial transcripts (92 out of 161 cases), the witnesses reported they had not been certain at the time of their earlier identifications".

Why does eyewitness confidence often increase substantially from initial identification to courtroom? With Jennifer Thompson, positive feedback from the police following her initial identification caused her to become increasingly confident she had identified the culprit. Douglass and Steblay (2006) showed the importance of such feedback in a meta-analytic review. Eyewitnesses receiving confirming feedback after an identification (e.g., "Good, you identified the suspect") believed mistakenly they had been very confident in the accuracy of their identification before receiving feedback: the "post-identification feedback effect".

Jennifer Thompson and Ronald Cotton. Ronald Cotton was mistakenly found guilty of raping Jennifer Thompson and spent many years in prison before being exonerated. From Wixted and Wells (2017). Image provided courtesy of the PopTech Institute.

In sum, two conclusions are warranted. First, we can generally trust eyewitnesses' confidence in their identifications provided we focus on their initial level of confidence. Wixted et al. (2016) supported this conclusion in a large-scale real-life study of eyewitnesses' initial identifications. When eyewitness confidence was low, only 20% of identifications were of the suspect. This increased dramatically to approximately 80% when confidence was high. Second, "Testimony-relevant witness judgements should be collected and documented, preferably with videotape, before feedback can occur" (Steblay et al., 2014).

Eyewitness memory is inaccurate for several reasons. We start with confirmation bias – eyewitnesses’ memory is influenced by their expectations. For example, Lindholm and Christianson (1998) found Swedish and immigrant students who saw a simulated robbery were twice as likely to select an innocent immigrant as an innocent Swede as the culprit. Participants’ expectations were influenced by the fact that immigrants are over-represented in Swedish crime statistics.

KEY TERM
Confirmation bias
A tendency for eyewitnesses’ memory to be distorted by their prior expectations.

Bartlett (1932) argued we have numerous schemas (packets of knowledge) in long-term memory strongly influencing what we remember (see Chapter 10). Most people’s bank-robbery schema includes information that robbers are typically male, wear disguises and have a getaway car with a driver (Tuckey & Brewer, 2003a). Tuckey and Brewer showed eyewitnesses a video of a simulated bank robbery. As predicted, eyewitnesses recalled information relevant to the bank-robbery schema better than irrelevant information (e.g., the colour of the getaway car).

Schemas can also cause memory distortions because we reconstruct an event’s details based on “what must have been true”. In a study by Tuckey and Brewer (2003b), some eyewitnesses saw a robber whose head was covered by a balaclava (ski mask), making the robber’s gender ambiguous. Eyewitnesses mostly interpreted the ambiguous information as being consistent with their bank-robbery schema. Thus, their recall was systematically distorted by including information from their bank-robbery schema.


Misinformation effect

The most obvious reason why eyewitnesses’ memories are often inaccurate is that they fail to attend fully to the crime situation. After all, it typically occurs suddenly and unexpectedly. However, Loftus and Palmer (1974) emphasised a different reason – eyewitness memories are fragile and can easily be distorted by misleading information provided after the witnessed event: the misinformation effect.

KEY TERM
Misinformation effect
The distorting effect on eyewitness memory of misleading information presented after a crime or other event.

Findings

Loftus and Palmer (1974) showed eyewitnesses a film of a car accident. Afterwards, some were asked “About how fast were the cars going when they smashed into each other?”. For other eyewitnesses, the word “hit” replaced “smashed into”. The estimated speed averaged 41 mph when the verb “smashed” was used versus 34 mph when “hit” was used. Thus, information implicit in the question influenced memory for the accident. One week later, all eyewitnesses were asked whether they had seen any broken glass. There was no broken glass, but 32% of those previously asked about speed using the verb “smashed” said they had seen broken glass compared to only 14% of those asked using the verb “hit”.

A misinformation effect involving more directly misleading information was reported by Loftus et al. (1978). Eyewitnesses saw several slides, one showing a red Datsun car stopping at a stop or yield sign. Afterwards they were asked, “Did another car pass the red Datsun while it was stopped at the stop sign?”, or the word “stop” was replaced by “yield”. In a third condition, the key question did not refer to a sign at all. Finally, the eyewitnesses decided which of two slides (car with a stop sign and car with a yield sign) they had seen previously. Eyewitnesses more often selected the wrong slide when the earlier question was misleading than when it was accurate or did not refer to the sign.

Eyewitness memory can also be distorted by information presented before an event. Lindsay et al. (2004) showed eyewitnesses a video of a museum burglary. Eyewitnesses who had listened to a thematically similar narrative (a palace burglary) the previous day made many more errors when recalling information from the video than those who had listened to a thematically dissimilar narrative (a school trip to a palace). This finding is important because eyewitnesses often have relevant past experiences that may distort their memory for a crime.

The misinformation effect has generally been found for peripheral or minor details (e.g., presence of broken glass in the study by Loftus and Palmer, 1974) rather than central ones. In similar fashion, Putnam et al. (2017) found the misinformation effect was much greater for relatively unmemorable than memorable details (see Figure 8.10). Putnam et al. (2017) pointed out that most textbook accounts assume the misinformation effect is nearly always found. However, they obtained contrary evidence. Misinformation led to enhanced recognition memory for an event when participants detected (and remembered) changes between that event and the post-event misinformation. What is happening here?


Figure 8.10 Size of the misinformation effect as a function of detail memorability in the neutral condition (i.e., absence of misleading information): the false alarm rate in the misinformation condition is plotted against the hit rate in the neutral condition (r = –0.55). From Putnam et al. (2017).

Misinformation sometimes acts as a cue that facilitates retrieval of details from the actual event.

Theoretical accounts

How does misleading information distort what eyewitnesses report? Is the original memory permanently altered or does it still exist but is inaccessible? Loftus (1979) argued misinformation causes the previously formed memory of an event to be “overwritten” and destroyed. Loftus (1992) argued for a less extreme position – the original memory remains but eyewitnesses “accept” misinformation as forming part of the event memory.

Edelson et al. (2011) had eyewitnesses watch a crime scene in small groups and then recall the crime events three days later (Test 1). Four days later, they were misinformed their fellow eyewitnesses remembered several events differently from them. This was followed immediately by a memory test (Test 2) during which their brain activity was recorded. A week later, the eyewitnesses were told the answers allegedly given by their fellow eyewitnesses had been generated at random. Finally, they received another memory test (Test 3).

Edelson et al. (2011) decided whether eyewitnesses pretended to agree with the group on Test 2 or whether their memories had genuinely changed by seeing whether they maintained their incorrect answers on Test 3. Brain activity during Test 2 indicated enhanced connectivity between the amygdala and hippocampus (both centrally involved in memory formation) was associated only with memories that had genuinely changed. Edelson et al.


(2011, p. 108) concluded that a long-lasting misinformation effect occurred only when there was a reconsolidation process (see Glossary) that “modified the neural representation of memory”.

Oeberst and Blank (2012) argued misinformation does not cause permanent alteration of memory traces of a witnessed event. According to them, the misinformation effect occurs because eyewitnesses are instructed to recall the single correct account of an event. Oeberst and Blank told eyewitnesses they had received contradictory information and encouraged them to recall everything from the event and the misinformation. This manipulation completely eliminated the misinformation effect! Thus, the original memory traces were essentially intact.

Blank and Launay (2014) carried out a meta-analysis of studies on the misinformation effect where eyewitnesses were warned of the presence of misinformation after viewing an event. Post-warning reduced the misinformation effect to between one-third and one-half of its size when no warning was provided (see Figure 8.11). Higham et al. (2017) found the misinformation effect was eliminated when the post-warning was specific (i.e., it identified event details for which misinformation had been presented earlier) but not when it was general (i.e., indicating there had been misinformation).

One reason event memories are inaccessible is source misattribution (Johnson et al., 1993). In essence, a memory probe (e.g., a question) activates memory traces overlapping with it in information. Source misattribution is most likely when the memories from one source resemble those from a second source (e.g., Lindsay et al., 2004, discussed above). Prull and Yockelson (2013) reported evidence suggesting the importance of source misattribution. The misinformation effect was much smaller when eyewitnesses received a source-recognition test encouraging them to retrieve source information.
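Figure 8.11 expresses the size of the misinformation effect as an odds ratio. As a quick reminder of how that statistic works (the percentages below are invented for illustration and are not from the meta-analysis): if 40% of misled eyewitnesses but only 10% of non-misled eyewitnesses endorse a misleading detail, the odds ratio is

\[
\mathrm{OR} = \frac{0.40/(1-0.40)}{0.10/(1-0.10)} = \frac{0.667}{0.111} \approx 6.0
\]

The meta-analytic odds ratios plotted in Figure 8.11 can be read in the same way.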

Figure 8.11 Extent of misinformation effects (expressed as an odds ratio) as a function of condition (post-warning vs no warning) for the original memory and endorsement of the misinformation presented previously. The line at an odds ratio of 1 marks the absence of a misinformation effect. From Blank & Launay (2014). Reprinted with permission of Elsevier.


In sum, the misinformation effect is often due to inaccessibility of information about the original event rather than altered memory traces. However, some evidence supports the latter explanation (Edelson et al., 2011). Overall, the findings suggest the effects of misinformation on memory performance are not direct. Instead, memory performance is influenced flexibly by the precise strategies used by eyewitnesses to combine and integrate the information available to them.

Other factors can also be involved (Wright & Loftus, 2008). One example is the vacant slot explanation (misinformation is more likely to be accepted when related information from the original event was not stored in memory). Another example is the blend explanation (misinformation and information from the original event are integrated in memory). Finally, the misinformation effect involves retroactive interference (see Glossary). Since retroactive interference with standard verbal memory tasks can be caused by several factors (see Chapter 6), it is unsurprising that the same is true of retroactive interference for criminal and other events.

Weapon focus, anxiety and violence

KEY TERM
Weapon focus effect
The finding that eyewitnesses pay so much attention to the presence of a weapon (e.g., a gun) that they ignore other details and so cannot remember them subsequently.

How do anxiety and violence influence eyewitness memory? There is evidence for the weapon focus effect – eyewitnesses attend to the criminal’s weapon, which reduces their memory for other information. For example, Biggs et al. (2013) found observers fixated weapons more than neutral objects, so faces were fixated less often in the weapon condition. Harada et al. (2015) found observers’ memory for peripheral stimuli was reduced in the presence of a weapon. This finding is consistent with Easterbrook’s (1959) hypothesis, according to which anxiety causes a narrowing of attention onto central or important stimuli, reducing individuals’ ability to remember peripheral details (see Chapter 15).

Pickel (2009) pointed out that individuals often attend to stimuli that are unexpected in the current situation (inconsistent with their situational schema), which impairs their memory for other stimuli. She argued the weapon focus effect would be greater when the presence of a weapon was very unexpected. As predicted, the effect was especially strong when a criminal carrying a folding knife was female, because seeing a woman with a knife is unexpected. In similar fashion, the weapon focus effect was stronger when a male criminal carrying a handgun was white rather than black because of the mistaken stereotype linking black men with weapons.

Fawcett et al. (2013) carried out a meta-analysis on the weapon focus effect. There was a moderate effect that was comparable in the laboratory and the real world. Peripheral details were often poorly remembered when the central object was unusual or unexpected in the current situation (even when the object was not a weapon). Fawcett et al. (2016) discussed studies showing the presence of a weapon made it harder for eyewitnesses to discriminate the culprit from innocent individuals on a line-up. It also increased the probability of making false identifications.

How do stress and anxiety influence eyewitness memory? Deffenbacher et al. (2004) carried out a meta-analysis.


Culprits’ faces were identified 54% of the time in low-stress conditions versus 42% in high-stress conditions, and the findings were comparable for recall of details. Thus, stress and anxiety generally impair eyewitness memory. Morgan et al. (2013) considered the effects of very high stress. Military personnel endured a 3-minute stressful interrogation involving physical assault (e.g., slamming into a wall; facial slaps). Participants’ memory was generally poor and over 50% failed to identify their interrogator correctly.

One reason stress impairs eyewitness memory is that it causes a narrowing of attention (see Easterbrook’s hypothesis discussed above). Yegiyan and Lang (2010) presented people with distressing pictures. As picture stressfulness increased, recognition memory for the central details improved progressively. In contrast, memory for peripheral details was much worse with highly stressful pictures than with moderately stressful ones. Thus, the findings supported Easterbrook’s hypothesis. Note, however, that “memory narrowing” is not always directly caused by “attentional narrowing” (see Chapter 15).

KEY TERM
Own-age bias
The tendency for eyewitnesses to identify the culprit more often when they are of similar age to the eyewitness than when they are of a different age.

Ageing and memory

Older eyewitnesses’ memory is less accurate than that of younger adults and they exhibit greater misinformation effects. Jacoby et al. (2005) presented misleading information to younger and older adults. The older adults had a 43% chance of producing false memories at recall compared to only 4% for the younger adults. Older adults have an impaired ability to use cognitive control effectively to focus retrieval on correct information (Keating et al., 2017) and are also less likely to monitor their own recall to reduce errors (Morcom, 2016).

Wright and Stroud (2002) studied differences between younger and older adults identifying culprits after viewing crime videos. There was an own-age bias – both groups performed better when the culprit was of a similar age to themselves. Eyewitnesses may sometimes attend more closely to culprits of their own age. However, Neumann et al. (2015) found young adults did not attend more to young faces than older ones. Own-age bias might instead be due to expertise: most people have greater exposure to (and familiarity with) faces of individuals of their own age. Wiese et al. (2013) reported supporting evidence. Young geriatric nurses had no own-age bias because, due to their experience with older people, they recognised old faces much better than did young controls.

Eyewitness identification: face recognition

Eyewitness identification typically depends mainly on face recognition although other factors (e.g., an individual’s build and/or clothing) can be relevant. There is compelling evidence that most people find it surprisingly hard to recognise unfamiliar faces; this is of direct relevance to eyewitnesses’ memory for culprits’ faces. Poor recognition of unfamiliar faces occurs in part because different photographs of the same person display considerable variability and are often regarded incorrectly as coming from different individuals (see Figure 3.18) (Jenkins et al., 2011; Young & Burton, 2018).


KEY TERMS
Unconscious transference
The tendency of eyewitnesses to misidentify a familiar (but innocent) face as being the person responsible for a crime.
Other-race effect
The finding that recognition memory for same-race faces is generally more accurate than for other-race faces.

The police often ask eyewitnesses to identify the person responsible for a crime from several individuals physically present in a line-up or shown in photographs. Valentine et al. (2003) found eyewitness identification is very fallible. Of 640 eyewitnesses trying to identify suspects in 314 real line-ups, only 40% identified the suspect, 20% identified a non-suspect and 40% failed to make an identification.

Eyewitnesses who are very confident about face identification tend to be more accurate than those less confident (Brewer & Wells, 2011). For example, Odinot et al. (2009) studied the memory of 14 eyewitnesses of an actual supermarket robbery in the Netherlands. There was a moderate correlation (+.38) between eyewitness confidence and accuracy. Wixted et al. (2016; discussed earlier) also found that eyewitness confidence predicted accuracy of culprit identification in a real-life study.

Eyewitnesses sometimes remember a face but fail to remember the precise circumstances in which they saw it. Ross et al. (1994) had eyewitnesses observe an event where a bystander and the culprit were present. When the culprit was not present in a subsequent line-up, eyewitnesses were three times more likely to select the bystander than someone else not seen before. This is unconscious transference – a face is correctly recognised as having been seen before but incorrectly judged to be responsible for a crime. In similar fashion, eyewitnesses are more likely to identify a suspect on a line-up if the suspect has previously been seen in a line-up (Steblay & Dysart, 2016).

Another relevant finding is the other-race effect – same-race faces are identified better than other-race faces (Young et al., 2012). Unsurprisingly, eyewitnesses having the most experience with members of another race have a relatively small other-race effect (Hugenberg et al., 2010).

Contrary to common belief, the other-race effect does not depend entirely on problems with remembering other-race faces. Megreya et al. (2013) found perceptual processes are also involved (see Figure 8.12). British and Egyptian participants viewed a target face and an array of ten faces. They decided whether the target face was in the array; if so, they identified it. There were minimal demands on memory as all the photographs were visible. Megreya et al. obtained the other-race effect: (1) the target was correctly identified more often with same-race faces than other-race ones (70% vs 64%, respectively); and (2) when the target face was absent, mistaken identification of a non-target face was more frequent with other-race than same-race faces (47% vs 34%, respectively).

Brown et al. (2017) replicated the other-race effect. There was greater activation of fronto-parietal networks (involved in top-down attention and cognitive control) during encoding of same-race than other-race faces. These findings suggest that problems with remembering other-race faces are due to reduced attention to (and processing of) such faces.

In a study by Jenkins et al. (2011), observers showed very poor face recognition because photographs of the same face often show considerable variability (see Chapter 3). As a consequence, it is generally hard for eyewitnesses to make a correct identification from a single photograph as they are typically requested to do. It follows that eyewitnesses’ ability to identify unfamiliar faces might be enhanced if presented with multiple photographs of the same person.


Figure 8.12 Examples of Egyptian (left) and UK (right) face-matching arrays. The task was to decide whether the person shown at the top was present in the array underneath. From Megreya et al. (2013). © Taylor & Francis.

Jones et al. (2017) tested the above implication. Participants viewed a single front-view photograph of an individual (the target), seven photographs of that individual at different orientations, or seven computer-generated synthesised images of that individual at different orientations (see Figure 8.13). Subsequently, participants selected the target face from an array of five faces. As predicted, face-recognition performance was worst following presentation of a single photograph and best following presentation of synthesised images. This is important because the police can generate such synthesised images from a single photograph.

Figure 8.13 Panel (a): seven photographs of the same individual taken from different angles; Panel (b): seven synthesised images of the same individual at different orientations. From Jones et al. (2017).

From laboratory to courtroom

Can we apply findings from laboratory studies to real-life crimes? There are several differences. First, eyewitnesses are much more likely to be the victims in real life than in the laboratory. Second, it is much less stressful to watch a video of a violent crime than to experience it. Third, in laboratory research the consequences of an eyewitness making a mistake are trivial but can literally be a matter of life or death in an American trial.

There are also important similarities. Ihlebaek et al. (2003) used a staged robbery involving two robbers with handguns. In the live condition, eyewitnesses were ordered repeatedly to “Stay down!”. A video taken during the live condition was presented to eyewitnesses in the video condition. Eyewitnesses in both conditions exaggerated event duration and showed similar patterns in what they remembered. However, those in the video condition recalled more information. In another study (Pozzulo et al., 2008), eyewitnesses observed a staged theft live or via video. Eyewitnesses in the live condition reported more stress and arousal but correct identification of the culprit was comparable in the two conditions.

Tollestrup et al. (1994) analysed police records concerning identifications by eyewitnesses to crimes involving fraud and robbery. Factors important in laboratory studies (e.g., weapon focus; retention interval) were also important in real-life crimes.

In sum, artificial laboratory conditions typically distort the findings only modestly. If anything, the errors in eyewitness memory under laboratory conditions underestimate memory deficiencies for real-life events. This is due in part to the generally greater stressfulness of witnessing real-life crimes. Overall, laboratory research is definitely relevant to the legal system.

ENHANCING EYEWITNESS MEMORY

The police obviously have no control over the circumstances at the time of the crime (e.g., lighting; event duration). Such uncontrollable factors are known as estimator variables (Albright, 2017). There are also factors (known as system variables) that can be controlled by the criminal justice system; they include how line-ups are presented to eyewitnesses and the interview techniques used to question eyewitnesses. These system variables are discussed below.

Line-ups

Line-ups can be divided into those involving double-blind and those involving single-blind administration. With double-blind administration, the line-up is conducted by administrators who do not know which line-up member is the suspect, whereas they do have such knowledge with single-blind administration. Double-blind administration is preferable because


single-blind administration can cause systematic bias in the identification made by the eyewitness (Kovera et al., 2017).

Line-ups can be simultaneous (the eyewitness sees everyone at the same time) or sequential (the eyewitness sees only one person at a time). Which approach is more effective? Steblay et al. (2011) conducted a meta-analysis. When the culprit was present, they were selected 52% of the time with simultaneous line-ups compared to 44% with sequential ones. When the culprit was absent, eyewitnesses mistakenly selected someone with simultaneous line-ups more often than with sequential ones (54% vs 32%, respectively). Thus, eyewitnesses adopt a more stringent criterion with sequential line-ups (see the sketch below). Misidentifications with sequential line-ups in the laboratory can be reduced by providing a “not sure” option: this reduced misidentifications from 22% to only 12% (Steblay & Phillips, 2011). Warning eyewitnesses the culprit may not be in the line-up also reduces misidentification errors (Steblay, 1997).

Wells et al. (2015) carried out a large-scale study differing from most studies reviewed by Steblay et al. (2011) in two main ways. First, it involved eyewitnesses to actual crimes rather than videoed or staged laboratory crimes. Second, the eyewitnesses were permitted to say they were “not sure” (as happens in most real-life crime cases). In contrast, the great majority of laboratory studies require eyewitnesses to make definite “yes” or “no” decisions.

What did Wells et al. (2015) find? First, the suspect was identified 25% of the time with both simultaneous and sequential line-ups. Second, an innocent person was identified incorrectly more often with simultaneous than sequential line-ups (18% vs 11%). Third, eyewitnesses used the “not sure” response more often with sequential line-ups: eyewitnesses were unsure whether a subsequently viewed person might resemble the culprit more than the current one.

Wixted et al. (2016) also studied eyewitnesses to real-life crimes and obtained confidence ratings from these eyewitnesses when exposed to sequential or simultaneous line-ups. With simultaneous line-ups, eyewitnesses identified 91% of suspects against whom there was independent evidence of guilt, compared to 76% with sequential line-ups. When account was taken of eyewitnesses’ confidence ratings, their overall performance was slightly better with simultaneous line-ups.

In sum, eyewitnesses are more likely to identify the culprit with simultaneous line-ups. However, innocent individuals are also more likely to be selected with simultaneous line-ups. Which type of line-up is preferable depends on the precise magnitude of these two effects.
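What does it mean to say sequential line-ups involve a “more stringent criterion” rather than better memory? A minimal signal-detection sketch is given below, using the Steblay et al. (2011) rates. The assumptions are ours rather than the meta-analysis’s: six-person line-ups, culprit-absent selections split equally across line-up members (a common analytic convention), and the standard equal-variance Gaussian model.

```python
# A minimal signal-detection sketch of the Steblay et al. (2011) line-up rates.
# Assumptions (ours, not the meta-analysis's): six-person line-ups, culprit-absent
# selections split equally across members, equal-variance Gaussian model.
from scipy.stats import norm

def lineup_sdt(hit_rate, absent_selection_rate, lineup_size=6):
    # Approximate the false-alarm rate for any one innocent line-up member.
    fa_rate = absent_selection_rate / lineup_size
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa            # discriminability (memory quality)
    criterion = -(z_hit + z_fa) / 2   # response criterion (higher = stricter)
    return d_prime, criterion

print("simultaneous:", lineup_sdt(0.52, 0.54))  # d' ~ 1.39, c ~ 0.65
print("sequential:  ", lineup_sdt(0.44, 0.32))  # d' ~ 1.46, c ~ 0.88
```

On these assumptions, discriminability is very similar for the two procedures, whereas the criterion is clearly stricter for sequential line-ups – which matches the interpretation given in the text.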

Cognitive interview

Psychologists have contributed substantially to the goal of maximising the information provided by eyewitnesses being interviewed by developing the cognitive interview. This is based on four retrieval rules (Geiselman & Fisher, 1997):

(1) mental reinstatement of the environment and any personal contact experienced during the crime (context reinstatement);
(2) encouraging the reporting of every detail, including minor ones;


(3) describing the incident in several different orders (e.g., backwards in time);
(4) reporting the incident from different viewpoints, including those of other eyewitnesses (Anderson and Pichert, 1978, found this strategy useful; see Chapter 10).

These retrieval rules are based on our knowledge of human memory. The first two rules derive from the encoding specificity principle (Tulving, 1979; see Chapter 7). According to this principle, recall depends on the overlap or match between the context in which an event is witnessed and that at recall. The third and fourth rules are based on the assumption that memory traces contain several kinds of information. As a result, crime information can be retrieved using different retrieval routes.

There have been two major changes in the cognitive interview (Memon et al., 2010a). First, researchers developed an enhanced cognitive interview. This differed from the basic cognitive interview by emphasising the importance of creating rapport between interviewer and eyewitness. Roy (1991, p. 399) indicated how this can be achieved:

Investigators should minimise distractions, induce the eyewitness to speak slowly, allow a pause between the response and next question, tailor language to suit the individual eyewitness, follow up with interpretive comment, try to reduce eyewitness anxiety and avoid judgemental and personal comments.

Second, the police typically shorten the cognitive interview, emphasising the first two retrieval rules discussed earlier. This is done in part because the entire cognitive interview can be very time-consuming.

Findings

Memon et al. (2010a) carried out a meta-analysis comparing the cognitive interview with the standard police interview. Many more details were correctly recalled by eyewitnesses with the cognitive interview (basic or enhanced). However, its beneficial effects were reduced in highly arousing situations or with a long retention interval between the incident and interview. Nevertheless, the cognitive interview remained effective even with high arousal and a long retention interval. The main limitation of the cognitive interview was a fairly small increase in recall of incorrect details compared to the standard interview. In addition, the cognitive interview does not reduce the adverse effects of misleading information presented beforehand (Memon et al., 2009b).

Are all four components of the cognitive interview equally important? No. Colomb and Ginet (2012) found mental or context reinstatement of the situation and reporting all the details both enhanced recall. However, altering the eyewitness’s perspective and changing the order of recall were ineffective. Dando et al. (2011) found requiring eyewitnesses to recall information in a backward temporal order reduced correct recall and increased memory errors. These negative effects occurred because backward recall disrupted the temporal organisation of eyewitness memory for the crime.


How can we increase eyewitness accuracy using the cognitive interview? Paulo et al. (2016) found eyewitnesses’ error rate was 6% when they seemed certain of what they were recalling but 23% when they seemed uncertain. Thus, accuracy can be improved by taking account of eyewitnesses’ confidence.

Paulo et al. (2017) adapted the cognitive interview to include category clustering recall – eyewitnesses organised their recall in categories (e.g., person details; location details; action details). This produced enhanced recall compared to the standard cognitive interview and reduced errors. Category clustering recall was effective because it provided eyewitnesses with cues giving an organised structure to facilitate retrieval of event information.

Evaluation

The cognitive interview has a well-established theoretical and empirical basis. It is an effective method for obtaining as much accurate information as possible from eyewitnesses under most circumstances. There is increasing evidence concerning the relative effectiveness of the four main components of the cognitive interview. Potentially important refinements of the cognitive interview (e.g., category clustering recall; taking account of eyewitnesses’ confidence) have been proposed.

What are the main limitations of the cognitive interview? First, the small increase in incorrect eyewitness recall can lead detectives to misinterpret the evidence. Second, it does not reduce the negative effects of misinformation. Third, mental or context reinstatement can have a negative effect on recognition memory by increasing the perceived familiarity of non-target faces (Wong & Read, 2011). Fourth, the cognitive interview is less effective when the witnessed event was stressful and there is a long delay between the event and the interview.

KEY TERMS
Retrospective memory
Memory for events, people and so on experienced in the past; see prospective memory.
Prospective memory
Remembering to carry out some intended action in the absence of an explicit reminder to do so; see retrospective memory.

Case study: Cognitive interview and eyewitness confidence

PROSPECTIVE MEMORY

The great majority of memory studies have focused on retrospective memory, in which the emphasis is on the past (especially individuals’ ability to remember previous events or knowledge stored in long-term memory). In contrast, prospective memory is “the cognitive function we use for formulating plans and promises, for retaining them, and for recollecting them subsequently either at the right time or on the occurrence of appropriate cues” (Graf, 2012, pp. 7–8). Examples include remembering to meet a friend at a coffee shop or to attend a revision session for a course in psychology.

Failures of prospective memory are responsible for at least 50% of everyday memory problems. Tragic events occurring as a result of failures of prospective memory also indicate its importance. Einstein and McDaniel (2005, p. 286) discussed an example:

After a change in his usual routine, an adoring father forgot to turn toward the day-care centre and instead drove his usual route to work at the university. Several hours later, his infant son, who had been quietly asleep in the back seat, was dead.


KEY TERMS
Time-based prospective memory
A form of prospective memory which involves remembering to carry out an intended action at the appropriate time.
Event-based prospective memory
A form of prospective memory that involves remembering to perform an intended action (e.g., buying groceries) when the circumstances are appropriate.

Prospective memory vs retrospective memory

How different are retrospective and prospective memory? Failures of the two types of memory are interpreted differently (Graf, 2012). Failures of prospective memory involving promises to another person are often regarded as indicating poor motivation and reliability. In contrast, failures of retrospective memory are attributed to poor memory. Thus, deficient prospective memory suggests a “flaky person” whereas deficient retrospective memory suggests a “faulty brain” (Graf, 2012). There are other differences:

(1) Retrospective memory generally involves remembering what we know about something and can be high in informational content (Baddeley et al., 2015). In contrast, prospective memory typically focuses on when to do something and has low informational content.
(2) Prospective memory is more relevant to our everyday plans and goals.
(3) More external cues (e.g., “What did you have for breakfast yesterday?”) are typically available with retrospective than prospective memory. Anderson and McDaniel (2019) found in a naturalistic study that only 39% of individuals’ prospective-memory thoughts were triggered by external cues.
(4) Prospective memory is the form of memory “in which the problem is not memory itself, but the uses to which memory is put” (Moscovitch, 2008, p. 309).

Remembering and forgetting often involve both prospective and retrospective memory. For example, buying goods from the local supermarket requires memory of the intention to go there (prospective memory) and memory of what you had decided to buy (retrospective memory). Crawford et al. (2003) identified separate prospective and retrospective memory factors from a questionnaire designed to assess both forms of memory. There was also a general memory factor based on elements of prospective and retrospective memory.

In sum, there are several similarities and differences between prospective and retrospective memory. McBride and Workman (2017) provide a thorough review.

Event-based vs time-based prospective memory

There is an important distinction between time-based and event-based prospective memory. Time-based prospective memory involves performing a given action at a particular time (e.g., phoning a friend at 8 pm). Event-based prospective memory involves performing an action in the appropriate circumstances (e.g., passing on a message when you see a given person). Unsurprisingly, performance on time-based tasks depends in part on the accuracy (or inaccuracy) of any given individual’s time estimation (Waldum & McDaniel, 2016).

There has been much more research on event-based prospective memory. With event-based tasks, researchers can manipulate the precise


nature and timing of cues indicating the intended action should be performed. That provides more control over retrieval conditions than is generally possible with time-based tasks. In the real world, the requirement to use prospective memory typically arises while individuals are busily involved in performing some unrelated task. Most laboratory research is similar because participants are generally engaged in an unrelated ongoing task while performing a prospective-memory task.

Event-based tasks tend to be easier than time-based tasks. For example, Kim and Mayhorn (2008) found event-based prospective memory was superior under both laboratory and naturalistic conditions because the intended actions are more likely to be triggered by external cues on event-based tasks. Hicks et al. (2005) confirmed event-based tasks are generally less demanding than time-based ones. However, both kinds of tasks were more demanding when the task was ill-specified (e.g., detect animal words) rather than well-specified (e.g., detect the words nice and hit). A well-specified time-based task was no more demanding than an ill-specified event-based task.

The strategies used on time-based and event-based tasks often differ considerably. The occurrence of prospective-memory cues is typically more predictable on time-based tasks. As a result, individuals generally engage in only sporadic monitoring for prospective-memory cues on time-based tasks, with this monitoring increasing as the occurrence of the cue approaches (Cona et al., 2015). In contrast, as we will see, there is much more evidence of continuous monitoring on event-based tasks because of unpredictability concerning the cue’s occurrence. Cona et al. (2015) showed the importance of predictability with event-based tasks: the pattern of monitoring resembled that typically found with time-based tasks when the occurrence of prospective-memory cues was predictable.

KEY TERM
Ongoing task
A task performed at the same time as a prospective-memory task in studies on prospective memory.

Stages in prospective memory

Prospective memory typically involves several separate processes or stages. As a consequence, there are various ways prospective memory can fail. Zogg et al. (2012) provided a sketch map of the main processes/stages involved (see Figure 8.14):

(1) Intention formation: the individual forms or encodes an intention linked to a specific cue (e.g., “I will discuss the weekend with my friend when I see him”).
(2) Retention interval: there is a delay (minutes, hours or weeks) between intention formation and intention execution. As we have seen, there is typically frequent environmental monitoring for event cues on event-based tasks but sporadic monitoring for time cues on time-based tasks.
(3) Cue detection and intention retrieval: the individual detects and recognises the relevant cue (e.g., sighting your friend; seeing it is 4 o’clock); this is followed by self-initiated retrieval of the appropriate intention.
(4) Intention recall: the individual retrieves the intention from retrospective memory; there may be problems because of the intention’s complexity, its relationship to other stored intentions or the presence of competing intentions.
(5) Intention execution: this is typically fairly “automatic” and undemanding.

Figure 8.14 A model of the component processes involved in prospective memory. Intention formation is followed by monitoring for event and/or time cues. Successful monitoring leads to cue detection and intention retrieval, intention recall and intention execution. From Zogg et al. (2012). Reprinted with permission of Springer Science+Business Media.
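As a concrete (if highly simplified) illustration of how these stages hang together, here is a toy sketch in Python. Everything in it – the event names, the cue, the structure of a “trial” – is an illustrative assumption, not part of Zogg et al.’s (2012) model itself.

```python
# A toy sketch of the five stages as a processing pipeline.
# All names, events and structure here are illustrative assumptions,
# not part of Zogg et al.'s (2012) model itself.
from dataclasses import dataclass

@dataclass
class Intention:
    cue: str     # Stage 3 trigger, e.g. "see friend"
    action: str  # what to do, e.g. "discuss the weekend"

def run_trial(intention, events):
    # Stage 2: retention interval - the environment is monitored for cues.
    for event in events:
        # Stage 3: cue detection and intention retrieval.
        if event == intention.cue:
            # Stages 4 and 5: recall the intention and execute it.
            return f"executed: {intention.action}"
    return "failed: cue never encountered"

# Stage 1: intention formation.
intention = Intention(cue="see friend", action="discuss the weekend")
print(run_trial(intention, ["ride bus", "see friend", "buy coffee"]))
# -> executed: discuss the weekend
```

The point of the sketch is simply that a prospective-memory failure can arise at any stage: the cue may never be detected, the wrong intention may be retrieved, or execution may be interrupted.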

Prospective memory in real life

In this section, we discuss prospective memory in various groups of people. In the box below, we consider individuals (e.g., pilots; air traffic controllers) for whom forgetting of intended actions can prove fatal. We also discuss people often regarded as having poor prospective memory.

IN THE REAL WORLD: PLANE CRASHES – PILOTS AND AIR TRAFFIC CONTROLLERS

Dismukes and Nowinski (2006) studied pilot errors involving memory failures. There were failures of prospective memory in 74 out of 75 incidents or accidents! There was an almost total absence of retrospective memory failures because pilots have excellent knowledge and memory of the operations needed to fly a plane.

Here is an example of a plane crash caused by failure of prospective memory. On 31 August 1988, a Boeing 727 (Flight 1141) was in a long queue awaiting departure from Dallas-Fort Worth airport. The air traffic controller unexpectedly told the crew to move up past the other planes to the runway. This caused the crew to forget to set the wing flaps and leading-edge slats to 15 degrees (a failure of prospective memory). As a result, the plane crashed beyond the end of the runway, leading to several deaths.

What causes pilots to exhibit prospective-memory failures? Relevant evidence was reported by Latorella (1998). Commercial pilots interrupted while flying a simulator made 53% more errors than those not interrupted. Such interruptions are relatively common. Gontar et al. (2017) discovered pilots on average experienced eight interruptions (e.g., from colleagues; from outside the cockpit) during preparations for each flight. Unsurprisingly, the adverse effects of interruption on task performance are greater with longer interruptions (Altmann et al., 2017).


Interruptions increase performance errors because they impair prospective memory for intentions that could not be performed at the typical point in a sequence of actions. More specifically, we can explain the effects of interruptions with Shelton and Scullin’s (2017) dynamic multiprocess framework (discussed later). According to this framework, we can remember to perform an intended action because of bottom-up processes (e.g., encountering a relevant cue). When pilots are not interrupted, each item in an action sequence cues the next action. Such cueing is lacking if actions are performed out of sequence.

According to Shelton and Scullin (2017), we can also remember to perform an intended action because of top-down processes (i.e., monitoring for cues and rehearsing the intention). Using these processes is effortful when one is interrupted during task performance (Altmann et al., 2017). As a result, pilots can find it difficult to continue with a sequence of actions while monitoring and rehearsing.

How can we reduce pilot errors following interruptions? Engaging in effortful top-down processes is one answer. Alternatively, retrieval cues such as the humble egg timer could be used to remind pilots to resume an interrupted task (Gontar et al., 2017). For example, pilots might only activate an egg timer when some task has been interrupted and will need to be attended to shortly.

Errors made by air traffic controllers often involve prospective memory (failures to perform intended actions while monitoring a display). Loft and Remington (2010) found prospective-memory errors by participants in a simulated air traffic control task were more common when interruptions led them to deviate from well-practised or strong routines rather than less-practised ones. This is important because air traffic controllers (and pilots) devote much of their time to habitual tasks involving strong routines. Such tasks are carried out fairly “automatically” due to habit capture, which can cause prospective-memory failures when something unexpected happens (Dismukes, 2012).

Figure 8.15 Mean failures to resume an interrupted task (left side) and mean resumption times in msec (right side) for the conditions: no-interruption, blank-screen interruption and secondary air traffic control task interruption. From Wilson et al. (2018). The plotted values were approximately:

Condition         Resumption failures   Resumption time (ms)
No-interruption   2.37%                 2369
Blank             3.39%                 4501
ATC               10.85%                4951

Wilson et al. (2018) explored the effects of interruptions on a simulated air traffic control task. There were three conditions: (1) interruption involving a blank screen; (2) interruption involving a secondary air traffic control (ATC) task resembling the main one; and (3) a no-interruption control condition. Both interruption conditions increased the time taken to resume the main air traffic control task (see Figure 8.15) because participants took some time to re-activate information relevant to the main ATC task. In addition, there were more failures to resume the interrupted task following a secondary ATC task than in the other two conditions because the demands of the secondary task caused increased forgetting of the interrupted task.

Loft et al. (2013, 2016) found prospective-memory errors were reduced when flashing visual aids accompanied the appearance of target planes. However, participants experiencing severe scheduling demands sometimes failed to take advantage of these visual aids.

Obsessive-compulsive disorder and checking behaviour

KEY TERMS
Obsessive-compulsive disorder (OCD)
An anxiety disorder in which the symptoms include unwanted thoughts (obsessions) and repetitive behaviours (compulsions) in response to those thoughts.
Meta-memory
Beliefs and knowledge about one’s own memory, including strategies for learning and memory.

Most patients with obsessive-compulsive disorder (OCD) have checking compulsions. They check repeatedly that they have locked their front door, that the gas has been turned off and so on, but remain uncertain whether they have actually performed their intended actions. How can we explain this checking behaviour? Perhaps obsessional individuals have poor retrospective memory ability, causing them to forget whether they have recently engaged in checking behaviour. However, compulsive checkers are generally comparable to healthy controls in retrospective memory (Cuttler & Graf, 2009a).

Perhaps checkers have poor prospective memory. Cuttler and Graf (2009b) found checkers had impaired performance on event-based and time-based prospective-memory tasks. Similarly, Yang et al. (2015) found patients with OCD had impaired performance on time-based tasks and were slower than controls on event-based tasks. Yang et al. reported evidence the poor prospective memory of OCD patients involved impaired mental flexibility.

It is possible poor prospective memory leads obsessionals to engage in excessive checking. However, excessive checking may also lead to poor prospective memory. Suppose you check several times every day that you have locked your front door. You would remember you had checked it numerous times. However, you might well be unsure whether you have checked your front door today because of all the competing memories. Van den Hout and Kindt (2004) asked some participants to engage in repeated checking of a gas stove. On the final trial, those checking repeatedly had less vivid and detailed memories of what had happened.

Linkovski et al. (2013) carried out a similar study. They also assessed participants’ level of inhibitory control because obsessional patients have deficient inhibitory control, which may lead to intrusive thoughts and memory problems. What did Linkovski et al. (2013) find? Repeated checking did not impair prospective-memory performance. However, it reduced memory vividness and detail and also lowered participants’ confidence in their memory. These effects were all much stronger in participants with poor inhibitory control (see Figure 8.16).

Figure 8.16 Self-reported memory vividness, memory details and confidence in memory for individuals with good and poor inhibitory control before (pre-) and after (post-) repeated checking. From Linkovski et al. (2013). Reprinted with permission of Elsevier.

Toffolo et al. (2016) emphasised the distinction between memory performance (i.e., memory accuracy) and meta-memory (knowledge and beliefs about one’s own memory). Meta-memory encompasses measures of memory confidence, memory vividness and memory detail. Toffolo et al. identified three main research findings:

(1) Patients with OCD engage in more checking behaviour than those lacking obsessional tendencies.
(2) Repeated checking typically produces large reductions in meta-memory ratings which are comparable in OCD patients and controls (e.g., Radomsky et al., 2014).
(3) Even though repeated checking reduces meta-memory ratings substantially, it typically has no effect on memory accuracy (e.g., Radomsky et al., 2014).

In sum, patients with OCD often exhibit impaired prospective memory. The following conclusions seem warranted:

Even though it is still unknown what comes first (the tendency to use more checking behaviour in general or OCD), . . . when people who are vulnerable for OCD use more checking, this may [reduce] memory confidence. This may subsequently lead to the vicious cycle of increased checking behaviour and memory distrust, eventually contributing to the development of new OC [obsessional compulsive] symptoms. (Toffolo et al., 2016, p. 60)

Purdon (2018) discusses further evidence for the notion that checking behaviour impairs memory confidence. She also argues that patients with OCD have a need for certainty that contributes to their excessive checking behaviour.

THEORETICAL PERSPECTIVES ON PROSPECTIVE MEMORY

Our main emphasis here is on event-based prospective memory. What typically happens is that two tasks are performed during the same period of time. One task is the ongoing task and the other is the prospective-memory task. As we will see, performance on the prospective-memory task depends on its relationship to the ongoing task.


KEY TERMS
Focal task
An ongoing task that involves similar processing to that involved in encoding the target on a prospective-memory task performed at the same time; see non-focal task.
Non-focal task
An ongoing task that involves different processes to those involved in encoding the target on a prospective-memory task performed at the same time; see focal task.

The multiprocess framework (Einstein and McDaniel, 2005) has been very influential. According to this framework, various cognitive processes (including attentional ones) are used during prospective-memory tasks. However, the detection of cues for response is typically “automatic” (i.e., not requiring attentional processes) when the following criteria (especially the first) are fulfilled:

(1) The ongoing task is a focal task – one that “encourages processing of the target [on the prospective-memory task] and especially those features that were processed at encoding [of the prospective-memory target]” (McDaniel et al., 2015, p. 2). Here is an example: the ongoing task requires participants to decide whether each letter string is a word and the prospective-memory task involves responding to the word “sleep”.
(2) The cue on the prospective-memory task and the to-be-performed action are highly associated.
(3) The cue is conspicuous or salient.
(4) The intended action is simple.

McDaniel et al. (2015) developed the multiprocess framework into the dual-pathways model (see Figure 8.17) based on the distinction between focal and non-focal ongoing tasks. A non-focal task “does not encourage processing of those features . . . processed at encoding [of the prospective-memory target]” (p. 2). For example, the ongoing task requires participants to decide whether each letter string is a word and the prospective-memory task involves responding to words starting with the letter r. Thus, there is much less overlap between the processing required on the prospective-memory and ongoing tasks when the latter is non-focal.

It is assumed the processes typically used with focal and non-focal tasks differ substantially (see Figure 8.17). Strategic monitoring involves top-down attentional control processes to maintain the prospective-memory intention and to search for relevant cues on that task. It is used much more often with non-focal than with focal tasks.

According to the dual-pathways model, retrieval on the prospective-memory task can occur in two ways: (1) spontaneous retrieval involves bottom-up processes triggered by the relevant stimulus and does not require prior monitoring; (2) intentional retrieval is based more on top-down processes and requires prior monitoring. Non-focal tasks involve intentional retrieval. In contrast, focal tasks generally involve spontaneous retrieval but can also involve intentional retrieval. Finally, the model identifies the main brain areas associated with the cognitive processes involved in prospective memory.

Shelton and Scullin (2017) presented a dynamic multiprocess framework largely consistent with the dual-pathways model. Two cognitive processes underlie successful prospective-memory performance:

(1) Monitoring: this involves top-down attentional control to search for cues indicating the prospective-memory action should be performed.
(2) Spontaneous retrieval: this involves bottom-up processing triggered by processing a cue.


Figure 8.17 The dual-pathways model of prospective memory (based on the multiprocess framework) for non-focal and focal tasks separately. The solid black arrows indicate the sequence of processing over time. The dashed-line arrows indicate that strategic monitoring processes are sometimes involved even with focal tasks. In the model, strategic monitoring is associated with DLPFC, VLPFC, insula, anterior cingulate, FEF, lateral BA 10, BA 47 and precuneus; intentional retrieval with BA 40, insula, lateral BA 10 and anterior cingulate; and spontaneous retrieval with the ventral frontoparietal network, BA 9 and the MTL, especially the hippocampus. From McDaniel et al. (2015).

Figure 8.18 Example 1: top-down monitoring processes operating in isolation; Example 2: bottom-up spontaneous retrieval processes operating in isolation; Example 3: dual processes operating dynamically. Each example traces an intention (“Remember to pick up wine after work”) across contexts (committee meeting; office; driving home past an advertisement and a store sign). From Shelton and Scullin (2017).

What determines which process is used? First, monitoring is effortful and often impairs ongoing task performance because it creates competition for processing capacity. As a consequence, monitoring is rarely used when the ongoing task is important (e.g., a committee meeting). Second, monitoring is used primarily when prospective-memory cues are expected (e.g., when close to a wine shop as shown in Figure 8.18).


KEY TERM
Meta-cognition
Beliefs and knowledge concerning one’s own cognitive processes and likely level of performance.

Third, Shelton and Scullin (2017) assumed top-down and bottom-up processes interact dynamically on prospective-memory tasks. For example, monitoring depends importantly on meta-cognition (knowledge and beliefs about one’s own cognitive processes and performance). Suppose you perform a prospective-memory task. If you are confident the task will be easy (e.g., there will be strong retrieval cues), you might choose to rely on spontaneous retrieval rather than monitoring. However, if you expect the task to be difficult (e.g., retrieval cues will be weak or non-existent), you would probably choose to use extensive monitoring (see the toy simulation below).

In sum, the dynamic multiprocess framework differs from previous theories in that the processing strategies used on prospective-memory tasks are flexibly influenced by meta-cognitive processes. The multiprocess theory is less flexible – it assumes processing on prospective-memory tasks is predominantly determined by the task (focal vs non-focal).
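To make the meta-cognitive trade-off concrete, here is a toy simulation in Python. Every number in it (the success probabilities, the monitoring cost, the decision threshold) is a hypothetical assumption chosen for illustration; this is a sketch of the strategy choice the framework describes, not Shelton and Scullin’s model itself.

```python
# A toy simulation of the monitoring vs spontaneous-retrieval trade-off.
# All numbers below are hypothetical assumptions for illustration only.
import random

def trial(strategy, cue_strength):
    """One prospective-memory trial: (success?, cost to the ongoing task)."""
    if strategy == "monitor":
        # Top-down monitoring: reliable detection, but it consumes
        # attentional capacity needed by the ongoing task.
        return random.random() < 0.95, 0.3
    # Spontaneous retrieval: free, but success depends on how strongly
    # the environmental cue triggers the stored intention.
    return random.random() < cue_strength, 0.0

def choose_strategy(expected_cue_strength, threshold=0.7):
    # Meta-cognitive choice: rely on spontaneous retrieval only when
    # strong retrieval cues are expected; otherwise pay to monitor.
    return "spontaneous" if expected_cue_strength >= threshold else "monitor"

random.seed(1)
for cue in (0.9, 0.4):  # strong vs weak expected cues
    strategy = choose_strategy(cue)
    outcomes = [trial(strategy, cue) for _ in range(10_000)]
    success = sum(hit for hit, _ in outcomes) / len(outcomes)
    cost = sum(c for _, c in outcomes) / len(outcomes)
    print(f"cue strength {cue}: {strategy} - PM success {success:.2f}, "
          f"ongoing-task cost {cost:.2f}")
```

With strong expected cues, the simulated agent relies on spontaneous retrieval and keeps the ongoing task unimpaired; with weak cues, it switches to monitoring, succeeding more often but at a cost to the ongoing task – the pattern the framework predicts.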

Findings

The requirement to perform a prospective-memory task at the same time as an ongoing task generally leads to impaired performance on the ongoing task. According to the above theories, this occurs because the ongoing and prospective-memory tasks compete for processing resources. Such competition is especially great when demanding top-down processes (e.g., monitoring) are used on the prospective-memory task.

Support for these theoretical assumptions was reported by Moyes et al. (2019). They found impaired performance on the ongoing task when it was non-focal, and so demanding processing (especially monitoring) was required on the prospective-memory task. In contrast, performance was not impaired on the ongoing task when it was focal, and so demanding monitoring processes on the prospective-memory task were not required.

In spite of findings such as those of Moyes et al. (2019), the above theoretical assumptions are oversimplified. Rummel et al. (2017) argued the requirement to perform two tasks at the same time reduces mind-wandering (task-unrelated thoughts) as participants try to cope with the overall processing demands. They obtained clear support for this argument. Rummel et al. (2017) also found performance on a prospective-memory task was much better (71% vs 42%) when financial incentives were provided for good performance. Strikingly, the extra processing resources invested in the prospective-memory task when incentives were provided did not affect performance on the ongoing task because the incentives reduced participants’ mind-wandering.

Support for the general approach of the dual-pathways model was reported by McDaniel et al. (2013). They argued the monitoring required to perform a non-focal task would involve top-down attentional control. As a result, there would be sustained activity in the anterior prefrontal cortex, an area associated with attentional control. In contrast, the lesser demands of a focal task would produce only transient activity in that brain area. That is precisely what they found (see Figure 8.19). Cona et al. (2016) conducted a meta-analysis of neuroimaging studies involving focal and non-focal tasks. As predicted by the dual-pathways model, patterns of brain activity differed between these two task types during maintenance and retrieval.


Figure 8.19 (a) Sustained (PM Sus) and (b) transient (PM) activity in the left anterior prefrontal cortex (c) for non-focal (blue) and focal (red) prospective-memory (PM) tasks. The other conditions shown (i.e., CTL, Ong PM and Ong CTL) are not of theoretical relevance. From McDaniel et al. (2013). Reprinted with permission of the Association for Psychological Science.

Overall, non-focal tasks were associated with more activity in parts of the anterior prefrontal cortex (BA10). In contrast, focal tasks were associated with more activity than non-focal ones in the anterior cerebellum, ventral parietal regions (BA40) and BA9. Cona et al. (2016, p. 1) concluded as follows: “Prospective remembering is mediated mainly by top-down and stimulus-independent processes in non-focal, but by more automatic, bottom-up, processes in focal tasks.”

According to the dual-pathways model, automatic cue detection sometimes occurs on prospective-memory tasks. Beck et al. (2014b) provided relevant evidence. Participants initially performed a block of trials with an ongoing task and a prospective-memory task. This was followed by a block of trials where they were instructed not to perform the prospective-memory task even though prospective-memory targets appeared. These instructions presumably prevented deliberate target monitoring. Nevertheless, targets in this second block were associated with activation in brain regions (e.g., the ventral parietal cortex) associated with spontaneous retrieval.

Scullin et al. (2010) also obtained findings suggesting the existence of spontaneous retrieval of prospective-memory cues. They almost eliminated monitoring for prospective-memory cues by presenting only a single prospective-memory target after over 500 trials and by emphasising the importance of the ongoing task. This target was detected by 73% of participants when it occurred on a focal task but by only 18% when it occurred on a non-focal task. This is consistent with the model’s assumption that spontaneous retrieval occurs much more often with focal tasks.

We turn now to the role of meta-cognition (emphasised within the dynamic multiprocess framework). Clear evidence of its importance was provided by Lourenço et al. (2015). Two tasks were performed at the same time: (1) an ongoing lexical decision task (see Glossary); and (2) a prospective-memory task that involved responding to animal words.


During practice, the target animal words were atypical (e.g., raccoon) or typical (e.g., dog). On the following experimental trials, only atypical animal words were presented as prospective-memory targets. What did Lourenço et al. (2015) discover? We will focus on participants for whom the target words on the prospective-memory task were typical during practice but atypical on experimental trials. These participants showed little or no monitoring during the initial experimental trials because they expected the prospective-memory task to be easy. However, they used monitoring much more when they realised that task was actually harder than they had expected. The take-home message is that strategy use is flexible: our use of monitoring increases (or decreases) as a result of experience and expectation.

Suppose you perform an ongoing task of counting the number of living objects presented on a screen containing approximately 20 objects. At the same time, you must perform a prospective-memory task of detecting a given target (e.g., apple) presented in the upper right corner of the screen. On some trials, an object semantically related to the target (e.g., banana) is presented on the ongoing task. According to the dynamic multiprocess framework, fixating the semantically related object should often cause spontaneous retrieval of the intention on the prospective-memory task. This in turn should lead to monitoring (revealed by rapid fixation on the upper right corner of the screen).

Shelton and Christopher (2016) carried out the experiment described in the previous paragraph. Their findings were precisely as predicted by the dynamic multiprocess framework (see Figure 8.20). Thus, top-down monitoring is often triggered by bottom-up spontaneous retrieval. In sum, performance on most prospective-memory tasks is determined by interactive bottom-up (e.g., spontaneous retrieval) and top-down (e.g., monitoring) processes. The various theories discussed are mostly consistent with each other. However, the dynamic multiprocess framework has advanced our understanding with its assumption that the strategies used on prospective-memory tasks are flexibly influenced by meta-cognitive processes.

Figure 8.20 Frequency of cue-driven monitoring (scale 0–5) following the presentation of semantically related or unrelated cues in the prospective-memory and control conditions; there was no prospective-memory task in the control condition. From Shelton and Christopher (2016).




Improving prospective memory

Failures of prospective memory caused by task interruptions can be reduced by forming an explicit intention to resume the interrupted task (Dodhia & Dismukes, 2009). Alternatively, we can place distinctive reminder cues where they will be seen at the appropriate time (Dismukes, 2012). For example, if you need to take a book into college tomorrow, you could leave it close to your keys.

Motivation is also important. Cook et al. (2015) found prospective memory was better when monetary rewards were given for good performance or monetary punishments for poor performance. These benefits were achieved without impairing performance of the ongoing task. These findings may have occurred because of reduced mind-wandering (Rummel et al., 2017, discussed earlier, p. 384) in the high-motivation conditions.

A relatively simple (but effective) technique for enhancing prospective memory is based on Gollwitzer’s notion of implementation intentions: “‘If situation Y is encountered, then I will perform the goal-directed response Z!’ Thus, implementation intentions define exactly when, where, and how one wants to act toward realizing one’s goals” (Gollwitzer, 2014, p. 306). Chen et al. (2015) found in a meta-analysis that implementation intentions enhanced prospective memory.

Why are implementation intentions effective? Scullin et al. (2017) asked participants what they were thinking shortly after receiving implementation-intention instructions. These instructions increased the tendency for participants to focus on specific aspects of the prospective-memory task and reduced mind-wandering. Gollwitzer argued that forming an implementation intention is like forming an “instant habit” that reduces the processing costs when intentions are retrieved on a prospective-memory task. Support was reported by Rummel et al. (2012): participants receiving implementation intentions performed better on a prospective-memory task embedded within an ongoing task. Rummel et al. also included trials where participants were told not to respond to target words from the prospective-memory task. These target words caused more disruption to the ongoing task (see Glossary) for participants previously given implementation intentions. This happened because those participants were more likely to retrieve their intentions relatively “automatically”.

KEY TERM Implementation intentions Action plans designed consciously to achieve some goal (e.g., healthier eating) based on specific information concerning where, when and how the goal will be achieved.
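Gollwitzer’s “If situation Y is encountered, then I will perform response Z” format maps naturally onto a table of condition–action pairs. The following minimal sketch (our illustration; the example intentions and perceived situations are invented) shows why such plans support relatively “automatic” retrieval: the situational cue alone indexes the action, with no deliberate search through memory:

# Implementation intentions as if-then pairs: encountering the situation
# (the "if") directly retrieves the associated action (the "then").
implementation_intentions = {
    "passing the bakery": "buy wholegrain bread",       # healthier-eating goal
    "leaving the office": "pick up wine for the party",
}

perceived_situations = ["entering the lift", "passing the bakery"]
for situation in perceived_situations:
    action = implementation_intentions.get(situation)   # cue-driven lookup
    if action:
        print(f"{situation} -> {action}")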

Overall evaluation

There are several ways progress has been made in understanding prospective memory:

(1) The number and nature of the processes involved have been identified with increasing clarity.
(2) Reasons for prospective-memory failures in various groups (e.g., pilots; obsessional individuals) have been identified.
(3) There have been several theoretical advances. The dynamic multiprocess framework (Shelton & Scullin, 2017) provides a coherent account of most findings with its emphasis on complex interactions between top-down and bottom-up processes.


(4) The cognitive neuroscience approach has identified brain areas associated with different prospective-memory processes.
(5) Researchers are developing a new field of “prospection” or future thinking including prospective memory.

What are the limitations of theory and research on prospective memory? First, it is assumed within several theories (e.g., the dual-pathways model; see Figure 8.17) that monitoring will typically be used with non-focal ongoing tasks. However, this is not entirely correct. Anderson et al. (2018) instructed participants to engage in monitoring on every trial on the prospective-memory task or simply instructed them to perform that task. Both groups engaged in monitoring but the former group did so to a greater extent: they detected 73% of prospective-memory targets compared to only 59% for the latter group.

Second, most theories de-emphasise individual differences in processing on prospective-memory tasks. For example, Scullin et al. (2018) gave participants the task of pressing the Q key whenever they saw a word belonging to the category of “fruits”. Thus, they should have encoded fruit as an abstract category. However, participants often encoded fruit as a specific example (e.g., apple), or they hardly thought at all about the instruction to focus on “fruits” (see Figure 8.21).

Third, participants in most laboratory experiments lack strong incentives to exhibit good prospective-memory performance. In contrast, the incentives in real life can include saving lives (e.g., air traffic controllers).

Fourth, moment-by-moment decisions to use top-down or bottom-up processes often involve meta-cognition (Shelton & Scullin, 2017). However, much remains to be discovered about meta-cognitive processes.

Fifth, the processes involved in prospective memory are more complex than typically assumed. For example, the joint demands of performing a prospective-memory task and an ongoing task produce a reduction in mind-wandering (Rummel et al., 2017). Our limited understanding of the factors determining mind-wandering often makes it hard to predict prospective-memory performance.

Figure 8.21 Different ways the instruction to press Q for fruit words was encoded: category bias (51.1%), specific exemplar bias (26.4%) and hardly thought about it (22.5%). From Scullin et al. (2018).


Sixth, prospective memory in everyday life differs from that in the laboratory: intentions must typically be maintained over much longer periods of time, which reduces the involvement of attentional and monitoring processes.

CHAPTER SUMMARY

• Introduction. What people remember in traditional memory studies is largely determined by the experimenter’s demands for accuracy. In contrast, remembering in everyday life is determined by our personal goals. Tailoring our message to create an impression causes subsequent memory distortions. Memory research should strive for generalisability and representativeness. The distinction between traditional and everyday memory research is imprecise.



• Autobiographical memory: introduction. Autobiographical memories generally have greater personal significance and complexity than episodic memories and can involve semantic memory. Autobiographical memory helps to maintain social bonds, a sense of self-continuity and self-enhancement. Individuals with highly superior autobiographical memory often have obsessional symptoms and devote much time to thinking about past events. Flashbulb memories are perceived as more vivid than other memories even though they are often inaccurate and have only moderate consistency. Flashbulb memories generally resemble other memories in their susceptibility to interference and forgetting.



• Memories across the lifetime. There is infantile amnesia for memories of the first two years of life. It occurs because the cognitive self only emerges towards the end of the second year of life and because of hippocampal neurogenesis (generation of new neurons within the hippocampus). Relative amnesia for the preschool years ends when children have a good command of language. The reminiscence bump for important personal memories is much stronger for positive memories than negative ones because the retrieval of autobiographical memories is often guided by the life script.



• Theoretical approaches to autobiographical memory. According to the self-memory system model, autobiographical information is stored hierarchically. An individual’s goals and personality influence the retrieval of autobiographical memories. Autobiographical memories can be accessed via direct or generative retrieval. The prefrontal cortex (associated with controlled processing) and the amygdala (involved in emotional processing) are activated during autobiographical retrieval. Several interconnected brain networks are involved in autobiographical retrieval, with the brain areas activated shifting between initial searching for memories and their subsequent elaboration. Depressed individuals exhibit over-general autobiographical memory. Therapy to increase the specificity of depressed patients’ autobiographical memories has proved successful in reducing depressive symptoms.

• Eyewitness testimony. Eyewitnesses’ confidence in their initial identification provides valid evidence concerning its accuracy. Eyewitness memory is influenced by several factors including confirmation bias, stress and ageing. Misinformation typically produces distorted eyewitness memory but often does not cause permanent alteration of memory traces. Misinformation can enhance eyewitness memory if it acts as a cue facilitating retrieval of an event. Eyewitness memory for faces is affected by the cross-race effect, and also by difficulties in recognising a given unfamiliar face from different photographs of that person.



• Enhancing eyewitness memory. Culprits are more likely to be identified from simultaneous than from sequential line-ups, but more innocent individuals are identified with simultaneous line-ups. Which type of line-up is preferable depends on the magnitude of these two effects. The cognitive interview leads eyewitnesses to produce many more detailed memories with a small increase in inaccurate memories. Inaccurate memories can be detected because eyewitnesses often have low confidence in the accuracy of such memories. Mental reinstatement and the requirement to report all details are both crucial to the success of the cognitive interview.



• Prospective memory. Prospective memory involves successive stages of intention formation, monitoring, cue detection, intention retrieval and intention execution. Event-based prospective memory is often better than time-based prospective memory because the intended actions are more likely to be triggered by external cues. Many failures of prospective memory (e.g., by pilots) occur when individuals are interrupted while carrying out an action plan and lack time to form a new plan. Individuals with obsessive-compulsive disorder engage in excessive checking behaviour which may reduce their confidence in their prospective-memory ability.



• Theoretical perspectives on prospective memory. According to the dynamic multiprocess framework, prospective memory involves interactions between top-down processes (e.g., monitoring) and bottom-up ones (e.g., spontaneous retrieval). The extent to which effortful monitoring is used depends on meta-cognitive processes assessing how well the prospective-memory task would be performed in its absence. Neuroimaging evidence supports the distinction between top-down and bottom-up processes. Implementation intentions enhance prospective-memory performance by facilitating the relatively “automatic” retrieval of intentions.


FURTHER READING

Baddeley, A., Eysenck, M.W. & Anderson, M.C. (2020). Memory (3rd edn). Abingdon, Oxon.: Psychology Press. This textbook provides detailed coverage of research and theory on all the main topics discussed in this chapter.

Conway, M.A., Justice, L.V. & D’Argembeau, A. (2019). The self-memory system revisited: Past, present, and future. In J.H. Mace (ed.), The Organisation and Structure of Autobiographical Memory (pp. 28–51). New York: Oxford University Press. Martin Conway provides an update of his influential theoretical approach to autobiographical memory.

Davis, D. & Loftus, E.F. (2018). Eyewitness science in the 21st century: What do we know and where do we go from here? In E.A. Phelps, L. Davachi & J.T. Wixted (eds), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 1: Learning and Memory (4th edn; pp. 529–566). New York: Wiley. Deborah Davis and Beth Loftus discuss theory and research in the field of eyewitness testimony.

Putnam, A.L., Sungkhasettee, V.W. & Roediger, H.L. (2017). When misinformation improves memory: The effects of recollecting change. Psychological Science, 28, 36–46. Adam Putnam and colleagues shed new light on the circumstances in which eyewitness memory is (or is not) adversely affected by misinformation.

Sheldon, S., Diamond, N.B., Armson, M.J. & Palombo, D.J. (2018). Assessing autobiographical memory: Implications for understanding the underlying neurocognitive mechanisms. In E.A. Phelps, L. Davachi & J.T. Wixted (eds), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 1: Learning and Memory (4th edn; pp. 363–396). New York: Wiley. This chapter emphasises the importance of cognitive neuroscience to an understanding of autobiographical memory.

Shelton, J.T. & Scullin, M.K. (2017). The dynamic interplay between bottom-up and top-down processes supporting prospective remembering. Current Directions in Psychological Science, 26, 352–358. This article updates the influential dynamic multiprocess framework including relevant research.

Smith, R.E. (2017). Prospective memory in context. Psychology of Learning and Motivation, 66, 211–249. Contemporary views on prospective memory are discussed by Rebekah Smith with an emphasis on the role played by contextual information.


PART III

Language

Our lives would be remarkably limited without language. Our social interactions depend very heavily on language and all students need a good command of language. The main reason we know much more than previous generations is that knowledge is passed on from one generation to the next via language.

What is language?

According to Harley (2013, p. 5), language “is a system of symbols and rules that enable us to communicate. Symbols stand for other things: Words (written or spoken) are symbols. The rules specify how words are ordered to form sentences.” Communication is the primary function of language. However, Crystal (1997) identified eight different functions. In addition to communication, we use language for thinking, to record information, to express emotion (e.g., “I love you”), to pretend to be animals (e.g., “Woof! Woof!”), to express identity with a group (e.g., singing in church), and so on.

It is somewhat surprising there was little research on language prior to the late 1950s. The behaviourists (e.g., Skinner, 1957) argued that the language we produce consists of rewarded conditioned responses. According to this analysis, there is nothing special about language and no reason other species should not be able to develop language. The situation was transformed by Noam Chomsky (1957, 1959), who claimed (correctly!) that the behaviourist approach to language was woefully inadequate. According to him, language possesses several unique features (e.g., grammar or syntax) and can only be acquired by humans. Chomsky’s ideas led to a dramatic increase in language research (Harley & McAndrew, 2015). As a result, language has been of central importance within cognitive psychology ever since.

Is language unique to humans?

Bonobos (a species of great ape) have developed better language skills than any other non-human species. Panbanisha (1985–2012) was trained on a special keypad with about 400 geometric patterns or lexigrams on it.


KEY TERM Lexigrams Symbols used to represent words in studies on communication.

She acquired a vocabulary of 3,000 words by the age of 14 years and often combined symbols in their correct order (e.g., “Please can I have an iced coffee?”). It has often been argued that apes’ use of language lacks spontaneity and refers almost exclusively to the present. This matters because spontaneity and displaced reference are two criteria for language. However, 74% of the utterances of Panbanisha and two other great apes were spontaneous (Lyn et al., 2011). In addition, the apes referred to the past as often as young children and produced more responses referring to future intentions. Lyn et al. (2014) found bonobos (including Panbanisha and her half-brother Kanzi) could communicate about displaced objects (i.e., those no longer present).

Genty et al. (2015) found bonobos were more likely to repeat a message with a familiar recipient but to elaborate the original message with an unfamiliar recipient. This ability to vary communications to accommodate the recipient’s needs is characteristic of children’s use of language. Clay and Genty (2017) reviewed the research on bonobos, concluding that their behaviour exhibits “considerable communicative complexity, flexibility, and intentionality”.

What are the main limitations of bonobos’ language acquisition? First, bonobos’ utterances are much less likely than those of young children (aged 12–24 months) to reflect motivation to engage in social interaction. Children’s statements often refer to intentions, attention seeking or offering something to someone else, whereas 80% of bonobos’ utterances are requests (e.g., for food) (Lyn et al., 2014). Such evidence led Scott-Phillips (2015) to argue that apes lack our ability to engage in “mind reading”, which allows us to infer how our utterances are likely to be interpreted. This is at least partly correct but probably overstated (Moore, 2015).

Second, children’s language skills develop dramatically after the age of 2 years and so become markedly superior to those of bonobos. For example, children’s language exhibits much more productivity (expressing numerous ideas) and is much more complex (e.g., sentence length; use of grammatical structures).

Third, as Chomsky (quoted in Atkinson et al., 1993) pointed out, “If an animal had a capacity as biologically advantageous as language but somehow hadn’t used it until now, it would be an evolutionary miracle, like finding an island of humans who could be taught to fly.”

Fourth, an increasingly common view (e.g., Christiansen & Chater, 2008, 2016; discussed below) is that several non-language cognitive processes (e.g., short-term memory; thinking; learning) play a vital role in the development of language. Bonobos can acquire only some aspects of language, in part because their non-language cognitive processes are considerably inferior to ours.


Is language innate?

KEY TERMS

Linguistic universals Features (e.g., preferred word order; the distinction between nouns and verbs) found in the great majority of the world’s languages.

Recursion Turning simple sentences into longer and more complex ones by placing one or more additional clauses within them.

There has been fierce controversy on the issue of whether language is innate. Chomsky claimed humans possess an innate universal grammar (a set of grammatical principles found in all human languages). In Chomsky’s own words, “Whatever universal grammar is, it’s just the name for [our] genetic structure” (Baptista, 2012, pp. 362–363).

According to Chomsky, there are several linguistic universals (features common to nearly every language) that jointly form a universal grammar. One possible linguistic universal is recursion (embedding clauses within sentences to generate increasingly long sentences). For example, we can use recursion to expand the sentence “John met Mary in Brighton” to “John, who was a handsome man, met Mary in Brighton”. Other possible linguistic universals are lexical categories (e.g., nouns; verbs; adjectives) and word order (subject-verb-object or subject-object-verb).
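Recursion is easy to demonstrate concretely. In this minimal sketch (our illustration; the example clause is invented and the grammar is drastically simplified), a single rule repeatedly embeds a relative clause after the subject, generating ever-longer sentences from a finite procedure:

def embed(sentence, clause, times):
    """Embed a relative clause after the subject, 'times' times over."""
    subject, rest = sentence.split(" ", 1)
    for _ in range(times):
        rest = f"{clause}, {rest}"    # each pass nests one more clause
    return f"{subject}, {rest}" if times else sentence

base = "John met Mary in Brighton"
print(embed(base, "who was a handsome man", 1))
# John, who was a handsome man, met Mary in Brighton
print(embed(base, "who was a handsome man", 2))  # clumsy, but shows unbounded growth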

Chomsky proposed an innate universal grammar for various reasons. First, he argued, it explains why only humans fully develop language. Second, it explains the alleged broad similarities among the world’s languages. Third, he claimed that young children develop language much faster than would be predicted on the basis of their exposure to spoken language. However, experience obviously determines which language any given child learns.

Christiansen and Chater (2008) totally disagreed with Chomsky. Their main points were as follows:

(1) Languages differ enormously, which is inconsistent with the notions of universal grammar and linguistic universals.
(2) The notion that natural selection has provided us with genes responsive to abstract features of languages we have never encountered is mystifying.
(3) Languages change amazingly rapidly. For example, all Indo-European languages emerged from a common source in under 10,000 years (Baronchelli et al., 2012). Natural selection could not have kept pace.
(4) “Language has been shaped by the brain: language reflects pre-existing, and hence non-language-specific, human learning and processing mechanisms” (Christiansen & Chater, 2008, p. 491). In other words, our language ability is less special and less different from our other cognitive abilities than implied by Chomsky.
(5) Children find it easy to acquire language because it was invented by humans to take account of human abilities: “language has adapted to our brains” (Christiansen & Chater, 2008, p. 490).

In what follows, we discuss relevant research. As we will see, this research has revealed many limitations in Chomsky’s theoretical approach.


Findings: linguistic universals and genes

How different are the world’s languages? The main European languages are very similar, but large differences appear when all the world’s 6,000 to 8,000 languages are considered. Evans and Levinson (2009, p. 429) did precisely that and concluded, “There are vanishingly few universals of language in the direct sense that all languages exhibit them”. For example, there is evidence that recursion (discussed above, p. 395) is lacking in the Amazonian language Pirahã: Futrell et al.’s (2016) thorough analysis failed to find any strong evidence for recursion in that language.

Evidence concerning other suggested language universals is hotly contested. Evans and Levinson (2009) concluded some languages (e.g., the Austronesian language Chamorro) lack one or more of the lexical categories of nouns, verbs and adjectives. However, Chung (2012) analysed Chamorro in considerable detail and concluded that in fact it has nouns, verbs and adjectives! She concluded the failure to identify the three main lexical categories in some languages occurs because they are insufficiently studied.

Word order has claims to be a linguistic universal. Greenberg (1963) found the subject preceded the object in 98% of numerous languages. The word order subject-verb-object (S-V-O) was most common followed by subject-object-verb (S-O-V). Sandler et al. (2005) studied the Al-Sayyid group living in an isolated Israeli community. High levels of congenital deafness in this community led them to develop Al-Sayyid Bedouin Sign Language, which uses the S-O-V word order even though it differs from other languages to which they are exposed.

The above findings can be interpreted in more than one way. It can be argued the central importance of the subject in a sentence means it makes sense for the subject to precede the object regardless of any genetic considerations, and that this ordering facilitates communication. Fedzechkina et al. (2018) argued that word-order preferences in any language reflect the limitations of human information processing. More specifically, they predicted that words strongly associated grammatically (and in meaning) should appear close together within sentences to minimise processing costs (see the sketch below). This is, indeed, the case across languages that otherwise appear superficially different.

Bickerton (1984) proposed the language bioprogram hypothesis, which is closely related to Chomsky’s views. According to this hypothesis, children will create a grammar even if hardly exposed to a proper language. Senghas et al. (2004) studied deaf Nicaraguan children at special schools. These children developed a new system of gestures that expanded into a basic sign language (Nicaraguan Sign Language) passed on to successive groups of children. Since this sign language does not resemble Spanish or the gestures of hearing children, it is a genuinely new language. Nicaraguan Sign Language is still developing – Kocab et al. (2016) found that only later generations of signers could successfully communicate complex temporal information.


The above findings suggest humans have a strong innate motivation to acquire language (including grammatical rules) and to communicate with others. However, they provide only modest support for a universal grammar.
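Fedzechkina et al.’s processing-cost idea can be made concrete with a toy dependency-length calculation. In this sketch (our illustration; the example sentence, the head–dependent pairs and the candidate word orders are invented), orders that keep grammatically associated words adjacent receive lower summed distances and so are predicted to be easier to process:

def total_dependency_length(order, dependencies):
    """Sum of distances (in words) between each head and its dependent."""
    position = {word: i for i, word in enumerate(order)}
    return sum(abs(position[head] - position[dep]) for head, dep in dependencies)

deps = [("met", "John"), ("met", "Mary"), ("met", "yesterday")]
close = ["John", "met", "Mary", "yesterday"]   # associated words adjacent
far = ["John", "yesterday", "Mary", "met"]     # associated words separated
print(total_dependency_length(close, deps))    # 1 + 1 + 2 = 4
print(total_dependency_length(far, deps))      # 3 + 1 + 2 = 6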

KEY TERM Child-directed speech The short, simple, slowly spoken sentences used by parents and others when talking to young children.

According to Chomsky, only humans have the genetic make-up permitting language acquisition. Relevant evidence comes from research on the KE family in London. Across three generations, about 50% of family members have suffered from severe language problems (e.g., difficulties in understanding speech; slow and ungrammatical speech). Their complex language disorder was linked to a specific gene, FOXP2 (Lai et al., 2001). More specifically, mutations of this gene were found only in affected family members. Why does FOXP2 cause these language impairments? It is probably a hub in various gene networks, and its mutations lead to impaired functioning of brain areas directly involved in language. However, we must not exaggerate the importance of FOXP2 for various reasons:

(1) The FOXP2 sequence is found in numerous vertebrate species not possessing language.
(2) Other genes such as ATP2C2 and CMIP are also associated with specific language impairment (Graham & Fisher, 2013).
(3) Mueller et al. (2016) found common genetic variants in FOXP2 had negligible effects on language ability within a normal sample.

Findings: child-directed speech

Chomsky claimed children’s rapid acquisition of language cannot be explained solely on the basis of their exposure to language. However, he minimised the richness of the linguistic input to which children are exposed. Parents and other adults use child-directed speech involving very short, simple sentences, a slow rate of speaking and use of a restricted vocabulary. Unsurprisingly, children whose parents use a lot of child-directed speech show faster language development than other children (Rowe, 2008). Thus, most parents are “in tune” with children’s current language abilities and so provide strong environmental support.

Chomsky exaggerated the speed with which young children master language. Children’s speech during their first two years of speaking is remarkably limited (Bannard et al., 2009) – they use a small set of familiar verbs and often repeat back what they have just heard. Bannard et al. (2013) found 3-year-olds often engage in “blind copying” – they imitate everything an adult has just said even when part of it does not add any useful information.


Findings: is language special?

Much evidence indicates that language is less special (in the sense of being different from other cognitive functions) than assumed by Chomsky. Campbell and Tyler (2018) reviewed neuroimaging research indicating that many brain regions are included within the “language network”. Most of these areas are associated with general cognitive functions (e.g., attention; memory). However, some brain areas (e.g., BA45; posterior middle temporal gyrus) form a “syntax system” involved in syntactic processing, which is damaged in patients with impaired syntactic processing. This syntax system could be regarded as “special”.

Other research has indicated that language comprehension and production both depend on general cognitive processes such as attention and cognitive control (see McClain & Goldrick, 2018, for a review). There is also much evidence for a “language-as-skill” framework, according to which language acquisition is a type of skill acquisition resembling learning to play a musical instrument (Chater & Christiansen, 2018). Within this framework, “Language is connected to basic psychological mechanisms of learning and processing” (p. 207). In other words, language skills are not “special”.

Evaluation

Chomsky’s theoretical approach receives some support from evidence suggesting only humans possess fully developed language. His general approach also receives limited support from the identification of specific genes that sometimes influence language acquisition.

What are the limitations of Chomsky’s approach? First, the world’s languages differ far more than he predicted. Second, Chomsky now admits the universal grammar is very restricted in scope and so there are very few linguistic universals. Third, the notion that children’s linguistic input is too impoverished to produce language acquisition is highly debatable. Fourth, Chomsky de-emphasised the importance of our high-level cognitive abilities in explaining why only humans have fully developed language skills.

Whorfian hypothesis

The best-known theory about the relationship between language and thought was proposed by Benjamin Lee Whorf (1956). He was a fire prevention officer for an insurance company, and his hobby was linguistics. Whorf’s views have often been distorted to imply that he believed that language necessarily determines thought (and behaviour). Whorf’s actual views were far more reasonable. For example, he discussed a hypothetical case in which an explosion occurred when workers were careless with cigarettes near empty gasoline drums. The workers’ carelessness may have been due in part to the word empty, which suggests there is nothing in the drum (not even vaporous fumes). This example could be interpreted as meaning that Whorf believed that language determines thought and behaviour.


However, he clarified his views (quoted in Lee, 1996, p. 153): “I don’t wish to imply that language is the sole or even the leading factor in . . . the fire-causing carelessness through misunderstandings induced by language, but that this is simply a coordinate factor along with others.”

KEY TERMS

Linguistic relativity The notion that speakers of different languages think differently.

Whorfian hypothesis The theoretical assumption that language influences perception, thinking and behaviour.

According to the Whorfian hypothesis, language influences thinking and behaviour in various ways. Of central importance is the notion of linguistic relativity – the idea that how speakers of any given language think is influenced by the language they speak.

Findings

Categorical perception means observers find it easier to discriminate between stimuli belonging to different categories than those in the same category (see Chapter 9). Categorical perception is assumed to depend in part on language. Suppose we compared the categorical perception of colour in people speaking different languages that varied in the number of basic colour terms. According to the Whorfian hypothesis, we might predict these linguistic differences would influence the perception of (and memory for) colour.

Support for this prediction was reported by Winawer et al. (2007). Russian differs from English in having separate words for dark blue (siniy) and light blue (goluboy). Russian participants found it easier than English ones to discriminate between dark and light blue stimuli. Other studies have produced different findings. For example, Wright et al. (2015) compared colour memory in English speakers (11 basic colour terms) and Himba speakers (5 basic colour terms) but found no differences. Wright et al. also reviewed other research in which the findings were a mixture of significant and non-significant with no obvious explanation for the differences.

Manner of motion (e.g., hopping; running) is expressed more prominently in English than Spanish. As a result, Kersten et al. (2010) argued English speakers should outperform Spanish ones on a task where novel animated objects were categorised on the basis of manner of motion. The findings were as predicted, suggesting language can influence thinking and performance.

We must not exaggerate language’s impact on thinking. Consider a study by Li et al. (2009). Observers saw objects made of a given substance (e.g., a plastic whisk). English speakers focused on the object itself (whisk) rather than the substance (plastic), whereas Mandarin and Japanese speakers focused on the substance. These differences may reflect differences in the three languages and so support the Whorfian hypothesis. However, when participants simply indicated how likely they would be to think of various objects as objects or as substances, there were no differences across the three languages. Thus, the effects of language were very task specific.


Frank et al. (2008) studied the Pirahã, an Amazonian tribe. They have no words to express precise quantities or numbers, not even “one”. Nevertheless, the Pirahã could perform exact quantitative matches even with large numbers of objects. However, their performance was inaccurate when information needed to be remembered. Thus, language is not essential for certain numerical tasks. However, it provides an efficient way of encoding information and so boosts performance when memory is required.

Evaluation

Language influences our thinking and performance on many tasks (Wolff & Holmes, 2011). For example, it can enhance memory (Frank et al., 2008) and increase categorical perception (Winawer et al., 2007). This is unsurprising: “Language mobilises ordinary cognitive mechanisms whose effects on people’s thoughts, feelings, and judgments should be uncontroversial” (Casasanto, 2016, p. 715).

The crucial issue is not whether the Whorfian hypothesis is correct but rather to identify the conditions in which language does (and does not) influence cognition. Regier and Xu (2017) addressed the latter issue by focusing on mental uncertainty. High mental uncertainty “opens the door to language to fill in some of the missing elements, and there should be a relatively strong effect of language” (p. 1). For example, Bae et al. (2015) asked American participants to identify the colour they had seen immediately or after a delay. There was greater bias reflecting colour categories in the English language in the latter condition, where there was greater uncertainty concerning the colour presented.

Language chapters

We possess four main language skills (listening to speech; reading; speaking; and writing). It is perhaps natural to assume any given person will have generally strong or weak language skills. That assumption is often incorrect with respect to people’s first language – for example, many people speak fluently and coherently but find writing difficult. The assumption is even less accurate with respect to people’s second language. The first author has spent numerous summer holidays in France and can just about read newspapers and easy novels in French. However, he finds it agonisingly hard to understand rapidly spoken French and his ability to speak French is poor.

The three chapters in this section (Chapters 9–11) focus on the four main language skills. Chapter 9 deals with the basic processes involved in reading and listening to speech. The emphasis is on how readers and listeners identify and make sense of individual words. As we will see, the study of brain-damaged patients has clarified the complex processes underlying reading and speech perception.

Chapter 10 deals mostly with the processes involved in the comprehension of sentences and discourse (connected text or speech). Most of these processes are common to text and speech.


An important part of sentence understanding involves parsing (working out the sentence’s grammatical structure). Understanding discourse involves drawing numerous inferences and often forming a mental model of the situation described.

Chapter 11 deals with the remaining two main language abilities: speaking and writing. We spend much more of our time speaking than writing. This helps to explain why we know much more about speech production than writing. Research on writing has been somewhat neglected until recently. This is regrettable given the importance of writing skills in many cultures.

The processes discussed in these three chapters are interdependent. For example, listeners use language production processes to predict what speakers will say next (Pickering & Garrod, 2013). More generally, Chater et al. (2016, p. 244) argued that “Language comprehension and production are facets of a unitary skill”.


Chapter 9

Speech perception and reading

INTRODUCTION

Humans excel in their command of language. Language is so important that this chapter and the following two are devoted to it. In this chapter, we consider basic processes involved in recognising spoken words and reading words. As discussed in Chapter 10, many comprehension processes are very similar whether we listen to someone talking or read a text. For example, you would probably understand the sentence, “You have done exceptionally well in your cognitive psychology examination”, equally well whether you read or heard it.

Rayner and Clifton (2009) identified two important similarities between speech perception and reading. First, both are typically fast. Adult readers can read between 250 and 350 words per minute. Speech perception is slower but can approach typical reading rates. Second, reading and speech perception are incremental – much processing (e.g., semantic; syntactic) occurs while a word is attended to.

Another similarity concerns anticipatory language processing. Readers and listeners devote resources during sentence processing to predicting upcoming words or phrases (Huettig, 2015). The complexities involved are discussed fully later (see pp. 415–416). There is a final important similarity: children learn to understand speech before they learn to read. Unsurprisingly, some processes and abilities involved in understanding speech are also relevant in reading. For example, individuals with severe reading problems frequently have problems with auditory processing (Farmer & Klein, 1995). More specifically, such individuals are often impaired at categorising phonemes (speech sounds) (O’Brien et al., 2018).

There are also several differences between reading and speech perception. In reading, most words can be seen as a whole and remain in view. In contrast, spoken words are spread out in time and are transitory. In addition, it is harder to decide where one word ends and the next starts. Speech generally provides a more ambiguous signal than printed text. For example, when words were spliced out of spoken sentences and presented on their own, they were recognised only 50% of the time (Lieberman, 1963).


Our ability to hear what a speaker is saying is often impaired by other speakers close by and/or irrelevant noises. In contrast, readers are rarely distracted by other visual stimuli. Finally, demands are greater when listening to speech than reading a text because previous words are inaccessible.

So far we have indicated why speech perception can be harder than reading. However, speech perception can be easier in some ways. Speech often contains prosodic cues (see Glossary and Chapter 10), which are hints to sentence structure and intended meaning provided by the speaker’s pitch, intonation, stress and timing. Speakers also often accompany their speech with meaningful gestures. In contrast, the main cues to sentence structure in text are punctuation marks (e.g., commas; semi-colons). These cues are often less informative than speakers’ prosodic cues.

Some adult brain-damaged patients can understand spoken language but cannot read. Other patients have good reading skills but cannot understand the spoken word. Thus, reading and speech perception involve somewhat different brain areas and cognitive processes.

This chapter starts with the basic processes specific to speech perception (e.g., those required to divide the speech signal into separate words and to recognise those words). After that, we consider the basic processes specific to reading (e.g., those involved in recognising and reading individual words and guiding our eye movements). Why have we adopted this ordering (speech perception followed by reading)? As mentioned earlier, most children develop competence in speech perception several years before they can read. In addition, some processes that children use while learning to read closely resemble those acquired earlier when learning to understand spoken language. In Chapter 10, we discuss comprehension processes common to reading and listening. The emphasis there is on larger units of language consisting of several sentences.

SPEECH (AND MUSIC) PERCEPTION

Speech perception is easily the most important form of auditory perception. However, “The human relationship with sound is much deeper and more ancient than our relationship with words” (Kraus & Slater, 2016, p. 84). Important forms of auditory perception not involving words include music perception and identifying the nature and sources of environmental sounds.

The relationship between speech perception and auditory perception is controversial. Perhaps humans have special speech-perception mechanisms: the “speech is special” approach (e.g., Trout, 2001). Alternatively, the same general mechanisms may process speech and non-speech sounds (Carbonell & Lotto, 2014).

Brandt et al. (2012, p. 1) claimed controversially that we can “describe language as a special type of music”. There is some support for this claim. First, music and language perception both involve the goal of “grouping acoustic features together to form meaningful objects and streams” (Kraus & Slater, 2016, p. 86).


Second, if you listen repeatedly to the same looped recording of speech, it often starts to sound like singing when you stop attending to its meaning (Tierney et al., 2013). Brain areas associated with music perception were more activated by repeated speech perceived as song than repeated speech not perceived as song. Tierney et al. (2018) also studied the speech-to-song illusion. Ratings of the musicality of spoken phrases increased when these phrases were repeated, and listeners became more responsive to the musical structure (e.g., melodic structure) of the phrases. Further evidence on the relationship between speech and music perception is discussed briefly below.

KEY TERM Categorical perception A sound intermediate between two phonemes is perceived as being one or other of the phonemes; a similar phenomenon is found in vision with colour perception.

Categorical perception

Suppose listeners are presented with a series of sounds, starting with /ba/ and gradually moving towards /da/, and report what sound they hear. What typically happens is categorical perception – speech stimuli intermediate between two phonemes are categorised as one of those phonemes (discussed later in a section entitled “Ganong effect”, p. 415). Below we consider whether categorical perception is unique to speech perception.

Raizada and Poldrack (2007) presented listeners with two auditory stimuli and asked them to decide whether they represented the same phoneme. There was evidence of categorical perception: the differences in brain activation associated with the two stimuli were amplified when they were on opposite sides of the boundary between the two phonemes. However, there is often only limited evidence for categorical perception with speech sounds. It is less evident with vowels than consonants, and listeners are often sensitive to variations within a given perceptual category (Monahan, 2018).

Bidelman and Walker (2017) reviewed findings indicating categorical perception is also present in music. However, it was stronger for speech than music (especially among non-musician listeners). These findings suggest categorical perception occurs mostly with familiar stimuli. Finally, Weidema et al. (2016) presented various pitch contours embedded in linguistic or melodic phrases. There was evidence of categorical perception in both the language and music contexts. However, identical pitch contours were categorised differently depending on whether they were perceived as language or music. Thus, there are both similarities and differences in categorical perception in speech and music.
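The abruptness that defines categorical perception can be illustrated with a toy identification function. In this sketch (our illustration; the continuum steps, boundary location and logistic slope are arbitrary assumptions, not fitted to any data), labelling flips sharply at the category boundary instead of changing gradually:

import math

def p_da(step, boundary=5.0, slope=2.5):
    """Probability of labelling a /ba/-/da/ continuum step as /da/."""
    # A steep logistic function produces the step-like identification
    # curve characteristic of categorical perception.
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

for step in range(1, 10):
    label = "da" if p_da(step) > 0.5 else "ba"
    print(f"step {step}: P(/da/) = {p_da(step):.2f} -> heard as /{label}/")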

Case study: American Sign Language

Do music and speech perception involve the same brain areas?

The relationship between music and speech perception can be studied by comparing the brain areas activated with each form of perception. However, “The extent of neural overlap between music and speech remains hotly debated” (Jantzen et al., 2016, p. 1). Some neuroimaging research has reported mostly non-overlapping brain regions are involved in music and speech perception. However, Slevc and Okada (2015) argued this is not the case when relatively complex tasks are used. They found complex music and speech perception both involved cognitive control (using prefrontal cortex areas), which is used “to detect and resolve conflict that occurs when expectations are violated and interpretations must be revised” (p. 637).


Figure 9.1 (a) Areas activated during passive listening to music (blue) and to speech (orange) (single-condition activation likelihood estimates, left and right hemispheres); (b) areas activated more by passive listening to music than speech (blue) or by speech than music (orange). From Lacroix et al. (2015).

Lacroix et al. (2015) conducted a meta-analytic review. Passive music and speech listening were both associated with activation in large areas of the superior temporal gyrus. However, the precise areas differed between music and speech perception (see Figure 9.1). In addition, Broca’s area (in the inferior frontal gyrus) was more activated during speech perception than music perception. Lacroix et al. concluded: “Our findings of spatially distinct regions for music and speech clearly suggest the recruitment of distinct brain networks for speech and music” (p. 15).

Research on brain-damaged patients has also revealed important differences between speech and music perception. Some patients have intact speech perception but impaired music perception, whereas others have intact music perception but impaired speech perception (Peretz & Coltheart, 2003).

In sum, there are important similarities between music and speech perception (e.g., the involvement of cognitive control). However, they differ with respect to underlying brain areas and cognitive processes. Note that the specific tasks used to assess music or speech perception greatly influence the precise brain areas activated (Lacroix et al., 2015).

Processing stages

KEY TERM Syllable A unit of speech consisting of one vowel sound with or without one or more additional consonants (e.g., water has two syllables: wa and ter).

A sketch map of the main processes involved in speech perception is shown in Figure 9.2. Initially, listeners often have to select out the speech signal of interest from several other irrelevant auditory inputs (e.g., other voices; see Chapter 5). After that, decoding involves extracting discrete elements (e.g., phonemes or other basic speech sounds) from the speech signal. There is controversy as to whether decoding involves identifying phonemes (small units of sound; see Glossary) or syllables (speech units based on a vowel sound often plus one or more consonants).


Figure 9.2 The main processes involved in speech perception and comprehension.

From Cutler and Clifton (1999). By permission of Oxford University Press.

Goldinger and Azuma (2003) argued the unit in speech perception varies flexibly. Listeners heard lists of non-words recorded by speakers who had been told phonemes or syllables were the basic units of perception. Listeners detected phoneme targets faster than syllable targets when the speaker believed phonemes were the basic units, but the opposite was the case when the speaker believed syllables were the basic units. These findings suggest either phonemes or syllables can form the perceptual units in speech perception.

There is an important distinction between phonemes and allophones (variant forms of any given phoneme). Consider the words pit and spit. They both contain the same phoneme /p/. However, there are slight differences in the way /p/ is pronounced in the two words. Thus, there are two allophones relating to /p/ but only one phoneme, and so allophones are context-dependent whereas phonemes are context-independent. There has been controversy as to whether phonemes or allophones are the basic units in spoken word recognition. However, Mitterer et al. (2018) reviewed the literature and reported several experiments suggesting that early processing of spoken words is based on allophones rather than phonemes.

The third stage (word identification) is of special importance. Various problems in word identification are discussed shortly. However, one problem will be mentioned here.


KEY TERM Allophones Variant forms of a given phoneme; for example, the phoneme /p/ is associated with various allophones (e.g., in pit and spit; Harley, 2013).


KEY TERM Segmentation Dividing the almost continuous sounds of speech into separate phonemes and words.

However, one problem will be mentioned here. All English words are formed from only about 35 phonemes. As a result, most spoken words resemble many other words at the phonemic level, making them hard to distinguish. However, the task becomes easier if listeners make use of allophones rather than phonemes (discussed above). The fourth and fifth stages both emphasise speech comprehension. The fourth stage focuses on utterance interpretation. This involves constructing a coherent meaning for each sentence based on information about individual words and their order within the sentence. Finally, the fifth stage involves integrating the meaning of the current sentence with preceding speech to construct an overall model of the speaker's message. In sum, speech perception and comprehension involve several processing stages. However, it is an oversimplification to assume speech perception typically involves serial processes occurring in the neat-and-tidy fashion shown in Figure 9.2.

LISTENING TO SPEECH

Understanding speech is often difficult for two broad reasons. First, speech perception depends on several aspects of the speech signal (discussed shortly, pp. 409–412). Second, it depends on whether speech is heard under optimal or adverse conditions. Mattys et al. (2012, p. 953) defined an adverse condition as "any factor leading to a decrease in speech intelligibility on a given task relative to the level of intelligibility when the same task is performed in optimal listening conditions". Mattys et al. (2009) identified two major types of adverse conditions. First, there is energetic masking: distracting sounds cause the intelligibility of target words to be degraded. Energetic masking, which mostly affects bottom-up processing, is a serious problem in everyday life (e.g., several people talking at once; noise of traffic). Until recently, most laboratory research on speech perception lacked ecological validity (see Glossary) because listeners were rarely confronted by distracting sounds. Second, there is informational masking: cognitive load (e.g., performing a second task while listening to speech) makes speech perception harder. Informational masking mainly affects top-down processing. For example, Mitterer and Mattys (2017) found speech perception was impaired by cognitive load even when the second task was visual in nature (face processing). Alain et al. (2018) found listeners use different processes depending on why speech perception is difficult. They conducted a meta-analysis (see Glossary) of three types of studies: (1) speech in noise; (2) degraded speech; and (3) complexity of the linguistic input. Their key finding was that patterns of brain activation varied across these three types of studies.

Problems with the speech signal

Here are some specific problems with the speech signal often faced by listeners:

(1) There is segmentation, which involves separating out or distinguishing phonemes (units of sound) and words from the pattern of speech sounds. Most speech has few periods of silence, as you have probably noticed when listening to someone speaking an unfamiliar foreign language. This makes it hard to decide when one word ends and the next begins.


(2) There is coarticulation: a speaker's pronunciation of a phoneme depends on the preceding and following phonemes. Harley (2010, p. 148) provides an example: "The /b/ phonemes in 'bill', 'ball', 'able', and 'rub' are all acoustically slightly different." Coarticulation is problematical because it increases the variability of the speech signal. However, it can provide a useful cue, because it allows listeners to predict the next phoneme to some extent.
(3) Speakers differ in several ways (e.g., sex; dialect; speaking rate) and yet we generally cope well with such variability. Kriengwatana et al. (2016) trained Dutch and Australian-English listeners to discriminate two Dutch vowels from a single speaker. Both groups subsequently successfully categorised the same vowels when spoken by a speaker of the opposite sex. However, both groups performed poorly and required feedback when the vowels were spoken by someone with a different accent. Thus, adapting to a different-sexed speaker is relatively "automatic" but adapting to a different accent requires active processing of additional information (e.g., feedback; context). Expectations are also important (Magnuson & Nusbaum, 2007). In that study, some listeners expected to hear two speakers with similar voices whereas others expected to hear only one speaker. In fact, there was only one speaker. Those expecting two speakers showed worse listening performance.
(4) Language is spoken at 10 phonemes (basic speech sounds) per second and much acoustic information is lost within 50 ms (Remez et al., 2010). As a consequence, "If linguistic information is not processed rapidly, that information is lost for good" (Christiansen & Chater, 2016, p. 1).
(5) Non-native speakers often produce speech errors (e.g., saying words in the wrong order). Listeners cope by using top-down processes to infer what non-native speakers are trying to say (Lev-Ari, 2014; see Chapter 10).

KEY TERM Coarticulation A speaker’s production of a phoneme is influenced by their production of the previous sound and by preparations for the next sound.

Coping with listening problems

We have seen listeners experience various problems in understanding the speech signal. How do they cope? Multiple sources of information are used flexibly depending on the immediate situation. There are bottom-up processes stemming directly from the acoustic signal. There are also top-down processes based on the listener's past knowledge and contextual information (e.g., the speaker's previous utterance). Below we discuss how these processes assist speech perception.

Segmentation

Dividing the speech signal into its constituent words (i.e., segmentation) is crucial for listeners. Segmentation involves using several cues. Some are acoustic-phonetic (e.g., coarticulation; stress) whereas others depend on the listener's knowledge (e.g., of words) and the immediate context (Mattys et al., 2012).


Segmentation is influenced by constraints on what words are possible (e.g., a stretch of speech lacking a vowel is not a possible word in English). Listeners found it hard to identify the word apple in fapple because the residue f could not possibly be an English word (Norris et al., 1997). In contrast, listeners easily detected the word apple in wuffapple because wuff could be an English word. Evidence indicating segmentation can be based on possible word constraints has been obtained in several languages. However, it does not apply to Russian, a language which has some single-consonant words lacking a vowel (Alexeeva et al., 2017).

Stress is an important acoustic cue. In English, the initial syllable of most content words (e.g., nouns; verbs) is typically stressed. Strings of words without the stress on the first syllable are misperceived (e.g., "conduct ascents uphill" is perceived as "A duck descends some pill"). There are other acoustic cues. For example, there is generally more coarticulation within than between words. In addition, segments and syllables at the start and end of words are lengthened relative to those in the middle (Kim et al., 2012).

In their hierarchical approach (see Figure 9.3), Mattys et al. (2005) identified three main categories of cues: lexical (e.g., syntax; word knowledge); segmental (e.g., coarticulation); and metrical prosody (e.g., word stress). When all cues are available, we prefer to use lexical cues (Tier 1). When lexical information is impoverished, we use segmental cues such as coarticulation and allophony (one phoneme may be associated with two or more similar sounds or allophones) (Tier 2). For example, the phoneme /p/ is pronounced differently in pit and spit. Finally, we resort to metrical prosody cues (e.g., stress) when it is hard to use Tier 1 or 2 cues. One reason we often avoid using stress cues is because stress information can be misleading when a word's initial syllable is not stressed (Cutler & Butterfield, 1992). Mattys (2004) found coarticulation (Tier 2) was more useful than stress (Tier 3) for identifying word boundaries when the speech signal was intact. In contrast, when the speech signal was impoverished and made it hard to use Tier 1 or 2 cues, stress was more useful than coarticulation.

Figure 9.3 A hierarchical approach to speech segmentation involving three levels or tiers. The relative importance of the different types of cue is indicated by the width of the purple triangle. From Mattys et al. (2005). © American Psychological Association.
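The possible-word constraint described above lends itself to a simple illustration. The following minimal Python sketch uses a toy one-word lexicon and the simplifying assumption that any viable residue of English speech must contain a vowel; it is illustrative only, not the actual model tested in the segmentation literature.

```python
# Minimal sketch of the possible-word constraint in segmentation.
# Toy assumption: a candidate segmentation is acceptable only if the
# residue left over could itself be a word (here: empty or contains a vowel).

VOWELS = set("aeiou")
LEXICON = {"apple"}  # hypothetical toy lexicon

def viable_residue(residue: str) -> bool:
    """English has no vowel-less words, so a vowel-less residue is ruled out."""
    return residue == "" or any(ch in VOWELS for ch in residue)

def find_words(speech: str):
    """Yield lexicon words whose extraction leaves only viable residues."""
    for word in LEXICON:
        pos = speech.find(word)
        if pos == -1:
            continue
        before, after = speech[:pos], speech[pos + len(word):]
        if viable_residue(before) and viable_residue(after):
            yield word

print(list(find_words("wuffapple")))  # ['apple'] - residue 'wuff' is viable
print(list(find_words("fapple")))     # [] - residue 'f' cannot be a word
```

The sketch captures why apple is easy to detect in wuffapple but hard in fapple: only in the former does extracting the word leave a possible word behind.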

KEY TERM McGurk effect A mismatch between spoken and visual (lip-based) information leads listeners to perceive a sound or word involving a blending of the auditory and visual information.

Speaker variability

Earlier we discussed problems listeners have when dealing with variations across speakers in accent, speaking rate, and so on. Cai et al. (2017) proposed a model to explain how listeners cope with variability (see Figure 9.4). They assumed listeners use information provided by the speech signal to infer characteristics of the speaker (i.e., to construct a speaker model) and this influences how speech is perceived. Cai et al. (2017) tested their model using words typically having somewhat different meanings when heard in an American or English accent. For example, the American meaning of bonnet is usually hat whereas the British meaning is usually part of a car. As predicted, British listeners were more likely to interpret such words as having the American meaning when spoken in an American rather than British accent. The crucial condition involved presenting such words in a neutral accent. These words were presented in the context of other words spoken in an American or British accent. As predicted, the neutral words were more likely to be interpreted in their American meaning when the context consisted of words spoken in an American accent. Thus, the listeners' speaker model biased their interpretations.

McGurk effect

Listeners (even with intact hearing) often make extensive use of lip-reading when listening to speech. McGurk and MacDonald (1976) provided a striking demonstration of the McGurk effect (reviewed by Marques et al., 2016). They prepared a videotape of someone saying "ba" repeatedly. Then the sound channel was changed so there was a voice saying "ga" repeatedly in synchronisation with lip movements still indicating "ba". Listeners reported hearing "da", a blending of the visual and auditory information (see this on YouTube: "McGurk effect (with explanation)"). On average, the McGurk effect is strongest when the auditory input lags 100 ms behind the visual input (Ipser et al., 2017). This probably happens because lip movements can be used predictively to anticipate the next sound to be produced. Soto-Faraco and Alsius (2009) found the McGurk effect is unexpectedly robust: listeners showed the effect even when they were aware of a temporal mismatch between the visual and auditory input (one started before the other).


Figure 9.4 A model of spoken word comprehension. Its key assumption is that the speech signal contains information about who is speaking (the indexical pathway, in which vocal features such as dialect, age and gender feed a speaker model) and what they are saying (the lexical-semantic pathway, in which wordform representations lead to meaning access, e.g., BonnetUK vs BonnetUS). These two kinds of information interact during speech perception. From Cai et al. (2017). Reprinted with permission of Elsevier.

Top-down processes are important. The McGurk effect was stronger when the crucial word formed by blending auditory and visual input was presented in a semantically congruent (rather than incongruent) sentence (Windmann, 2004).

CONTEXT EFFECTS

Context consists of relevant information not contained directly in the auditory signal currently available to listeners. There are several types of contextual information including that provided by previous input (e.g., earlier parts of a sentence) and that provided by our knowledge of language and words.


IN REAL LIFE: MISHEARD LYRICS AND MISCARRIAGES OF JUSTICE

We easily mishear the lyrics of songs if provided with the wrong words (or context) beforehand. The comedian Peter Kay provides hilarious examples (YouTube: "Peter Kay Misheard Lyrics"). For example, he suggested the song "My heart will go on" from the Titanic movie contains the line "I believe the hot dogs go on" instead of the actual line "I believe the heart does go on". The effect is strong – the first author cannot stop misperceiving that line! As Liden et al. (2016, p. 12) pointed out, song lyrics are susceptible to misperception because of "atypical pronunciation resulting in ambivalent speech signals in combination with . . . the presence of other acoustic signals (i.e., the instrumental music)".

Misleading context leading to misperceptions can have serious consequences when we consider the use of covert recordings of suspects in criminal trials. These recordings are often indistinct, and so detectives provide a transcript of their interpretation of what was said. If this transcript is incorrect (perhaps because detectives assume the suspect is guilty), it can strongly bias what jurors believe they hear (Fraser, 2018a). Consider the real-life case of a man sentenced to a 30-year prison sentence for murder largely because of an inaccurate police transcript. Hear the crucial recording at forensictranscription.com.au/audio (the one-minute recording is under the heading "'Assisting' listeners to hear words that aren't there"). What do you think the man is saying? Jurors at the trial were told what a detective claimed to hear (given at the bottom of this Box). Fraser (2018b) carried out an experiment in which listeners initially heard the recording without any context. No listeners reported hearing anything like the incriminating sentence. When primed with the detective's transcript, however, 15% said they definitely heard that sentence and 16% said they thought they heard it.

In sum, top-down effects of context can be so strong that they lead listeners to misperceive speech. Such effects can be durable. Many listeners told explicitly they were being given incorrect words for a song (visual context) subsequently misperceived the song lyrics in the absence of the misleading visual context (Beck et al., 2014a).

The detective's transcript: "At the start we made a pact."

It is indisputable that context typically influences spoken word recognition. However, it is hard to clarify when and how context exerts its influence. Harley (2013) identified two extreme positions. According to the interactionist account, contextual information influences processing at an early stage and may influence word perception. In contrast, the autonomous account claims context has its effects late in processing. According to this account: “context cannot have an effect prior to word recognition. It can only contribute to the evaluation and integration of the output of lexical processing, not its generation” (Harley, 2013). These theoretical approaches are discussed in the next section. Evidence that context can rapidly influence spoken word recognition was reported by Brock and Nation (2014). Participants viewed a display containing four objects and then heard a sentence. Their task was to click on any object mentioned in the sentence. In the first three conditions (see below), the sentence object was not in the display, but a critical distractor object was present:


(1) Competitor constraining (e.g., "Alex fastened the button"; butter in display)
(2) Competitor neutral (e.g., "Alex chose the button"; butter in display)
(3) Unrelated neutral (e.g., "Alex chose the button"; lettuce in display)
(4) Target neutral (e.g., "Joe chose the button"; button in display)

Figure 9.5 Gaze probability for critical objects over the first 1,000 ms since target word onset for target neutral, competitor neutral, competitor constraining and unrelated neutral conditions (described in text). From Brock and Nation (2014).

Brock and Nation (2014) recorded eye movements (see Figure 9.5). When the sentence context made the critical object improbable (condition 1), participants were far less likely to fixate it than when the sentence context was less constraining (condition 2). This difference was apparent early on and indicates sentence context has almost immediate effects on word processing. This is consistent with the interactionist account. In what follows, we discuss various context effects. These effects will be related to the interactionist and autonomous accounts.

Phonemic restoration effect


Warren and Warren (1970) obtained strong evidence that sentence context can influence phoneme perception in the phonemic restoration effect. Listeners heard a sentence with a missing phoneme that had been replaced with a meaningless sound (a cough). The sentences used were as follows (* = missing phoneme):

●● It was found that the *eel was on the axle.
●● It was found that the *eel was on the shoe.
●● It was found that the *eel was on the table.
●● It was found that the *eel was on the orange.

KEY TERM Phonemic restoration effect The finding that listeners are unaware that a phoneme has been deleted and replaced by a non-speech sound (e.g., a cough) within a sentence.

The perception of the crucial element in the sentence (i.e., *eel) was influenced by the sentence in which it appeared. Participants listening to the first sentence heard wheel, those listening to the second sentence heard heel, and those exposed to the third and fourth sentences heard meal and peel, respectively. The crucial auditory stimulus (i.e., *eel) was always the same so all that differed was the contextual information. What causes the phonemic restoration effect? There may be a fairly direct effect on speech processing, with the missing phoneme being processed almost as if it were present (Samuel, 2011). Alternatively, there may be an indirect effect with listeners guessing the identity of the missing phoneme after basic speech processing has occurred. Leonard et al. (2016) obtained findings strongly supporting the notion of a direct effect.


The noise input was followed almost immediately by activation in language areas within the left frontal cortex associated with predicting which word had been presented. This was followed very rapidly by appropriate phonemic processing in the auditory cortex. This latter finding strongly suggests a direct effect consistent with the interactionist perspective rather than an indirect effect consistent with the autonomous approach.

Ganong effect

Earlier we saw listeners often show categorical perception (see p. 405), with speech signals intermediate between two phonemes being categorised as one phoneme or the other. Ganong (1980) wondered whether categorical perception of phonemes would be influenced by the immediate context. Accordingly, he presented listeners with various sounds ranging between a word (e.g., dash) and a non-word (e.g., tash). There was a context effect – an ambiguous initial phoneme was more likely to be assigned to a given phoneme category when it produced a word. This is the Ganong effect. In order to understand the processes underlying the Ganong effect, it is important to ascertain when lexical (word-based) processing influences phonemic processing. Kingston et al. (2016) obtained clear evidence on this issue. Listeners categorised phonemes by choosing between two visually presented options (one completing a word and the other not). Listeners directed their eye movements to the word-completing option almost immediately. This finding strongly suggests there is a remarkably rapid merging of phonemic and lexical processing. This seems inconsistent with the notion that phonemic processing is completed prior to the use of word-based processing.

KEY TERM Ganong effect The finding that perception of an ambiguous phoneme is biased towards a sound that produces a word rather than a non-word.
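The logic of the Ganong effect can be captured in a toy sketch. The logistic categorisation function and the lexical_bias parameter below are illustrative assumptions for exposition, not a fitted model from Ganong (1980).

```python
import math

# Toy sketch of the Ganong effect: categorisation of a /d/-/t/ continuum
# shifts toward whichever phoneme completes a word. The bias term is a
# hypothetical lexical influence added to a logistic category boundary.

def p_d(continuum_step: float, lexical_bias: float) -> float:
    """Probability of hearing /d/ (which would make the word 'dash')."""
    boundary = 0.5  # midpoint of the acoustic continuum
    return 1 / (1 + math.exp(10 * (continuum_step - boundary) - lexical_bias))

ambiguous = 0.5  # a sound midway between /d/ and /t/
print(p_d(ambiguous, lexical_bias=0.0))  # 0.5: no lexical context
print(p_d(ambiguous, lexical_bias=1.5))  # ~0.82: 'dash' is a word, bias to /d/
```

The point of the sketch is simply that the same ambiguous acoustic input is categorised differently once word-level knowledge shifts the decision boundary.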

Interactionist vs autonomous accounts

So far we have considered how contextual information is used with respect to specific phenomena (phonemic restoration effect; Ganong effect). More broadly, we can consider the role of prediction in spoken word recognition. Predictive influences should generally occur more rapidly on interactionist than autonomous accounts (see Kuperberg and Jaeger, 2016, for a review). Van Berkum et al. (2005) presented sentences in Dutch such as the following:

The burglar had no trouble locating the secret family safe. Of course, it was situated behind a . . .

It is reasonable to predict the following noun will be painting, which has the neuter gender in Dutch. The word a was followed by the Dutch adjective big in the neuter gender or common gender. Event-related potentials (ERPs; see Glossary) to the adjective differed depending on whether its gender was consistent with the predicted noun (i.e., painting). Thus, story context can influence speech processing before the predicted word is presented. In similar fashion, Grisoni et al. (2017) presented spoken sentences in which the final word was relatively easy to predict (e.g., "I take the pen and I [write]"; "I take some grapes and I [eat]"). Patterns of brain activity reflected the meaning of the predicted final word before it was presented.


More specifically, brain areas associated with hand-related actions were activated prior to a semantically relevant final word (e.g., write) and those associated with face-related actions were activated prior to a semantically relevant word (e.g., eat). Further evidence supporting the interactionist position was reported by Wild et al. (2012; see also Chapter 1). Listeners heard sentences presented in clear speech or degraded (but potentially intelligible) speech. Each sentence was accompanied by context: a text matching the spoken words or a random consonant string. The rated perceptual clarity of the sentences was greater when they were accompanied by matching text. How can we explain the above context effect? According to the interactionist position, matching context might influence the early stages of spoken word processing within primary auditory cortex. In contrast, it follows from the autonomous position that context should influence only later processing stages and so should not influence processing in the primary auditory cortex. The findings were entirely consistent with the interactionist position. In similar fashion, Sohoglu et al. (2014) found the perceived clarity of a degraded spoken word was greater when preceded by written text (context) containing that word. However, when the same contextual information was presented after a spoken word, it had very little effect on the word's perceived clarity. This seems inconsistent with the autonomous position, according to which context has its effects late in processing. Finally, we return to Wild et al.'s (2012) study. There was no effect of matching context on activation within primary auditory cortex when sentences were presented in clear speech. Why do these findings differ from those with degraded speech? Speech perception was so straightforward with clear speech there was no need (and also insufficient time) for context to activate primary auditory cortex.

Overall evaluation

Research on context effects (including the Ganong and phonemic restoration effects) mostly indicates context can influence early stages of speech perception. It is thus more supportive of the interactionist position than the autonomous one. This issue is discussed again later when we consider theories of speech perception (see pp. 417–429). There are two qualifications on the conclusion that top-down effects of context influence the early stages of speech perception. First, such effects are less likely to be found when speech is clear and unambiguous (e.g., Wild et al., 2012). Top-down processes may often be unnecessary when bottom-up processes provide ample information for word recognition. Second, much research on context effects in spoken word recognition (e.g., Grisoni et al., 2017) has used sentence contexts in which the target word is highly predictable. Huettig and Mani (2016) argued that top-down influences may be much weaker when prediction is harder (as is often the case in real life). In addition, listeners may often not engage in top-down predictive processes because they are too demanding of resources.


THEORIES OF SPEECH PERCEPTION

In this section, we consider theories of processes involved in identifying spoken words and sentences. Some of these theories also explain findings on segmentation and context effects discussed earlier. As we have seen, phonological processing plays a major role in speech perception. We start by considering whether orthographic processing (processing related to word spellings) is also involved. Then we discuss major theories of spoken word recognition. First, we consider the motor theory of speech perception (e.g., Liberman et al., 1967), followed by a discussion of the TRACE and cohort models. The TRACE model (McClelland & Elman, 1986) claims word recognition involves interactions between top-down and bottom-up processes. The original cohort model (Marslen-Wilson & Tyler, 1980) also emphasised such interactions. Subsequently, Marslen-Wilson (e.g., 1990) revised his cohort model to increase the emphasis on bottom-up processes driven by the speech signal.

KEY TERMS Lexical access Accessing detailed information about a given word by entering the lexicon.

Orthographic influences

Suppose you listen to spoken words. Would this activate their spellings? Chiarello et al. (2018) studied spoken word identification under difficult conditions (multi-speaker babble). The researchers computed the proportion of similar sounding words (phonological neighbours) also spelled similarly (orthographic neighbours) for each spoken word. Word identification rates were lower for words having many orthographic neighbours as well as phonological neighbours. Thus, word identification was influenced by orthography. How does orthography influence speech perception? Perhaps hearing a word leads fairly "automatically" to activation of its orthographic codes and so influences lexical access. Alternatively, a spoken word's orthography may influence its processing only after lexical access. This issue has been addressed using event-related potentials (ERPs; see Glossary). Pattamadilok et al. (2011) asked listeners to decide whether spoken words had a given final syllable. Orthographic information influenced ERPs at 175–250 ms, suggesting orthography affects early processing prior to lexical access (lexical access is often reflected in the N400 component of the ERP occurring 400 ms after word onset). In similar fashion, Kwon et al. (2016) found, with the Korean language, that orthographic information influenced the P200 component of the ERP (occurring between 150 and 300 ms) on a spoken word recognition task. Finally, Pattamadilok et al. (2011) reviewed research indicating that orthographic information often influences word processing 300–350 ms after word onset. Such findings suggest orthographic information can influence various stages of word processing.

Motor theory

Liberman et al. (1967) argued that a key issue is explaining how listeners perceive spoken words accurately even though the speech signal is variable. In their motor theory, they proposed listeners mimic the speaker's articulatory movements.


It was claimed this motor signal provides much less variable and inconsistent information about the speaker's words than does the speech signal and so facilitates speech perception. Halle and Stevens (1962) also emphasised the role of speech-production processes in speech perception. They proposed an analysis-by-synthesis approach where "Cues from the input signal triggered guesses about the identity of phonemes [using speech-production processes], and subsequently, the internal synthesis of potential phonemes is compared to the input sequence" (Poeppel & Monahan, 2011, p. 2). Thus, speech-production processes predict the speech input and enhance speech perception when the speech signal is ambiguous. Much research has assumed there is a single motor speech system. This is a drastic oversimplification. There are actually several motor systems or networks, but their precise number and nature remain unclear (Skipper et al., 2017).
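To make the analysis-by-synthesis idea concrete, here is a schematic Python sketch. The synthesise and similarity helpers are hypothetical placeholders standing in for motor-based synthesis and acoustic matching; nothing here reflects an actual articulatory model.

```python
# Schematic sketch of analysis-by-synthesis (Halle & Stevens, 1962):
# candidate phonemes are internally "synthesised" and the best match to the
# input is selected. The synthesiser and similarity measure are toy stand-ins.

def analysis_by_synthesis(signal, candidates, synthesise, similarity):
    """Return the candidate whose synthesised form best matches the signal."""
    return max(candidates, key=lambda c: similarity(synthesise(c), signal))

# Toy demo: "signals" are strings and similarity counts matching characters.
synthesise = lambda phoneme: f"acoustic-{phoneme}"
similarity = lambda predicted, signal: sum(a == b for a, b in zip(predicted, signal))

print(analysis_by_synthesis("acoustic-b", ["b", "g", "d"], synthesise, similarity))  # 'b'
```

The key design idea is the direction of the loop: perception is guided by internally generated predictions from the production system rather than by purely bottom-up decoding.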

Findings

According to motor theories, the brain areas activated during speech perception and speech production should overlap substantially, whereas such overlap should be limited if speech-production processes are not involved in speech perception. Skipper et al. (2017) reported a meta-analysis comparing activation during speech perception and production. Several areas were common to speech perception and speech production, including the pars opercularis, ventral central sulcus, ventral precentral sulcus and gyrus, supplementary motor area, and anterior insula. The above meta-analysis was concerned only with perception and production of words and non-words. Silbert et al. (2014) reported a more naturalistic study assessing brain areas activated during production and perception of a 15-minute spoken narrative. They divided brain areas activated during both perception and production into those where activity was correlated or coupled over time and those where it was not coupled. An absence of coupling may mean a given area is used for different functions during speech perception and speech production. What did Silbert et al. (2014) find? Several brain areas exhibited perception–production coupling (see Figure 11.1 in Chapter 11). These areas included the superior temporal gyrus, the medial temporal gyrus, the temporal pole, the angular gyrus, the inferior temporal gyrus, the insula and the premotor cortex. The above neuroimaging research is inconclusive because listeners may use speech-production processes only following speech perception. More direct evidence can be obtained by applying transcranial magnetic stimulation (TMS; see Glossary) to part of the speech-production system during a speech-perception task to influence its functioning. Liebenthal and Möttönen (2018) concluded their review of TMS research as follows: "Disruptions in the articulatory motor areas impair speech perception and modulate early . . . processing of speech sounds in the auditory areas" (p. 38). Thus, processes in speech-production areas can causally influence speech perception. Additional support for motor theories comes from studies using event-related potentials (ERPs; see Glossary).


The key finding is that articulatory motor areas are activated early in speech processing (under 100 ms) (Liebenthal & Möttönen, 2018). Thus, speech-production processes occur early enough to influence speech perception. Listeners sometimes make more use of speech-production processes when the speech input is unclear and provides insufficient auditory information. For example, Nuttall et al. (2016) found listeners had greater activation in the motor cortex when speech perception was made harder (e.g., presented with background noise). However, they used rather artificial speech stimuli (i.e., syllables). In contrast, Panouillères et al. (2018) presented more naturalistic sentences. Speech-production areas were involved in speech processing to the same extent regardless of the extent to which noise reduced the clarity of the speech input. Evidence from brain-damaged patients might clarify the role of motor processes in speech perception. If patients whose motor cortex is destroyed can still perceive speech, we might conclude motor processes are unnecessary for speech perception. This approach is simplistic, because many different brain areas are involved in speech production (see Figure 11.1). However, the overall picture is clear: "Speech perception deteriorates with a wide range of damage to speech-production systems caused by stroke, focal excision for epilepsy, cerebral palsy, and Parkinson's disease" (Skipper et al., 2017, p. 95). Finally, we consider an important study by Uddin et al. (2018). They focused on how long a target needed to be presented for listeners to recognise it when presented in isolation or within a relevant sentence context. The target was a noun that could be represented by a word (e.g., sheep) or by a sound (e.g., a sheep bleating). What would we expect to find according to the motor-theory approach? First, there should be a beneficial effect of sentence context on speed of target identification because context facilitates prediction of the final sound. Second, and more importantly, the context effect should be much greater when the target is a word. Why is that? As Uddin et al. pointed out, "It is not possible to make neural predictions via motor systems [for] environmental sounds [that] do not have clear speech . . . representations" (p. 140). However, the context effect was as great with the environmental sounds as the words (see Figure 9.6). Thus, listeners make predictions at the level of conceptual meaning (i.e., predicting the meaning that will be represented by the target sound rather than the sound itself).

Figure 9.6 Mean target duration (in seconds) required for target recognition for words and sounds presented in isolation or within a general sentence context. From Uddin et al. (2018). Reprinted with permission of Elsevier.

Evaluation

As Skipper et al. (2017, p. 97) concluded in their review, "Brain regions and networks involved in speech production are ubiquitously involved in speech perception." This conclusion is supported by several kinds of evidence:

(1) neuroimaging evidence for overlapping brain areas for speech perception and speech production;
(2) rapid activation of motor areas during speech perception revealed by research using event-related potentials;



(3) impaired speech perception following damage to speech-production systems;
(4) adverse effects on speech perception of transcranial magnetic stimulation applied to speech-production areas.

What are the limitations of motor theories? First, Uddin et al.'s (2018) findings suggest listeners do not simply predict the sounds that will be presented. Instead, most theories of speech perception (including motor theories) "should be modified to include a larger contribution from general cognitive processes that take conceptual meaning into account" (Uddin et al., 2018, p. 141). Second, the available evidence suggests, "Multiple speech production-related networks and sub-networks dynamically self-organise to constrain interpretations of indeterminate acoustic patterns as listening context requires" (Skipper et al., 2017, p. 77). No theory explains these complexities. Third, many brain areas are involved in speech perception but not speech production (see Figure 11.1). Thus, motor theories would need development to provide comprehensive accounts of speech perception. Fourth, when speech input is clear, comprehension can be achieved with minimal involvement of speech-production processes. That may limit the applicability of motor theories to speech perception in typical conditions.

TRACE model

McClelland and Elman (1986) proposed a network model of speech perception based on connectionist principles (see Chapter 1). Their TRACE model assumes bottom-up and top-down processes interact flexibly in spoken word recognition. It makes the following assumptions (see Figure 9.7):

●● There are individual processing units or nodes at three different levels: features (e.g., voicing; manner of production); phonemes; and words.
●● Feature nodes are connected to phoneme nodes, and phoneme nodes are connected to word nodes.
●● Connections between levels operate in both directions and are always facilitatory.
●● There are connections among units or nodes at the same level; these connections are inhibitory.
●● Nodes influence each other in proportion to their activation levels and the strengths of their interconnections.
●● As excitation and inhibition spread among nodes, a pattern of activation or trace develops.
●● All activated words are involved in a competitive process in which these words inhibit each other. The word with the strongest activation wins the competition.
●● "Words are recognised incrementally by slowly ramping up the activation of the correct words at the phoneme and word levels" (Joanisse & McClelland, 2015, p. 237).

The TRACE model assumes bottom-up and top-down processes interact. Bottom-up activation proceeds upwards from the feature level to the phoneme level and on to the word level. In contrast, top-down activation proceeds in the opposite direction from the word level to the phoneme level and on to the feature level.
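These assumptions can be expressed in a minimal interactive-activation sketch. The weights, update rule and parameter values below are toy assumptions, not McClelland and Elman's (1986) actual implementation; the sketch simply shows facilitatory between-level flow and inhibitory within-level competition (the feature level is omitted for brevity).

```python
import numpy as np

# Minimal interactive-activation sketch in the spirit of TRACE.
# Between-level connections are excitatory in both directions; within-level
# connections are inhibitory, so word candidates compete until one dominates.

rng = np.random.default_rng(seed=1)
n_phonemes, n_words = 6, 4

W_up = np.abs(rng.normal(size=(n_words, n_phonemes)))               # phoneme -> word, facilitatory
W_within = -0.5 * (np.ones((n_words, n_words)) - np.eye(n_words))   # word-word inhibition

phoneme_act = rng.random(n_phonemes)  # bottom-up evidence (from the feature level)
word_act = np.zeros(n_words)

for _ in range(30):
    bottom_up = W_up @ phoneme_act        # facilitation from the phoneme level
    competition = W_within @ word_act     # inhibition among activated words
    word_act = np.clip(word_act + 0.1 * (bottom_up + competition), 0.0, 1.0)
    # Top-down feedback: active words support their constituent phonemes
    phoneme_act = np.clip(phoneme_act + 0.05 * (W_up.T @ word_act), 0.0, 1.0)

print("Most activated word node:", int(np.argmax(word_act)))
```

Because the same connections carry activation upwards and downwards, word-level knowledge feeds back to sharpen phoneme activations, which is how the model accounts for effects such as phonemic restoration and the Ganong effect.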

Findings

Suppose participants hear the word beaker. In front of them is a visual display containing drawings of four objects: a beaker, a beetle, a speaker and a carriage. Eye tracking is used to identify which drawing is being fixated.

Figure 9.7 The basic TRACE model, showing how activation between the three levels (word, phoneme and feature) is influenced by bottom-up and top-down processing.


As Joanisse and McClelland (2015) pointed out, we can make several predictions from the model assuming there are close links between eye fixations and the activation levels of the words fixated:

(1) The object corresponding to the spoken word (i.e., beaker) should receive the most fixations.
(2) Phonological competitors (beetle; speaker) should receive more fixations than an unrelated competitor (carriage) because of phonemic processing.
(3) A phonological competitor sharing its first phoneme (beetle) with the spoken word should receive more fixations than a phonological competitor sharing its last phoneme (speaker) with the spoken word.

Allopenna et al. (1998) carried out a study along the lines indicated above. Their findings are shown in Figure 9.8. As you can see, there was a reasonably close fit between the behavioural data and predictions following from the model.

Suppose we asked listeners to detect target phonemes presented in words and non-words. According to the TRACE model, performance should be better in the word condition. Why is that? In that condition, activation from the word level to the phoneme level would facilitate phoneme detection. Mirman et al. (2008) required listeners to detect a target phoneme (/t/ or /k/) in words and non-words. Words were presented on 80% or 20% of trials. It was assumed attention to (and activation at) the word level would be greater when most auditory stimuli were words, which would increase the performance advantage in the word condition. What did Mirman et al. (2008) find? First, there was a consistent advantage for the word conditions over the non-word conditions (see Figure 9.9). Second, the magnitude of this effect was greater when 80% of the auditory stimuli were words. These findings indicate the involvement of top-down processes in speech perception.

The TRACE model explains the basic Ganong effect (discussed earlier, p. 415) where there is a bias towards perceiving an ambiguous phoneme so a word is formed. It is assumed within the TRACE model that top-down activation from the word level is responsible.

Figure 9.8 (a) Actual eye fixations on the object corresponding to a spoken word or related to it (behavioural data: fixation probability over time since target onset for the referent, e.g., "beaker"; cohort, e.g., "beetle"; rhyme, e.g., "speaker"; and unrelated item, e.g., "carriage"); (b) predicted eye fixations (activation in TRACE) for the same item types. From Allopenna et al. (1998).


Figure 9.9 Mean reaction times (in ms) for recognition of /t/ and /k/ phonemes in words and non-words when words were presented on a high (80%) or low (20%) proportion of trials. From Mirman et al. (2008). Reprinted with permission of the Cognitive Science Society Inc.

Norris et al. (2003) reported additional evidence that phoneme identification can be influenced directly by top-down processing. Listeners categorised ambiguous phonemes as /f/ or /s/. Those who had previously heard this phoneme in /f/-ending words favoured the /f/ categorisation, whereas those who had heard it in /s/-ending words favoured the /s/ categorisation.

The TRACE model explains categorical speech perception (discussed earlier) by assuming the boundary between phonemes becomes sharper because of mutual inhibition between phoneme units. These inhibitory processes produce a "winner takes all" situation with one phoneme becoming increasingly more activated than other phonemes, thus producing categorical perception.

High-frequency words (those encountered frequently) are generally recognised faster than low-frequency ones (Harley, 2013). It would be consistent with the TRACE model's approach to assume this finding occurs because high-frequency words have higher resting activation levels. If so, word frequency should influence even early stages of word processing. Dufour et al. (2013) obtained supporting evidence. Word frequency influenced event-related potentials as early as 350 ms after word onset during spoken word recognition.

We turn now to problematical findings for the model. It assumes top-down influences originate at the word level. Thus, top-down effects (e.g., produced by relevant context) should benefit target identification more when the target is a word (e.g., sheep) rather than an environmental sound (e.g., a sheep bleating). However, context effects are as great with environmental sounds as with words (Uddin et al., 2018; see Figure 9.6), suggesting top-down processing activates general conceptual meanings rather than specific words.

Frauenfelder et al. (1990) asked listeners to detect a given phoneme. In the key condition, a non-word closely resembling an actual word was presented (e.g., vocabutaire instead of vocabulaire). The model predicts top-down effects from the word node corresponding to vocabulaire should have impaired the task of identifying the t in vocabutaire. However, they did not.

McQueen (1991) asked listeners to categorise ambiguous phonemes at the end of auditory stimuli. Each ambiguous phoneme could be interpreted as completing a word or non-word. The TRACE model predicts listeners should have shown a preference for perceiving the phonemes as completing words. This prediction was confirmed when the stimulus was degraded but not when it was intact.

The TRACE model ignores the role of context provided by verbs in influencing spoken word recognition. Rohde and Ettlinger (2012) presented


listeners with sentences such as the following ( ___ indicates an ambiguous phoneme interpretable as he or she):

(1) Abigail annoyed Bruce because ___ was in a bad mood.
(2) Luis reproached Heidi because ___ was getting grouchy.

They predicted (and found) that listeners would hear the ambiguous phoneme as she in both sentences. Annoyed is typically followed by a pronoun referring to the subject, whereas reproached is followed by a pronoun referring to the object. Zhang and Samuel (2018) investigated the effects of cognitive load (in the form of a phonological load) on speech perception. The effects were much greater on later processing (maintaining competing word candidates) than earlier processing (lexical access: see Glossary). Thus, early processes are more "automatic" than later ones. These findings are inconsistent with the TRACE model, which "makes no distinctions in terms of automaticity of sub-processes during speech recognition" (p. 43).

Evaluation

The TRACE model has several successes to its credit. First, even though it was proposed in 1986, "The rate of citations of the original work has increased since 2001" (Joanisse & McClelland, 2015, p. 238). Second, it provides plausible accounts of phenomena such as the phonemic restoration effect, categorical perception, the Ganong effect and the word superiority effect in phoneme monitoring. Third, the TRACE model assumes bottom-up and top-down processes both contribute directly to spoken word recognition. As such, it is an excellent example of the interactionist approach (discussed earlier, pp. 415–416). Fourth, the TRACE model "copes extremely well with noisy input – which is a considerable advantage given the noise present in natural language" (Harley, 2013). It does so through its emphasis on top-down processes that become increasingly important when the speech input is degraded and so provides only limited information. What are the model's limitations? First, its focus is rather narrow, being on word recognition, and it has little to say about speech comprehension. Second, the model assumes top-down processes influence the activation of specific words. However, Uddin et al.'s (2018) findings indicate top-down processes can initially activate higher-level conceptual meanings rather than specific words. Thus, the model would be enhanced by adding a conceptual meaning level above the word level (see Figure 9.7). Third, the model sometimes exaggerates the importance of top-down effects on speech perception. More specifically, the model predicts top-down activation from the word level will cause mispronunciations and ambiguous sounds to be identified as words more often than actually happens (Frauenfelder et al., 1990; McQueen, 1991). Fourth, the TRACE model incorporates many different theoretical assumptions. This may make the model "too powerful, in that it can accommodate any result" (Harley, 2013).




Fifth, the model is incomplete in various ways. For example, it ignores the impact of orthographic information on speech perception. It also cannot account for the differential effects of cognitive load on early and late speech-perception processes. Sixth, Gwilliams et  al. (2018) found that the primary auditory cortex was sensitive to ambiguity in a word’s initial phoneme only 50 ms after word onset. None of the assumptions incorporated within the TRACE model provide an explanation of this rapid effect.

KEY TERM Uniqueness point The point in time in spoken word recognition at which the available perceptual information is consistent with only one word.

Cohort model

The cohort model focuses on the processes involved during spoken word recognition. It differs from the TRACE model in focusing more on bottom-up processes and less on top-down ones. Several versions have been proposed, starting with Marslen-Wilson and Tyler (1980). Here are the main assumptions of the original version:

●● Early in the auditory presentation of a word, all words conforming to the sound sequence heard so far become active: this is the word-initial cohort. There is competition among these words to be selected.
●● Words within the cohort are eliminated if they cease to match further information from the presented word or because they are inconsistent with the semantic or other context. For example, crocodile and crockery might both belong to the initial cohort with the latter word being excluded when the sound /d/ is heard.
●● Processing continues until information from the word itself and contextual information permit elimination of all but one of the cohort words. The uniqueness point is the point at which only one word is consistent with the acoustic signal.

How do later versions of the cohort model differ from the original version? In the original model, it was assumed any word was in or out of the cohort at a given moment. This assumption is too extreme. In revised versions of the model (e.g., Marslen-Wilson, 1987, 1990), it is assumed words vary in their level of activation and so membership of the word cohort is a matter of degree. Marslen-Wilson also assumed the word-initial cohort may contain words having similar initial phonemes to the presented word rather than consisting only of words having the same initial phoneme. In the original version of the model, it was assumed words not matching the context (e.g., preceding words) drop out of the word cohort. As Marslen-Wilson (1987) pointed out, this assumption is too extreme. For example, suppose you heard the sentence, "John slept the guitar", in which the word guitar is totally inappropriate in the sentence context. However, it was nearly always accurately perceived reasonably rapidly (320 ms on average). In the revised version, it is assumed context-inappropriate words are eliminated later in processing (see below). Three processing stages are identified within the cohort model (Marslen-Wilson, 1987):


(1) access stage during which a word cohort is activated;
(2) selection stage during which one word is chosen from the cohort;
(3) integration stage during which the word's semantic and syntactic (grammatical) properties are integrated within the sentence.

According to the model's original version, context influences the selection process. In the revised version, in contrast, "Context plays no role in the processes of access and selection" (Marslen-Wilson, 1987, p. 71). The assumptions of the revised model are more flexible than the original ones. As we will see shortly, they predict processes in spoken word recognition more accurately. Finally, Gaskell and Marslen-Wilson (2002) proposed another variant of the cohort model. Its central assumption was that there is "continuous integration" of information from the speech input and context. If the speech input is degraded or the context is strongly predictive, top-down processes relating to prediction of the next word are likely to dominate within this continuous integration. In contrast, bottom-up processes triggered by the speech signal are dominant within continuous integration if the speech signal is unambiguous and there is no constraining context.
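The winnowing of the word-initial cohort and the uniqueness point can be made concrete with a minimal sketch. The toy lexicon is an illustrative assumption, letters stand in for phonemes, and context is ignored.

```python
# Minimal sketch of cohort winnowing and the uniqueness point.
# Toy simplification: letters stand in for phonemes; no contextual input.

LEXICON = ["crocodile", "crockery", "crocus", "speaker"]

def cohort_trace(spoken_word: str):
    """Print the shrinking cohort as each successive segment arrives."""
    for i in range(1, len(spoken_word) + 1):
        prefix = spoken_word[:i]
        cohort = [w for w in LEXICON if w.startswith(prefix)]
        print(f"after '{prefix}': {cohort}")
        if len(cohort) == 1:  # uniqueness point: only one candidate remains
            print(f"uniqueness point reached at segment {i}")
            return

cohort_trace("crocodile")
```

Running the sketch shows the cohort shrinking segment by segment until only one candidate is consistent with the input, which is exactly the property exploited in the uniqueness-point studies discussed below.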

Findings

Evidence that the initial phoneme of a spoken word is often especially important was reported by Allopenna et al. (1998; discussed earlier, p. 422). For example, when listeners heard the word beaker, the competition from a word starting with the same phoneme (e.g., beetle) was greater than from a rhyming competitor having the same last phoneme (e.g., speaker). McQueen and Huettig (2012) replicated this finding. According to the original version of the model (Marslen-Wilson, 1987), spoken words would not be recognised if their initial phoneme was unclear or ambiguous. Contrary evidence was reported by Frauenfelder et al. (2001). French-speaking listeners activated words even when the initial phoneme of spoken words was distorted (e.g., hearing focabulaire activated the word vocabulaire). However, the listeners took some time to overcome the effects of the mismatch in the initial phoneme. We now consider evidence that spoken words are identified when their uniqueness point (see p. 425) is reached. O'Rourke and Holcomb (2012) presented words with an early uniqueness point (mean of 427 ms after onset) and those with a late uniqueness point (mean of 533 ms after onset). They used event-related potentials and focused on the N400 component. The N400 (reflecting access to word meaning) occurred 100 ms earlier for words having an early uniqueness point. Kocagoncu et al. (2017) used magneto-encephalography (MEG; see Glossary) while presenting spoken words with varying uniqueness points. As predicted, each word's uniqueness point was associated with increased semantic processing of that word plus a marked reduction in lexical and semantic processing of competitor words. The latter finding was predicted because all competitor words have been eliminated from the word cohort when the uniqueness point is reached.


Access to word meaning sometimes occurs prior to the uniqueness point if the preceding context is very constraining. Van Petten et al. (1999) presented listeners with sentence frames (e.g., Sir Lancelot spared the man's life when he begged for ____ ) followed by a contextually congruent (e.g., mercy) or incongruent (e.g., mermaid) word. There were differences in the N400 to contextually congruent and incongruent words 200 ms before the uniqueness point. However, as Nieuwland (2019) pointed out, word recognition prior to the uniqueness point probably occurs only in those rare situations where a spoken word is very predictable within its context. How does context influence word-recognition processes? According to the revised version of the model, context influences only the later stages of word recognition. Zwitserlood (1989) supported this assumption. Listeners performed a lexical decision task (deciding whether visually presented letter strings were words) immediately after hearing part of a word. When only cap ___ had been presented, it was consistent with captain and capital. Lexical decisions were faster when the presented word was related in meaning to either word (e.g., ship; money). In another condition, the part word was preceded by a biasing context (e.g., With dampened spirits the men stood around the grave. They mourned the loss of their cap ___ ). As predicted, such context did not prevent activation of the word capital even though it was inconsistent with the context. In a similar study, Friedrich and Kotz (2007) presented sentences ending with incomplete words (e.g., To light up the dark she needed her can ___ ). Immediately afterwards, listeners saw a visual word matched to the incomplete word in form and meaning (e.g., candle), in meaning only (e.g., lantern), in form only (e.g., candy) or in neither (e.g., number). As predicted by the cohort model, the word candy was activated even though it was inconsistent with the context. However, Weber and Crocker (2012) found context can sometimes exert a very early influence on speech processing. Listeners heard German sentences (e.g., The woman irons the ____ ). Bluse (German for blouse) is a likely final word whereas the similar-sounding word Blume (meaning flower) is implausible. Weber and Crocker studied eye fixations to pictures of the target word (e.g., Bluse), a similar-sounding word (e.g., Blume), and an irrelevant distractor (e.g., Wolke, meaning cloud). Context had a powerful effect. More fixations were directed at the target object than the other objects before the final word was presented and this tendency increased during and after its presentation (see Figure 9.10). However, similar-sounding words were fixated more than irrelevant distractors shortly after the final word in the sentence was presented. Thus, as predicted by the cohort model, words phonologically related to a spoken word were activated even when inconsistent with the context. Finally, we consider the notion of "continuous integration" (Gaskell & Marslen-Wilson, 2002). As predicted by this approach, context often influences the early stages of word processing via top-down processes (see above, pp. 412–416). Theoretically, the extent of such contextual effects should also depend in part on bottom-up influences from the to-be-recognised word itself. Supporting evidence for the above predictions was reported by Strand et al. (2018).

9781138482210_COGNITIVE_PSYCHOLOGY_PART_3.indd 427

28/02/20 4:11 PM

428 Language Figure 9.10 Fixation proportions to high-frequency target words, high-frequency competitors that are phonologically similar to target words, and unrelated distractor words during the first 1,000 ms after target onset. From Weber and Crocker (2012). With kind permission from Springer Science+Business Media.

Figure 9.11 A sample display showing two nouns (“bench” and “rug”) and two verbs (“pray” and “run”). From Strand et al. (2018).

(e.g., “They thought about the ___) or an unconstraining context (e.g., “The word is ___) accompanied by a visual display (see Figure 9.11). Suppose the target word is rug. The constraining context implies the target should be a noun. As predicted, top-down processes led to faster target fixation with a

9781138482210_COGNITIVE_PSYCHOLOGY_PART_3.indd 428

28/02/20 4:11 PM



Speech perception and reading

429

constraining context. In addition, the phonologically similar distractor (i.e., run) was not fixated more than the phonologically dissimilar distractors (i.e., bench; pray). However, the word run was fixated more than other distractors when pronounced to sound more similar to the target. What is the take-home message from the above findings? As predicted by Gaskell and Marslen-Wilson’s (2002) approach, “Listeners make use of contextual constraints very early in word processing while remaining sensitive to bottom-up acoustic input as words unfold” (Strand et  al., 2018, p. 969).
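To make the cohort and uniqueness-point notions concrete, here is a minimal sketch tracking how the cohort of candidate words shrinks as a spoken word unfolds. The five-word lexicon is an illustrative assumption (a real computation would use a full phonological lexicon), and letters stand in for phonemes.

```python
# Toy cohort tracker: the cohort contains every lexical entry consistent
# with the input heard so far; the uniqueness point is reached when only
# one candidate remains. Letters stand in for phonemes; lexicon is a toy.

LEXICON = ["captain", "capital", "captive", "cap", "candle"]

def cohort_trajectory(spoken_word):
    """Yield (input heard so far, remaining cohort) segment by segment."""
    for i in range(1, len(spoken_word) + 1):
        prefix = spoken_word[:i]
        yield prefix, [w for w in LEXICON if w.startswith(prefix)]

for prefix, cohort in cohort_trajectory("captain"):
    print(prefix, cohort)
    if len(cohort) == 1:
        print("Uniqueness point reached at:", repr(prefix))
        break
```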

Evaluation

The cohort model has several strengths. First, the assumption that accurate perception of a spoken word is typically accompanied by some processing of several competitor words is generally correct. Second, the processing of spoken words is sequential and changes considerably during the course of their presentation. Third, the uniqueness point is of great importance in spoken word recognition. Fourth, context effects often (but not always) occur during the integration stage following word identification, as predicted by the model. Fifth, the revised versions of the model are superior to the original version. For example, the assumption that membership of the word cohort is a matter of degree rather than all-or-none is more in line with the evidence.

What are the model's limitations? First, context sometimes influences word processing earlier than the integration stage. This is especially the case when the context is strongly predictive (e.g., Grisoni et al., 2017; discussed earlier, p. 415) or the speech input is degraded (e.g., Wild et al., 2012; discussed earlier, p. 416). However, Gaskell and Marslen-Wilson's (2002) more flexible approach based on continuous integration can accommodate these (and many other) findings.

Second, the revised cohort model de-emphasises the role of word meaning in spoken word recognition. One aspect of word meaning is imageability (ease of forming an image of a word's referent). When there are many words in the word cohort, high-imageability words are easier to recognise than low-imageability ones (Tyler et al., 2000) and they are associated with greater activation in brain areas involved in speech perception (Zhuang et al., 2011). Thus, word selection depends on semantic factors as well as phonological ones.

Third, mechanisms involved in spoken word recognition may differ from those emphasised within the model. More specifically, predictive coding and enhanced processing of speech features inconsistent with prediction may be more important than assumed within the cohort model.

COGNITIVE NEUROPSYCHOLOGY

KEY TERM
Pure word deafness
A condition involving severely impaired speech perception but intact speech production, reading, writing, and perception of non-speech sounds.

So far we have focused on the processes used by healthy listeners to recognise spoken words. Here we consider how research on brain-damaged patients has clarified processes involved in speech perception. Our focus will be on repeating spoken words immediately after hearing them. We will use the theoretical framework proposed by Ellis and Young (1988; see Figure 9.12). There are five components:

● The auditory analysis system extracts phonemes or other sounds from the speech wave.
● The auditory input lexicon contains information about spoken words known to the listener but not about their meaning.
● Word meanings are stored in the semantic system.
● The speech output lexicon provides the spoken form of words.
● The phoneme response buffer provides distinctive speech sounds.

The framework’s most striking assumption is that three different routes can be used when saying spoken words. We will discuss these routes after considering the auditory analysis system.

Auditory analysis system

Figure 9.12 Processing and repetition of spoken words according to the three-route framework. Adapted from Ellis and Young (1988).

Consider patients with damage only to the auditory analysis system causing deficient phonemic processing. The expected consequences are found in patients with pure word deafness: "an inability to understand spoken language in the absence of any other linguistic disturbance . . . [they] are perfectly capable of speaking, writing, and reading" (Kasselimis et al., 2017, p. 11). Of importance, such patients should have intact perception of non-speech sounds (e.g., whistles) not containing phonemes.

Maffei et al. (2017) studied a female patient (FO) with pure word deafness. She had a selective impairment in auditory language processing but intact processing of environmental sounds and music (e.g., identifying which musical instrument was being played). She also had intact speech, reading and writing. Unsurprisingly, FO had damage to regions of a brain network dedicated to speech sound processing.

Slevc et al. (2011) argued that speech perception differs from the perception of most non-speech sounds because listeners must cope with rapid stimulus changes. They found NL, a patient with pure word deafness, had great difficulties discriminating sounds (speech or non-speech) differing in rapid temporal changes. Thus, the rapid stimulus changes in spoken words may partially explain why patients with pure word deafness have severe speech-perception problems.

KEY TERM
Word meaning deafness
A condition in which there is selective impairment of the ability to understand spoken (but not written) language.

Three-route framework

Ellis and Young's (1988) framework specifies three routes that can be used when individuals process and repeat words they have just heard (see Figure 9.12). All three routes involve the auditory analysis system and the phonemic response buffer. Route 1 also involves the other three components (auditory input lexicon; semantic system; speech output lexicon). Route 2 involves two additional components (auditory input lexicon; speech output lexicon), and Route 3 involves an additional rule-based system converting acoustic information into words that can be spoken. According to the three-route framework, Routes 1 and 2 are used with familiar words, whereas only Route 3 can be used with unfamiliar words and non-words.
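A minimal sketch of the three-route logic follows, showing how removing different components reproduces the patterns discussed below. The component names, toy lexicon and "damage" sets are illustrative assumptions, not details of Ellis and Young's (1988) framework.

```python
# Toy simulation of a three-route repetition framework. Components,
# lexicon and damage sets are illustrative assumptions only.

INPUT_LEXICON = {"cloud", "sky", "captain"}          # known spoken word forms
SEMANTICS = {"cloud": "visible mass of water vapour",
             "sky": "region above the earth",
             "captain": "person in charge of a ship"}

def repeat_word(word, damaged=frozenset()):
    """Return (spoken output, accessed meaning), trying Route 1 (lexicon +
    semantics), then Route 2 (lexicon only), then Route 3 (rule-based
    acoustic-to-phoneme conversion), skipping damaged components."""
    familiar = "input_lexicon" not in damaged and word in INPUT_LEXICON
    if familiar and "semantics" not in damaged:
        return word, SEMANTICS[word]   # Route 1: repetition with comprehension
    if familiar:
        return word, None              # Route 2: repetition, no comprehension
    if "acoustic_rules" not in damaged:
        return word, None              # Route 3: works even for non-words
    return None, None                  # no intact route available

# Word meaning deafness-like pattern: Route 2 spared, semantics unreachable.
print(repeat_word("cloud", damaged={"semantics"}))
# Route-3-only pattern: repetition without comprehension.
print(repeat_word("cloud", damaged={"input_lexicon", "semantics"}))
print(repeat_word("blick"))            # non-word: only Route 3 can repeat it
```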

Findings

Patients using predominantly Route 2 should recognise familiar words but not understand their meaning. Since they can use the input lexicon, they should distinguish between words and non-words. Finally, they should have problems saying unfamiliar words and non-words.

Patients with word meaning deafness fit the above description. For example, Dr O had reasonable use of the input lexicon as shown by his excellent ability to distinguish between words and non-words (Franklin et al., 1996). He repeated words much more successfully than non-words (80% vs 7%, respectively). He had impaired auditory comprehension. However, he had intact written word comprehension, indicating his semantic system was probably not damaged.

BB, a female patient with word meaning deafness, could distinguish between words and non-words. She was severely impaired in identifying pictures matching spoken words but not when identifying pictures matching written words (Bormann & Weiller, 2012). Thus, BB could not access the meanings of spoken words although her semantic processing ability was intact.

Patients using only Route 3 could repeat spoken words and non-words but would have very little comprehension of the words.


KEY TERMS
Transcortical sensory aphasia
A condition in which spoken words can be repeated but comprehension of spoken and written language is severely impaired.
Deep dysphasia
A condition involving semantic errors when trying to repeat spoken words and a generally poor ability to repeat spoken words and non-words.

Patients with transcortical sensory aphasia exhibit this pattern. For example, Kim et al.

(2009) studied a male patient. He repeated spoken words but had severely impaired auditory and reading comprehension. These findings suggested he had damage within the semantic system. Kwon et al. (2017) studied two patients with transcortical sensory aphasia. Their impaired auditory comprehension appeared to be due to greatly decreased functional connectivity between language centres in the brain.

Patients with deep dysphasia have extensive problems with speech perception and production. They make semantic errors when repeating spoken words by producing words related in meaning to those spoken (e.g., saying sky instead of cloud). They also have very impaired ability to repeat words and non-words. Ablinger et al. (2008) discussed findings from JR, a man with deep dysphasia. In spite of severely impaired speech perception, he was only slightly impaired at reading aloud words and non-words.

We could explain deep dysphasia by arguing all three routes shown in Figure 9.12 are damaged. However, Jefferies et al. (2007) argued plausibly that the central problem in deep dysphasia is a general phonological impairment (i.e., impaired processing of word sounds). This produces semantic errors because it increases patients' reliance on word meanings when repeating spoken words. Jefferies et al. (2007) found deep dysphasics had poor phonological production when repeating words, reading aloud and naming pictures. As predicted, they also performed very poorly on tasks involving manipulating phonology such as the phoneme subtraction task (e.g., remove the initial phoneme from cat). Furthermore, they showed speech perception problems (e.g., impaired performance when deciding whether words rhymed).

Evaluation

The three-route framework is along the right lines. Patients have various problems with speech perception (and speech production) and evidence exists for all three routes. Conditions such as pure word deafness, word meaning deafness and transcortical sensory aphasia can readily be related to the framework.

What are the limitations of this theoretical approach? First, it provides only a sketch map of the underlying mechanisms. For example, what detailed processes occur within the semantic or auditory analysis systems? Second, it is sometimes hard to relate patients' symptoms to the framework. For example, it is debatable whether deep dysphasia involves impairments to all three routes or a general phonological impairment.

READING: INTRODUCTION

Reading is a very important skill – adults lacking effective reading skills are severely disadvantaged. Thus, we need to understand the processes involved in reading. Reading requires several perceptual and other cognitive processes plus a good knowledge of language and grammar.


Figure 9.13 A general framework of the processes and structures involved in reading comprehension. The framework links visual input, orthographic and phonological units, the lexicon and word identification with comprehension processes (the parser, inferences, text representation and situation model), drawing on linguistic and orthographic knowledge and on general knowledge (including text structure). For details, refer to text. From Perfetti and Stafura (2014).

In this chapter, we focus mostly on basic processes used in reading single words. Research and theory relating to reading sentences and complete texts are discussed in Chapter 10. An overview of what is involved in reading across all these levels is shown in Figure 9.13. Here are its key features:

(1) Reading requires various kinds of stored information: word meanings stored in a lexicon (mental dictionary); word spellings (orthographic knowledge); general knowledge about the world; and linguistic knowledge.
(2) Readers use the above knowledge sources to produce word identification followed by text comprehension. Processes required for text comprehension include working out the syntactical or grammatical structure of each sentence (the parser), drawing inferences, and producing a situation model (an integrated mental representation).
(3) The order in which reading processes occur is flexible. This is indicated in Figure 9.13 by bidirectional arrows (e.g., between the lexicon and comprehension processes).

You may well feel (and you would be right!) that Figure 9.13 implies that reading involves many complex processes. However, the good news is that all aspects of the framework shown in that figure are discussed in detail in this chapter and Chapter 10.

Anglocentricities

KEY TERMS
Lexical decision task
Participants presented with a string of letters or auditory stimulus decide rapidly whether it forms a word.
Naming task
A task in which visually presented words are pronounced aloud rapidly.
Orthography
The study of letters and word spellings.
Phonology
The study of the sounds of words and parts of words.
Semantics
The study of the meaning conveyed by words, phrases and sentences.

Most research on reading considers only the English language. Does this matter? Share (2008) argued strongly the "anglocentricities" of reading research are important because the relationship between orthography (spelling) and phonology (sound) is much less consistent in English than in most other languages. Caravolas et al. (2013) found English children learned to read more slowly than children learning more consistent languages (e.g., Spanish or Czech; see Figure 9.14).

Research methods

Numerous methods are available for studying reading. For example, consider ways of assessing the time taken for word identification or recognition (e.g., deciding a word is familiar; accessing its meaning). The lexical decision task involves deciding rapidly whether a letter string forms a word. The naming task involves saying a printed word out loud as rapidly as possible. Both tasks have limitations: normal reading times are disrupted by the requirement to respond to task demands, and it is hard to identify the underlying processes.

Balota et al. (1999) argued reading involves several kinds of processing: orthography (the spelling of words); phonology (the sound of words); semantics (word meaning); syntax or grammar; and higher-level discourse integration. The naming task emphasises links between orthography and phonology, whereas the lexical decision task emphasises links between orthography and semantics. Normal reading also involves processing of syntax and higher-level integration, processes irrelevant to naming or lexical decision.

Recording eye movements provides an unobtrusive and detailed on-line record of attention-related processes. The main problem is deciding what processing occurs during each fixation (time period during which the eye remains still).

Figure 9.14 Estimated reading ability over a 30-month period with initial testing at a mean age of 66 months for English, Spanish and Czech children. From Caravolas et al. (2013). Reprinted by permission of SAGE Publications.

Next there is priming (see Glossary), where a prime word is presented shortly before the target word. This prime word is related to the target word in spelling, meaning or sound. What is of interest is to observe the effects of the prime on processing of (and response to) the target word. For example, when reading clip, do you access information about its pronunciation? The answer is "Yes": a word preceded by a non-word having identical pronunciation (klip) presented below the level of conscious awareness is processed faster (Rastle & Brysbaert, 2006; see below, p. 436).

Finally, there has been a dramatic increase in reading research using event-related potentials. ERPs provide a precise measure of the time taken for certain processes to occur. For example, consider the N400, a negative wave peaking at about 400 ms after word onset. It has been assumed to reflect the time taken to access word meaning. More specifically, a large N400 often indicates a change in the meaning assigned to a word (Rabovsky et al., 2018; see Chapter 10).

KEY TERM
Homophones
Words pronounced in the same way but that differ in their spellings (e.g., pain/pane; sale/sail).

Phonological processes

You are currently reading this sentence. Did you access the relevant sounds when identifying the words in it? More technically, did you engage in phonological processing of the words? We guess your answer is "Yes", given that most readers experience an "inner voice" or "inner speech" during reading. For example, readers reading a text said they had engaged in inner speech immediately before being questioned on 59% of occasions (Moore & Schwitzgebel, 2018). However, subjective reports cannot demonstrate phonological processes play a causal role in the reading process.

Various answers to the above questions have been proposed (Leinenger, 2014). Van Orden (1987) argued phonological processing is necessary very early in word reading because it plays a role in activating lexical entries (stored words). In contrast, Coltheart et al. (2001) argued phonological processing is relatively slow and mostly inessential for word identification.

Why might we expect phonological processing to be important? Children often learn to read using the phonics approach, which involves forming connections between letters or groups of letters and the sounds of spoken English (Share, 2008). Children's early phonemic skills predict (and are probably causally related to) their future word-reading skills (Melby-Lervåg et al., 2012).

Case study: Phonological processes

Findings

Much evidence supports the hypothesis that phonological processing is important in word reading. One approach involves the use of homophones (words with one pronunciation but two spellings). Van Orden (1987) found readers made many more errors when asked, "Is it a flower? ROWS", than when asked, "Is it a flower? ROBS". The errors occurred because readers engaged in phonological processing of the word ROWS, which is homophonic with the flower name ROSE.

Jared and O'Donnell (2017) also used homophones. Eye movements were recorded while skilled adult readers read sentences such as: (1) Last night I made pasta for dinner; (2) Last night I maid pasta for dinner; and (3) Last night I mate pasta for dinner. Eye movements on incorrect sentences differed depending on whether the incorrect word was phonologically identical to the correct one (e.g., sentence 2) or not (e.g., sentence 3). Thus, the readers used phonological processing.

We can use phonological priming (mentioned earlier, p. 434) to assess the role of phonology in word processing. A word (e.g., clip) is immediately preceded by a phonologically identical non-word prime (e.g., klip) presented below the level of conscious awareness. Rastle and Brysbaert (2006) found in a meta-analytic review that words were processed faster when preceded by phonologically identical non-word primes than by unrelated primes. This suggests phonological processing of visually presented words occurs rapidly and automatically.

KEY TERM
Phonological neighbourhood
Words are phonological neighbours if they differ in only one phoneme (e.g., wipe, pipe and tap are phonological neighbours of type).

Another approach involves the notion of phonological neighbourhood. Two words are phonological neighbours if they differ in only one phoneme (e.g., gate has bait and get as neighbours). If reading involves phonological processing, word recognition should be influenced by the number of its phonological neighbours. This is the case (Carrasco-Ortiz et al., 2017).

Phonological processing typically occurs during reading. However, such processing is not necessarily essential (e.g., it may occur after word recognition has occurred) and may simply be a byproduct of reading. However, various types of research support the hypothesis that phonological processing causally facilitates reading.

First, phonological processing often starts within 80–100 ms of the first fixation on a word (Leinenger, 2014). That would be fast enough to influence word recognition. For example, Sliwinska et al. (2012) found transcranial magnetic stimulation (TMS; see Glossary) to the supramarginal gyrus (an area associated with phonological processing) 80 ms after word onset impaired performance on a phonological task.

Second, we can study profoundly deaf adult readers who initially learned a sign language (e.g., American Sign Language) and so did not learn to read by reading aloud and sounding out the letters of words. They often make extensive use of phonological processing during the early stages of visual word recognition (Gutierrez-Sigut et al., 2017).

Suggestive evidence that word meaning can be accessed without access to phonology was reported by Hanley and McDonnell (1997). Their patient, PS, could not gain access to the other meaning of homophones when he saw one of the spellings (e.g., air) and could not pronounce written words accurately. However, PS provided accurate definitions of printed words, suggesting he had full access to the meanings of words for which he lacked the appropriate phonology. Similar findings were obtained from a Chinese male patient, YGA (Han & Bi, 2009).

In sum, phonological processing is typically involved in reading and much evidence suggests it plays a causal role. However, the findings from patients with severely impaired phonological processing suggest some caution in assuming that is invariably the case.
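The phonological neighbourhood notion used above can be made concrete with a minimal sketch; the phoneme codes and mini-lexicon below are illustrative assumptions.

```python
# Toy phonological neighbourhood computation: words are neighbours if their
# phoneme sequences differ by one substitution, insertion, or deletion.
# Phoneme codes and the mini-lexicon are illustrative assumptions.

MINI_LEXICON = {
    "type": ["t", "aI", "p"], "wipe": ["w", "aI", "p"],
    "pipe": ["p", "aI", "p"], "tap":  ["t", "{",  "p"],
    "dog":  ["d", "Q",  "g"],
}

def one_phoneme_apart(a, b):
    """True if phoneme lists a and b differ by exactly one edit."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        short, long_ = sorted((a, b), key=len)
        return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))
    return False

def neighbours(word):
    target = MINI_LEXICON[word]
    return [w for w, phons in MINI_LEXICON.items()
            if w != word and one_phoneme_apart(target, phons)]

print(neighbours("type"))  # ['wipe', 'pipe', 'tap'], as in the key term example
```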

WORD RECOGNITION

College students typically read at about 300 words per minute (200 ms per word). How long does word recognition take? It is hard to say because of imprecision about the meaning of the term "word recognition". The term can refer to deciding a word is familiar, accessing a word's name or accessing its meaning. As a result, estimates of the time taken for word recognition vary.

Interactive activation model

McClelland and Rumelhart (1981) proposed an influential interactive activation model of visual word processing. It is a computational model involving considerable parallel processing and based on the assumption that bottom-up and top-down processes interact (see Figure 9.15):

● There are recognition units at three levels: the feature level at the bottom; the letter level in the middle; and the word level at the top.
● When a feature in a letter is detected (e.g., vertical line at the right-hand side of a letter), activation goes to all letter units containing that feature (e.g., H, M, N), and inhibition goes to all other letter units. Letters are identified at the letter level.
● When a letter within a word is identified, activation is sent to the word level for all four-letter word units containing that letter in that position within the word, and inhibition is sent to all other word units. Words are recognised at the word level.
● Activated word units increase the level of activation in the letter-level units for that word's letters.

Figure 9.15 McClelland and Rumelhart's (1981) interactive activation model of visual word recognition. Adapted from Ellis (1984).
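A minimal computational sketch of these assumptions follows. The four-word lexicon, weight values and clipping rule are illustrative assumptions, not McClelland and Rumelhart's (1981) actual parameters; note how top-down feedback from the word level restores a degraded letter, anticipating the word superiority effect discussed below.

```python
# Minimal sketch of interactive activation between letter and word levels.
# Toy lexicon and weights are illustrative only.

LEXICON = ["SEAT", "SEAL", "BEAT", "WORD"]

def cycle(letter_act, word_act, excite=0.1, inhibit=0.05):
    """One update cycle: letters excite position-consistent words and
    inhibit inconsistent ones; active words feed activation back down."""
    new_word_act = {}
    for word in LEXICON:
        net = 0.0
        for pos, letter in enumerate(word):
            # Letter units excite words containing that letter in that
            # position and inhibit all other words.
            net += excite * letter_act.get((pos, letter), 0.0)
            net -= inhibit * sum(a for (p, l), a in letter_act.items()
                                 if p == pos and l != letter)
        new_word_act[word] = min(1.0, max(0.0, word_act.get(word, 0.0) + net))

    # Top-down feedback: active word units boost their own letters.
    new_letter_act = dict(letter_act)
    for word, act in new_word_act.items():
        for pos, letter in enumerate(word):
            key = (pos, letter)
            new_letter_act[key] = min(1.0, new_letter_act.get(key, 0.0)
                                      + excite * act)
    return new_letter_act, new_word_act

# Present SEAT with the third letter degraded (low activation for 'A').
letters = {(0, "S"): 1.0, (1, "E"): 1.0, (2, "A"): 0.2, (3, "T"): 1.0}
words = {}
for _ in range(5):
    letters, words = cycle(letters, words)
print(words)              # SEAT ends up the most active word unit
print(letters[(2, "A")])  # 'A' has been boosted by feedback from SEAT
```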

Findings

Much research has used the following task. A letter string is presented very briefly followed by a pattern mask and participants decide which of two letters was presented in a given position (e.g., the third letter). Task performance is better when the letter string forms a word – the word superiority effect. This effect is explained by assuming there are top-down processes from the word to the letter level. Suppose the word SEAT is presented and participants decide whether the third letter is A or N. If the word unit for SEAT is activated, this increases activation of A and inhibits activation of N.

KEY TERMS
Word superiority effect
A target letter is more readily detected in a letter string when the string forms a word than when it does not.
Orthographic neighbours
With reference to a target word, the number of words that can be formed by changing one of its letters.

Sand et al. (2016) obtained the word superiority effect when stimuli were presented in central vision. However, the effect disappeared when stimuli were presented in peripheral vision. These findings suggest top-down processes from the word level do not apply in peripheral vision.

Much research has considered orthographic neighbours (the words formed by changing one of a target word's letters). For example, stem has several orthographic neighbours (e.g., seem, step, stew). When a word is presented, its orthographic neighbours are activated and influence its recognition time. Orthographic neighbours facilitate word recognition if they are less common than the word itself but have an inhibitory effect if they are more common. Chen and Mirman (2012) developed a computational model based on the interactive activation model's assumptions (especially that common words are activated more than uncommon ones) to predict these findings.

The model assumes each letter within a word is rigidly assigned to a specific position. As a consequence, "WROD is no more like WORD than is WXYD" (Norris & Kinoshita, 2012, p. 517). It follows that readers should have great problems reading the "Cambridge email": Aoccrdnig to a rscheearch at Cmabrigde Uinervtisy it deosn't mttaer in waht oredr the ltteers in a wrod are. The olny iprmoatnt tihng is that the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can still raed it wouthit porbelm. Tihs is bcusease the huamn mnid deos not raed ervey lteter by istlef but the wrod as a wlohe.

In fact, most readers find it easy to read the Cambridge email even though numerous letters are transposed (Norris & Kinoshita, 2012). In the original research on this topic, however, transpositions involving the ending letters of words slowed reading rate by 26% (Rayner et al., 2006).

Evaluation

The interactive activation model was an early (and extremely influential) example of how a connectionist model (see Chapter 1) could explain visual word processing. There is considerable support for its central assumption that "Participants in language-processing tasks use all the available information and start to show sensitivity to it within a third of a second" (McClelland et al., 2014, p. 1179). Within the model, this involves readers simultaneously using top-down and bottom-up processes. The model accounts for the word superiority effect and a revised version accounts for the effects of orthographic neighbours on word recognition.

What are the model's limitations?

(1) The model is narrow in that it ignores the role of meaning in visual word recognition. It also ignores "how recognition processes may be influenced by surrounding words and contexts" (Snell et al., 2018, p. 969).
(2) Phonological processing is often involved in word recognition (e.g., Jared & O'Donnell, 2017; discussed earlier, pp. 435–436), but that is not considered within the model.
(3) The model exaggerates readers' focus on the precise positions of letters within words. As Grainger (2018, p. 341) pointed out, "[Many] empirical findings . . . point to the need for a more flexible letter-position coding scheme." For example, readers should struggle to read the sentence howcanwereadwithoutspaces? because it lacks precise information about where words start and end. In similar fashion, the model cannot explain why the Cambridge email is easy to read.
(4) The model only accounts for the processing of four-letter words, and its applicability to word recognition for longer words is unclear.

Semantic priming

Many words within most sentences are related in meaning and this facilitates word recognition. This often involves semantic priming – a word is recognised or identified more rapidly if immediately preceded by a semantically related word. For example, we decide faster that "doctor" is a word when preceded by a semantically related priming word (e.g., "nurse") than by a semantically unrelated word (e.g., "library") (Meyer & Schvaneveldt, 1971).

Why does semantic priming occur? Perhaps the priming word "automatically" activates the stored representations of all words related to it due to massive previous learning. Alternatively, controlled processes may be involved, with a prime such as "nurse" leading readers to expect a semantically related word to follow.

Neely (1977) showed both the above explanations are valid. The priming word was a category name (e.g., BIRD) followed by a letter string at 250, 400, or 700 ms. Participants decided whether the letter string (target) formed a word (lexical decision task). Participants were instructed the prime BIRD would mostly be followed by a type of bird, whereas the prime BODY would mostly be followed by part of a building. This gives us four conditions:

(1) Expected, semantically related (e.g., BIRD–robin)
(2) Expected, semantically unrelated (e.g., BODY–door)
(3) Unexpected, semantically related (e.g., BODY–heart)
(4) Unexpected, semantically unrelated (e.g., BIRD–arm)

It was assumed "automatic" facilitatory processes would be activated if the target were semantically related to the prime but not if it were semantically unrelated. In contrast, controlled processes might be involved if the target were expected but not if it were unexpected.

Neely (1977) obtained two priming or context effects (see Figure 9.16). First, there was a rapid, short-lived facilitatory effect based on semantic relatedness. Second, there was a slower but more long-lasting effect based on expectations, with expected target words showing facilitation and unexpected ones showing an inhibitory effect.

Andrews et al. (2017) reported additional support for "automatic" semantic priming. Skilled readers showed semantic priming even when the prime words were presented very briefly below the level of conscious awareness. However, there are issues concerning the interpretation of such findings. It is hard to assess whether there is any conscious awareness of stimuli (see Chapter 2) and the notion of "automaticity" is imprecise (see Chapter 5).

Figure 9.16 The time course of inhibitory and facilitatory effects of priming as a function of whether or not the target word was related semantically to the prime, and of whether or not the target word belonged to the expected category. Data from Neely (1977). © American Psychological Association.

KEY TERM
Semantic priming
The finding that word recognition is facilitated by the prior presentation of a semantically related word.


Sentential context effects

Sentence context is used extensively during reading. Of particular importance is word predictability. This is typically assessed by a word's Cloze Score (the proportion of participants provided with the first few words of a sentence guessing it would be the next word). Readers consistently fixate for shorter periods of time on predictable words and are more likely to skip them (Staub, 2015). These effects occur in part because there is generally more semantic priming of predictable than unpredictable words.

Word predictability also influences event-related potentials. This is especially true of the N400 component (a negative wave peaking at about 400 ms), which is larger when a word is semantically unexpected. Van Petten and Luka (2012) reviewed the relevant ERP research and found the N400 is smaller when a word's predictability is high within the sentence context. They concluded, "the N400 . . . reliably indexes the benefits of semantic context" (p. 176).
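A Cloze Score is simply a proportion, which a minimal sketch makes concrete; the sentence frame and guess list below are fabricated for illustration.

```python
# Toy Cloze Score for the frame "She unlocked the door with her ___".
# Guesses are fabricated; a real norming study uses many participants.
guesses = ["key", "key", "key", "card", "key", "key", "hand", "key"]
cloze_score = guesses.count("key") / len(guesses)
print(cloze_score)  # 0.75 -> "key" is a highly predictable continuation
```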

Early or late effects?

How can we explain the effects of word predictability? Perhaps anticipatory processing triggered by sentence context allows readers to process predictable words faster than unpredictable ones. Alternatively, it may simply be easier to integrate predictable words with the preceding context. According to this latter explanation, readers would use contextual information only after accessing the word's meaning.

Evidence suggesting readers can use anticipatory processing was reported by DeLong et al. (2005). Here is a sample sentence they used: The day was breezy so the boy went outside to fly [a kite/an airplane] in the park. In this sentence, a kite is highly predictable whereas an airplane is not. There was a smaller N400 to the more predictable noun (e.g., kite) than the less predictable one (e.g., airplane). This finding was replicated in a large-scale study (Nieuwland et al., 2018).

More strikingly, DeLong et al. (2005) reported a larger N400 to the article an (preceding airplane) than to a (preceding kite). These effects on processing prior to the presentation of a predictable or unpredictable noun suggest readers predicted in advance the most likely subsequent noun. However, this finding was not replicated (Nieuwland et al., 2018).

Freunberger and Roehm (2017) measured the N400 to more predictable and less predictable nouns presented in sentences, as well as the N400 to the immediately preceding adverbs. There were two key findings. First, the N400 to the noun was smaller when it was more predictable in the sentence context. More importantly, adverbs strongly predicting the following noun had larger N400s than less predictive ones. This was because strongly predictive adverbs led to increased activation of information relevant to the following noun before it was presented.

Lexical prediction vs graded prediction

Luke and Christianson (2016) distinguished between two types of prediction readers might use:

(1) Lexical prediction: readers activate one specific word prior to its presentation.
(2) Graded prediction: readers generate more partial and general predictions (e.g., the approximate meaning of the next word; whether the next word is a noun, verb, or some other part of speech).

Lexical prediction involves "putting all your eggs in one basket" – if the actual word is not the one predicted, this would probably disrupt reading. Luke and Christianson (2016) analysed 55 text passages and discovered only 21% of content words (words having meaning) and 40% of function words (words clarifying grammatical structure) in these passages were the ones most commonly guessed. Thus, most lexical predictions would be wrong. However, word predictability speeded up reading time across the entire range from very low to very high (consistent with the notion of graded prediction).

Frisson et al. (2017) compared the reading time for an unpredictable (but plausible) word in a sentence when another word was (or was not) highly predictable at that point. Reading times were comparable. Thus, there was no prediction error cost when an incorrect word was far more predictable than the one actually presented. Nieuwland (2019) reviewed relevant neuroimaging research. He concluded that the evidence for lexical prediction in reading is "weak and inconsistent" (p. 367).

Conclusions

In sum, processing can be influenced at an early stage by word predictability (perhaps prior to the presentation of the target word). The evidence strongly favours graded over lexical prediction. However, lexical prediction may sometimes be used when a sentence context very strongly predicts a given word (see DeLong et al., 2005, above). The most convincing evidence for graded prediction is that it has proved hard to identify any processing costs associated with prediction errors.

What is involved in graded prediction? Luke and Christianson (2016) found readers could accurately predict general characteristics of the next


KEY TERM
Pseudowords
Non-words consisting of strings of letters that can be pronounced (e.g., mantiness; fass).

word (e.g., part of speech; whether a noun will be singular or plural). Most beneficial effects of word predictability depend on predicting such general characteristics rather than predicting the word itself.

READING ALOUD

Read aloud the following words and non-words (pronounceable non-words are pseudowords but we will generally use the term non-words):

CAT  FOG  COMB
PINT  MANTINESS  FASS

You probably regarded that as a simple task although it involves hidden complexities. For example, how do you know the b in comb is silent and that pint does not rhyme with hint? Presumably you have specific information stored in long-term memory about how to pronounce these words. However, that does not explain your ability to pronounce non-words such as mantiness and fass. Perhaps non-words are pronounced by analogy with real words (e.g., fass is pronounced to rhyme with mass). Alternatively, we may use rules governing the translation of letter strings into sounds to generate pronunciations for non-words.

The above description is incomplete. There are different reading disorders in brain-damaged patients depending on which parts of the language system are damaged. We turn now to two major theoretical approaches addressing these issues. First, there is the dual-route cascaded model (Coltheart et al., 2001). Second, there is the distributed connectionist approach or triangle model (Harm & Seidenberg, 2004; Plaut et al., 1996) extended to explain reading disorders (Patterson & Lambon Ralph, 1999).

The above theoretical approaches are both connectionist models (see Glossary). Why is this? The processes involved in skilled reading are complex and interactive, and computational models can handle such complexity. Of particular importance, computational models make it easier to predict what follows from various theoretical assumptions (Norris, 2013).

There are key differences between the above approaches. According to the dual-route approach, reading words and non-words involves different processes. These processes are relatively neat and tidy and some are rule-based. Alas, the dual-route approach has become less neat and tidy over time! According to the connectionist triangle approach, reading processes are used more flexibly than within the dual-route model. Reading involves interactive processes – all the relevant knowledge we possess about word sounds, word spellings and word meanings is used in parallel (at the same time) whether reading words or non-words. Of importance, reading aloud involves more involvement of the semantic system within this model.

The most important difference between the two approaches concerns whether the processes involved in reading are specific to reading or whether they are more general. According to the triangle approach, "Reading is, in evolutionary terms, a recently developed skill . . . underpinned by the more mature primary systems of vision, phonology, and semantics" (Hoffman et al., 2015, p. E3719). Thus, reading involves relatively general systems. In contrast, the dual-route model focuses more on reading-specific processes.


We first consider each model’s major assumptions plus relevant supporting evidence. After that, we directly compare the two models on controversial issues.

Dual-route cascaded model

Coltheart et al.'s (2001) dual-route cascaded model of reading (see Figure 9.17) accounts for reading aloud and silent reading. There are two main routes between printed words and speech, both starting with orthographic analysis (used for identifying and grouping letters in words). There is a non-lexical route using grapheme–phoneme rules to convert letters into sounds (see later discussion). The identification of these rules is somewhat arbitrary and open to question (Eysenck & Brysbaert, 2018). There is also a lexical route involving lexical or dictionary look-up. In Figure 9.17, the non-lexical route is Route 1 and the lexical route is divided into two sub-routes (Routes 2 and 3) depending on whether the semantic system (meanings of words) is used.

Interactive exercise: Dual-route cascade model

Figure 9.17 Basic architecture of the dual-route cascaded model. Adapted from Coltheart et al. (2001).


KEY TERMS
Cascade processing
Later processing stages start before earlier processing stages have been completed when performing a task.
Grapheme
A small unit of written language corresponding to a phoneme (e.g., the ph in photo).
Phonemes
The smallest units of sound that distinguish one word from another and contribute to word meaning; the number and nature of phonemes varies across languages.
Surface dyslexia
A condition in which regular words and non-words can be read but there is impaired ability to read irregular or exception words.

Healthy individuals use both routes in parallel when reading aloud. However, naming visually presented words typically depends mostly on the lexical route because it operates faster. It is a cascaded model because it involves cascade processing, with activation at one level being passed on to the next level prior to completion of processing at the first level. Cascaded models differ from threshold models, where activation at one level is only passed on to other levels after a given threshold of activation is reached.

Earlier we discussed the role of phonological processing in visual word identification. Coltheart et al. (2001) argued for a weak phonological model where word identification generally does not depend on phonological processing.

Coltheart et al. (2001) produced a detailed computational model to test their dual-route cascaded model. They started with 7,981 one-syllable words and used McClelland and Rumelhart's (1981) interactive activation model (discussed earlier, pp. 437–439) to provide the orthographic component of their model. They predicted the pronunciation most activated by processing in the lexical and non-lexical routes would determine the naming response. Coltheart et al. (2001) found 99% of these one-syllable words were read accurately.

Route 1 (grapheme–phoneme conversion)

Route 1 differs from the other routes in using grapheme–phoneme conversion, which involves converting spelling (graphemes) into sound (phonemes). A grapheme is a basic unit of written language whereas a phoneme is a basic unit of spoken language. Examples of graphemes are the i in pig and the igh in high.

If a brain-damaged patient used only Route 1, what would we expect? Use of grapheme–phoneme conversion rules (converting each grapheme into the phoneme most closely associated with it) should permit accurate pronunciation of words with regular spelling–sound correspondence. However, these rules would not permit accurate pronunciation of irregular words not conforming to the conversion rules. For example, if the irregular word pint has grapheme–phoneme conversion rules applied to it, it would be pronounced to rhyme with hint. This is regularisation. Finally, grapheme–phoneme conversion rules can provide pronunciations of non-words.

Surface dyslexics are apparently largely reliant on Route 1. Surface dyslexia involves special problems in reading irregular words. For example, KT, a surface dyslexic, read 81% of regular words and 100% of non-words accurately but only 41% of irregular words (McCarthy & Warrington, 1984). Over 70% of KT's errors with irregular words involved regularisation.

We might not expect to find cases of surface dyslexia in languages (e.g., Greek) lacking irregular words (i.e., all words follow grapheme–phoneme rules). However, Sotiropoulos and Hanley (2017) identified Greek individuals whose slow reading of Greek words suggested they might have surface dyslexia. When these individuals read English words and non-words, they showed the classic pattern associated with surface dyslexia: high accuracy with regular words and non-words but severely impaired performance with irregular words.
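To illustrate how a rule-based non-lexical route yields both non-word pronunciations and regularisation errors, here is a minimal sketch; the small rule set is an illustrative assumption, not the model's actual rules.

```python
# Minimal sketch of a non-lexical route: grapheme-phoneme conversion rules.
# The rule set is a toy illustration, not Coltheart et al.'s (2001) rules.

# Rules mapping graphemes to phonemes (rough phoneme codes).
GP_RULES = [("igh", "aI"), ("ch", "tS"), ("sh", "S"), ("ee", "i:"),
            ("a", "{"), ("e", "E"), ("i", "I"), ("o", "Q"), ("u", "V"),
            ("b", "b"), ("c", "k"), ("d", "d"), ("f", "f"), ("g", "g"),
            ("h", "h"), ("l", "l"), ("m", "m"), ("n", "n"), ("p", "p"),
            ("r", "r"), ("s", "s"), ("t", "t"), ("w", "w")]

def non_lexical_route(letter_string):
    """Convert a letter string to phonemes by applying grapheme-phoneme
    rules left to right, always preferring the longest matching grapheme."""
    phonemes, i = [], 0
    while i < len(letter_string):
        for grapheme, phoneme in sorted(GP_RULES, key=lambda r: -len(r[0])):
            if letter_string.startswith(grapheme, i):
                phonemes.append(phoneme)
                i += len(grapheme)
                break
        else:
            i += 1  # skip letters with no rule (toy simplification)
    return phonemes

print(non_lexical_route("mint"))  # regular word: correct pronunciation
print(non_lexical_route("fass"))  # non-word: the rules still yield an output
print(non_lexical_route("pint"))  # irregular word: regularised to rhyme with mint
```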

Route 2 (lexicon + semantic knowledge) and Route 3 (lexicon only)

The basic idea behind Route 2 is that representations of thousands of familiar words are stored in an orthographic input lexicon. Visual presentation of a word produces activation within this lexicon. This is followed by obtaining its meaning from the semantic system, after which its sound pattern is generated by the phonological output lexicon. Route 3 also involves the orthographic input and phonological output lexicons but bypasses the semantic system.

What would we expect to find in patients using Route 2 or 3 but not Route 1? Their intact orthographic input and phonological output lexicons mean they could pronounce familiar words (regular or irregular). However, their inability to use grapheme–phoneme conversion rules means they should find it very hard to pronounce unfamiliar words and non-words.

Phonological dyslexics fit this predicted pattern fairly well. Phonological dyslexia involves special problems with reading unfamiliar words and non-words. Caccappolo-van Vliet et al. (2004) studied two phonological dyslexics – their performance on reading familiar regular and irregular words exceeded 90% compared to under 60% with non-words. In a study discussed above, Sotiropoulos and Hanley (2017) identified two Greek individuals with phonological dyslexia: they had problems with non-words in Greek and English but not words.
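Here is a minimal sketch of how the lexical and non-lexical routes can be combined. The toy lexicon, pronunciation codes and the simple "lexical look-up wins for familiar words" scheme are illustrative assumptions (in the actual model both routes cascade activation into a shared phoneme system).

```python
# Toy dual-route reader: a lexical look-up route plus a rule-based route.
# The lexicon entries and pronunciation codes are illustrative only.

LEXICON = {                      # orthographic form -> stored pronunciation
    "pint": "p-aI-n-t",          # irregular: stored form wins for known words
    "mint": "m-I-n-t",
    "comb": "k-oU-m",            # silent b is handled lexically
}

def rule_route(word):
    """Stand-in for grapheme-phoneme conversion (see the earlier sketch)."""
    vowels = {"a": "{", "e": "E", "i": "I", "o": "Q", "u": "V"}
    return "-".join(vowels.get(ch, ch) for ch in word)

def read_aloud(letter_string):
    # For a familiar word the lexical route dominates; non-words only get
    # a rule-based pronunciation, which is why patients restricted to the
    # lexical routes struggle with non-words.
    if letter_string in LEXICON:
        return LEXICON[letter_string]
    return rule_route(letter_string)

print(read_aloud("pint"))   # lexical route: correct irregular pronunciation
print(read_aloud("fass"))   # non-lexical route: plausible non-word output
```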

KEY TERMS
Phonological dyslexia
A condition in which familiar words can be read but there is impaired ability to read unfamiliar words and non-words.
Deep dyslexia
A condition in which reading unfamiliar words and non-words is impaired and there are semantic errors (e.g., reading missile as rocket).

Deep dyslexia

Deep dyslexia involves problems in reading unfamiliar words and an inability to read non-words. However, its most striking symptom consists of semantic reading errors (e.g., ship read as boat). According to Coltheart et al. (2001), deep dyslexics use a completely different reading system based in the right hemisphere (it is in the left hemisphere for 90% of people). Accordingly, they concluded deep dyslexia cannot be explained by the dual-route cascaded model. Most evidence is inconsistent with this right-hemisphere hypothesis.

Two routes?

Findings from brain-damaged patients support the notion of two different routes (lexical vs non-lexical) in reading words. However, neuroimaging studies reveal individual differences. Jobard et al. (2011) found only individuals with low working memory capacity (see Glossary) had activation in brain areas associated with grapheme–phoneme conversion. Fischer-Baum et al. (2018) found significant individual differences in the use of the non-lexical route by skilled readers.

According to the dual-route model, the non-lexical route to reading involves grapheme–phoneme conversion. This requires serial left-to-right processing and so the time taken to start saying non-words should depend on their length. In contrast, the lexical route involves parallel processing and so there should be minimal effects of length on the time taken to start saying words.

Juphard et al. (2011) found the time to start saying three-syllable non-words was 26% longer than one-syllable ones, but for words this difference was only 11%. Syllabic length of non-words (but not words) influenced the duration of activity in brain areas associated with phonological processing. These findings suggest producing phonological representations of non-words is a slow, serial process whereas it is fast and parallel for words.

Preliminary evaluation

The dual-route cascaded model was the first systematic attempt to account for basic reading processes in brain-damaged and healthy individuals. The notion there are two routes in reading has been very influential and has attracted support from research on patients with various reading disorders (e.g., surface dyslexia; phonological dyslexia). The specific assumption there are separate lexical and non-lexical routes involving parallel and serial processing, respectively, has received behavioural and neuroimaging support from healthy individuals.

What are the model's limitations? First, it de-emphasises semantic processes in reading (discussed later, pp. 447–449). For example, Cattinelli et al. (2013) found in a meta-analytic review that reading was associated with activation in brain areas (e.g., parts of the temporal lobe; the anterior fusiform region) associated with semantic processing.

Second, the model focuses on the reading of individual words. However, word reading in everyday life typically occurs within sentences.

Third, the model does not exhibit learning and so cannot explain how children acquire grapheme–phoneme rules. However, Perry et al. (2007) developed a new connectionist dual-process model (the CDP+ model) based on the dual-route cascaded model. This model learns and also eliminates other problems with the dual-route model.

Fourth, the model assumes phonological processing of words typically occurs relatively slowly and has little effect on word recognition and reading. In fact, however, phonological processing often occurs rapidly and automatically (Rastle & Brysbaert, 2006; discussed earlier, p. 436).

Fifth, Adelman et al. (2014) found the model did not provide an adequate account of individual differences in reading. In addition, its assumption that readers have perfect knowledge of letter positions within words is incorrect.

Sixth, it is desirable for computational models to have relatively few parameters (values free to change). The dual-route model has over 30 parameters, so it is unsurprising it fits the data well.

Seventh, there are issues concerning the model's applicability to non-English languages. For example, French orthography is unusual in that numerous letters are silent and so lack a phonological representation. However, the CDP+ model accounts for reading in French (Perry et al., 2014).

Eighth, the model only accounts for the reading of one-syllable words. However, Mousikou et al. (2017) found stress, pronunciation and naming times for two-syllable non-words were predicted by models incorporating aspects of the dual-route model.

Connectionist triangle model

Within the dual-route model, it is assumed different routes are used to pronounce irregular words and non-words. According to the connectionist triangle approach, in contrast:

All of the system's knowledge of spelling-sound correspondences is brought to bear in pronouncing all types of letter strings [words and non-words]. Conflicts among possible alternative pronunciations of a letter string are resolved . . . by co-operative and competitive interactions based on how the letter string relates to all known words and their pronunciations.
(Plaut et al., 1996, p. 58)

Thus, reading depends on a highly interactive system based on "all hands to the pump". The triangle model (which has been instantiated in distributed connectionist form) is shown in Figure 9.18. The three sides of the triangle are orthography (spelling), phonology (sound) and semantics (meaning). Of importance, the triangle model learns to produce the correct output (i.e., spoken word or non-word) from the input (i.e., written word or non-word) using back-propagation (see Glossary) by comparing actual responses against correct ones.

Figure 9.18 The three components of the triangle model (left) and their associated neural regions (right). O = orthography; P = phonology; S = semantics. Orthographic processing involves the ventral occipito-temporal cortex; phonological processing involves the precentral gyrus; semantic processing involves the anterior temporal lobes. From Hoffman et al. (2015).

If you compare Figure 9.18 with Figure 9.17, you can see semantics is more important in the triangle model. Note that this model (unlike the dual-route model) has no lexicons for orthographic or phonological words and lacks grapheme–phoneme rules. There are two routes from spelling to sound in the triangle model:

(1) a direct pathway from orthography to phonology (O → P pathway);
(2) an indirect pathway from orthography to phonology proceeding via semantics or word meanings (O → S → P pathway).

The direct, non-semantic pathway is typically used when reading high-frequency and regular or consistent words, whereas the indirect, semantic pathway is mostly used when reading low-frequency and irregular or inconsistent words. Hoffman et al. (2015) found brain areas associated with orthographic, phonological and semantic processing were all activated during word reading (see Figure 9.18). These findings are as predicted by the triangle model.

According to the triangle model, words and non-words vary in consistency – the extent to which their pronunciation agrees with those of similarly spelled words (neighbours). Harley (2010) gives the examples of TAZE and TAVE. TAZE has consistent neighbours (gaze; laze; maze), whereas TAVE does not (have as well as gave, rave, and save). The prediction (discussed later, p. 451) is that consistent words and non-words should be said faster than inconsistent ones. In contrast, the dual-route model focuses on dividing words into regular ones (conforming to grapheme–phoneme rules) and irregular ones (not conforming to those rules).

How does the triangle model account for the different dyslexias? It is assumed surface dyslexia mostly involves damage to the semantic system. Plaut et al. (1996) lesioned or damaged their connectionist model to reduce the contribution of semantics. Its performance matched the pattern with surface dyslexics: very good with all consistent words and non-words, worse on inconsistent high-frequency words, and worst on inconsistent low-frequency words.

The model assumes that phonological dyslexia (involving problems in reading unfamiliar words and non-words) involves a general impairment of phonological processing. Relevant evidence is discussed later (see p. 450).

Finally, there is deep dyslexia (involving problems in reading unfamiliar words and non-words plus semantic errors). Within the triangle model, deep dyslexia represents a serious form of phonological dyslexia with severely impaired phonological processing, leading to increased reliance on semantic processing.
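Returning to the notion of consistency introduced above: a word body ("-aze", "-ave") is consistent to the extent that words sharing it share a pronunciation. A minimal sketch follows, in which the tiny pronunciation table is an illustrative assumption.

```python
# Toy measure of spelling-sound consistency for monosyllabic letter strings.
# The mini-database of body pronunciations is illustrative only.

from collections import Counter

BODY_PRONUNCIATIONS = {
    "aze": ["eIz", "eIz", "eIz"],          # gaze, laze, maze: all agree
    "ave": ["eIv", "eIv", "eIv", "{v"],    # gave, rave, save vs have
}

def consistency(body):
    """Proportion of neighbours sharing the most common pronunciation."""
    pronunciations = BODY_PRONUNCIATIONS[body]
    most_common = Counter(pronunciations).most_common(1)[0][1]
    return most_common / len(pronunciations)

print(consistency("aze"))  # 1.0 -> TAZE has fully consistent neighbours
print(consistency("ave"))  # 0.75 -> TAVE is inconsistent ("have" disagrees)
```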

Findings Plaut et al. (1996) gave the model prolonged training with 2,998 words. Its performance resembled that of adult readers in various ways: (1) Inconsistent words took longer to name than consistent ones. (2) Rare words took longer to name than common ones. (3) The effects of consistency were much greater for rare words than common ones. (4) The network pronounced over 90% of non-words “correctly” (com­ parable to adult readers). This is impressive given the network received no direct training on non-words. The triangle model assumes semantic processing (involving the O → S → P pathway) is generally involved when inconsistent/irregular words are read.


As predicted, Hoffman et al. (2015) found greater activation within the anterior temporal lobe (associated with semantic processing) when participants read inconsistent/irregular words than when they read consistent/regular ones.

Hoffman et al.'s (2015) findings do not show that semantic processing within the anterior temporal lobe plays a causal role. Ueno et al. (2018) obtained more direct evidence. They administered transcranial magnetic stimulation (TMS; see Glossary) to the anterior temporal lobe to impair its functioning while participants read words. As predicted, TMS reduced reading accuracy when inconsistent/irregular words were read but not when consistent/regular words were read.

According to the triangle model, the O → P pathway should be used mostly when participants read regular/consistent words. As predicted, Hoffman et al. (2015) found functional connectivity between the brain areas involved in orthographic (ventral occipito-temporal cortex) and phonological (precentral gyrus) processing was much greater with such words than with inconsistent/irregular words.

Preliminary evaluation

The triangle model has several successes to its credit. First, there is much support for the two pathways assumed to be involved in reading aloud. Second, and most important, semantic processing plays a major role in reading, especially with inconsistent or irregular words. Third, the triangle model focuses on how we learn to pronounce words, unlike the original dual-route cascaded model. Fourth, the model provides important insights into the mechanisms underlying the dyslexias (discussed below).

What are the model's limitations? First, the model "focused on the recognition of simple, often monosyllabic words" (Harley, 2013). Second, its emphasis is on explaining the reading of words presented in isolation, whereas words are typically read within a sentential context. Third, in its original version, as Plaut et al. (1996, p. 108) admitted, "The nature of processing within the semantic pathway has been characterised in only the coarsest way". However, Harm and Seidenberg (2004) improved the model by implementing its semantic component to map orthography and phonology onto semantics.

Controversial topics

We turn now to controversial topics where the two models make different predictions. Note, however, that both models have evolved over time, and so some predictions have changed.

1 Surface dyslexia

Surface dyslexics have problems reading irregular or inconsistent words but perform reasonably well with regular or consistent ones and with non-words. According to the dual-route model, surface dyslexics have damage to Routes 2 and 3 and so rely heavily on Route 1 (grapheme–phoneme conversion). In contrast, the triangle model claims the major problem in surface dyslexia is extensive damage to the semantic system.


Woollams et al. (2007) studied 51 patients with semantic dementia (see Glossary), a condition involving severe loss of knowledge about word meanings. Surface dyslexia was present in 48 of the patients, and the remaining 3 patients became surface dyslexics as their semantic knowledge deteriorated. Of crucial importance, there was a strong relationship between the extent of patients' semantic knowledge and their ability to read low-frequency irregular/inconsistent words: the poorer the semantic knowledge, the worse the reading performance.

In sum, both models provide reasonable accounts of surface dyslexia. However, evidence that surface dyslexia is associated with severe problems within the semantic system is easier to account for on the triangle model.

2 Phonological dyslexia

Phonological dyslexia involves severe difficulties in reading unfamiliar words and non-words. According to the dual-route model, phonological dyslexics have an inability to use Route 1 (grapheme–phoneme conversion) – their problems are specific to reading. According to the triangle model, in contrast, phonological dyslexics have a more general phonological deficit.

The evidence is mixed. Support for the dual-route model was obtained by Caccappolo-van Vliet et al. (2004) (discussed earlier, p. 445). Their two phonological dyslexics showed essentially intact performance on various non-reading phonological tasks. However, Woollams and Patterson (2012) studied patients exhibiting symptoms of phonological dyslexia when reading aloud. The number of phonological errors these patients made in picture naming predicted their reading performance, indicating they had a relatively general phonological deficit.

Henry et al. (2012, 2016) found patients with symptoms of phonological dyslexia had brain damage in areas associated with phonological processing. In addition, their performance when reading non-words depended on phonological processes also involved in speech production and speech perception. These findings support the triangle model.

In sum, the available evidence indicates most (but not all) phonological dyslexics have fairly general phonological impairments. Thus, the findings are mostly more supportive of the triangle model.

3 Deep dyslexia

Deep dyslexics make many semantic errors when reading aloud and have problems in reading unfamiliar words and non-words. As discussed earlier (p. 445), Coltheart et al. (2001) argued that an account of deep dyslexia is outside the scope of the dual-route model because deep dyslexics predominantly use the right (rather than left) hemisphere when reading. According to the triangle model, deep dyslexia and phonological dyslexia both involve severe impairments in phonological processing.

The triangle model's assumptions were supported by Jefferies et al. (2007). Deep dyslexics performed poorly on various phonologically based tasks (e.g., phoneme addition; phoneme subtraction). They concluded deep dyslexics have a general phonological impairment, as do phonological dyslexics.


Crisp et al. (2011) found deep dyslexics and phonological dyslexics had substantially impaired ability to translate orthography (spelling) into phonology, as predicted by the triangle model. It is plausible the semantic errors made by deep dyslexics occur because their very severe problems with phonological processing force them to rely heavily on the semantic system. Ablinger and Radach (2016) studied a deep dyslexic, KJ, who relied excessively on semantic processing while reading aloud. Therapy based on increasing the involvement of phonological processing enhanced his ability to read words aloud.

In sum, the triangle model provides a generally persuasive account of deep dyslexia. However, it is probably not applicable to all deep dyslexics (Harley, 2013).

4 Word regularity vs word consistency

According to the dual-route model, regular words (those conforming to the grapheme–phoneme rules in Route 1) can often be named faster than irregular words. According to the triangle model, what matters is consistency. The letter patterns in consistent words are pronounced in the same way in all words in which they appear, and such words are predicted to be read faster than inconsistent words. Since irregular words tend to be inconsistent, we need careful experimentation to decide whether regularity or consistency is more important. Jared (2002) presented words belonging to the four following categories:

(1) Regular consistent (e.g., bean)
(2) Regular inconsistent (e.g., beak)
(3) Irregular consistent (e.g., both)
(4) Irregular inconsistent (e.g., bear)

The findings were reasonably clear-cut: word naming times were affected much more by consistency than regularity (see Figure 9.19).

[Figure 9.19 Mean naming latencies for high-frequency (HF) and low-frequency (LF) words that were irregular (exception words: EXC) or regular and inconsistent (RI). Mean naming latencies of regular consistent words matched with each of these word types are also shown. The differences between consistent and inconsistent words were much greater than those between regular and irregular words (EXC compared to RI). From Jared (2002). Reprinted with permission from Elsevier.]

Lee et al. (2005) studied Chinese speakers naming Chinese characters. Performance was influenced by character consistency and character regularity with low-frequency characters but only by consistency with high-frequency characters. Thus, consistency and regularity both played a role.

In sum, research provides some support for both the dual-route and triangle models. However, the findings provide stronger support for the triangle model.

5 Pronouncing non-words

The dual-route model assumes non-word pronunciations depend on the application of grapheme–phoneme rules and so are inflexible.


In contrast, the triangle model predicts flexibility because non-word pronunciations depend on an individual's previous reading experience. As predicted by the triangle model, Coltheart and Ulicheva (2018) discovered considerable evidence of flexibility in the pronunciations of non-words.

The triangle model makes another prediction. Variability in pronunciation should be greater with inconsistent non-words than consistent ones because orthographically similar words provide more possible pronunciations for the former. Zevin and Seidenberg (2006) studied the pronunciations of consistent and inconsistent non-words. As predicted, the pronunciations of inconsistent non-words were more variable. This finding is not predicted by the dual-route model, according to which grapheme–phoneme rules should typically generate only one pronunciation for any given non-word.

Buetler et al. (2014) studied the influence of language context on the pronunciation of non-words. German/French bilinguals read non-words presented in a context of French or German words. Grapheme–phoneme rules were used more often in the German context. Why was this? The relationship between spelling and sound is much more consistent in German than in French, and so grapheme–phoneme conversion is easier to use in German.

6 General vs language-specific processes?

More research supports the triangle model's assumption that reading involves general processes than the dual-route model's assumption that it involves mostly reading-specific processes. Woollams et al. (2018) studied stroke patients suffering from aphasia (severe language problems). They assessed general phonological and semantic processing abilities in these patients using tasks not involving reading. They then related individual differences in these abilities to reading performance.

What did Woollams et al. (2018) find? First, phonological processing ability strongly predicted reading performance with both words and non-words. Second, semantic processing ability strongly predicted reading performance with words but only weakly predicted reading performance with non-words. These findings are as predicted by the triangle model and strengthen the argument that poor reading performance often reflects impaired general cognitive processes.

Much research on reading (and its disorders) has focused on the role of orthography (spelling; the written form of words). However, poor readers may also have general problems with complex visual processing rather than more specific problems relating to correctly identifying letters and combinations of letters. Evidence that general visual processes may be involved was reported by Sigurdardottir et al. (2018): individuals who were poor at reading also tended to have difficulties with face matching.

In sum, the triangle approach suggests we should stop putting individuals with reading impairments into categories such as "phonological dyslexia", "surface dyslexia" and "deep dyslexia". Instead, we should assess their general semantic, phonological and visual abilities so their underlying cognitive impairments can be interpreted within a three-dimensional space formed by those three abilities.




Conclusions

The dual-route and triangle models share several impressive strengths and have deservedly been highly influential. First, both models assume correctly that reading words and non-words aloud is a complex achievement. Second, both models provide plausible accounts of reading applicable to dyslexics and those with intact reading skills. Third, both models have been implemented as computational models and so make precise predictions.

With respect to the above six controversial issues, the triangle model has the edge (although this is less true of relatively recent theoretical developments). Why is this so? It is assumed within the triangle model that reading is a skill that developed only recently in our evolutionary history. As a result, reading depends heavily on general processes not specific to reading. In contrast, the dual-route model's emphasis is on reading-specific processes and structures (e.g., the grapheme–phoneme rule system). This emphasis may be misplaced if evolutionary development has not provided us with the relevant neural architecture.

KEY TERMS

Saccades: Rapid eye movements separated by eye fixations lasting about 250 ms.

Perceptual span: The effective field of view in reading (letters to the left and right of fixation that can be processed).

Parafovea: The area in the retina immediately surrounding the fovea.

READING: EYE-MOVEMENT RESEARCH

Several theoretical approaches discussed earlier (e.g., the interactive activation model; the dual-route cascaded model; the triangle model) focus on explaining the processing of isolated words. In contrast, research on eye movements during text reading has led to theories focusing on word reading within sentential contexts (Snell et al., 2018).


Basic processes

Eye movements are crucial to reading. Most text information we process relates to the word currently fixated. However, limited information from other words may also be processed. Our eyes move in rapid jerks (saccades). Saccades are ballistic (once initiated, their direction cannot be changed). Regressions (the eyes moving backwards in the text) account for 10% of saccades. Saccades take 20–30 ms to complete and are separated by fixations lasting 200–250 ms. The length of each saccade is about 8 letters or spaces. Information is extracted from text only during each fixation.

The amount of text from which useful information can be extracted on each fixation has been assessed using the "moving window" technique (Rayner et al., 2012). The text is mutilated except for an experimenter-defined area or window surrounding the fixation point. When readers move their eyes, different parts of the text are mutilated to permit normal reading only within the window region. The perceptual span (effective field of view) extends 3 or 4 letters to the left of fixation and up to 15 letters to the right in English and is affected by text difficulty.
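The logic of the moving-window technique is easy to see in code. The sketch below is a minimal illustration (not any published implementation): every letter outside a window around the current fixation is replaced with "x", which is how gaze-contingent displays restrict useful information to the window. The default window sizes mirror the perceptual span just described.

```python
def moving_window(text: str, fixation: int, left: int = 4, right: int = 15) -> str:
    """Mask all letters outside the window around the fixated character.

    Letters more than `left` positions before or `right` positions after
    the fixation are replaced with 'x'; spaces are preserved so word
    boundaries remain visible, as in classic moving-window displays.
    """
    chars = []
    for i, ch in enumerate(text):
        inside = fixation - left <= i <= fixation + right
        chars.append(ch if (inside or ch == " ") else "x")
    return "".join(chars)

sentence = "Eye movements are crucial to reading written text"
# Simulate three successive fixations moving through the sentence.
for fixation in (4, 18, 33):
    print(moving_window(sentence, fixation))
```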


KEY TERM

Spillover effect: Any given word is fixated longer during reading when preceded by a rare word rather than a common one.

The size of the perceptual span means information from the parafovea (the area surrounding the fovea: see Glossary) is used during reading. Convincing evidence comes from the boundary technique. There is a preview word just to the right of fixation. When readers make a saccade to this word, it changes into the target word. The fixation time on the target word is less when it is the same as the preview word (Vasilev & Angele, 2017).

Readers fixate 80% of content words (nouns, verbs and adjectives) but only 20% of function words (e.g., a, the, and, or). Words not fixated tend to be those easily processed (e.g., common, short or predictable). Finally, a word's fixation time is longer if it is preceded by a rare word (the spillover effect).

There are numerous theories of reading based on eye-movement data. We will focus on the most influential one: the E-Z Reader model.

E-Z Reader model

The original version of the E-Z Reader model was proposed by Reichle et al. (1998) and has been followed by several other versions (Sheridan & Reichle, 2016). The model assumes the mind and eyes are tightly coupled, and so patterns of eye movements provide potentially rich data concerning readers' processing strategies.

The most obvious model would assume we fixate a word until it is processed adequately, after which we immediately fixate the next word. Alas, there are two major problems with this model. First, it takes 85–200 ms to execute an eye-movement programme, and so readers would waste time waiting for their eyes to move to the next word. Second, readers sometimes skip words. According to this simple model, readers know nothing about the next word until it is fixated. How, then, could they decide which words to skip?

The E-Z Reader model provides elegant solutions to the above problems. A crucial assumption is that the next eye movement is programmed after only partial processing of the current word (word n). This greatly reduces the time between completing processing of word n and an eye movement to the next word (word n+1). There is typically less spare time available with rare words than common ones – this accounts for the spillover effect. If the processing of word n+1 is completed rapidly enough, it is skipped.

According to the model, readers can attend to two words (words n and n+1) during a single fixation. However, it is a serial processing model – at any given moment, only one word is processed. Here are its main assumptions (see Figure 9.20; a toy simulation of this processing sequence is sketched below):

(1) Readers check the familiarity of the word currently fixated (word n).
(2) Completion of the familiarity check (the first stage of lexical access) is the signal to initiate an eye-movement programme.
(3) Readers then engage in the second stage of lexical access, which involves accessing word n's semantic and phonological forms.
(4) Familiarity checking and lexical access are completed faster for easily processed words (e.g., common; short; predictable).
(5) Completion of lexical access to word n signals a shift of covert (internal) attention to word n+1.
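The following sketch turns assumptions (1)–(5) into a toy serial pipeline. All timing parameters (the base durations, the word-length and frequency effects, and the 125 ms eye-movement programming time) are invented for illustration; they are not the model's fitted values. The sketch nonetheless reproduces the model's qualitative behaviour: the familiarity check triggers saccade programming, completed lexical access shifts covert attention to word n+1, and a short common word that can be fully processed during the parafoveal preview window is skipped.

```python
# Toy serial pipeline in the spirit of the E-Z Reader model.
# All parameters below are illustrative assumptions, not fitted values.

SACCADE_PROGRAMMING = 125  # ms needed to programme an eye movement

def familiarity_check(word, frequency):
    # Stage 1: shorter, more frequent words are checked faster.
    return 30 + 2 * len(word) + 50 * (1 - frequency)

def lexical_access(word, frequency):
    # Stage 2: accessing the word's semantic and phonological forms.
    return 20 + 2 * len(word) + 60 * (1 - frequency)

def simulate(words):
    """words: list of (word, frequency in 0..1); one word at a time."""
    i = 0
    while i < len(words):
        word, freq = words[i]
        check_done = familiarity_check(word, freq)   # triggers saccade programming
        access_done = check_done + lexical_access(word, freq)
        fixation_end = check_done + SACCADE_PROGRAMMING
        # Attention shifts to word n+1 once lexical access is complete;
        # the time left before the saccade is parafoveal preview time.
        preview = max(0.0, fixation_end - access_done)
        skipped = False
        if i + 1 < len(words):
            nxt, nfreq = words[i + 1]
            needed = familiarity_check(nxt, nfreq) + lexical_access(nxt, nfreq)
            skipped = needed <= preview  # word n+1 fully processed: skip it
        tag = "  (next word skipped)" if skipped else ""
        print(f"{word:12s} fixation duration {fixation_end:4.0f} ms{tag}")
        i += 2 if skipped else 1

simulate([("reading", 0.9), ("is", 1.0), ("a", 1.0),
          ("complex", 0.6), ("achievement", 0.3)])
```

Running this prints longer fixation durations for the rarer, longer words and shows the short high-frequency word being skipped – the qualitative pattern the model was designed to explain.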


[Figure 9.20 Key assumptions of the E-Z Reader model. The x-axis shows the processing difficulty of the word currently being fixated (word n), from easy to difficult; the y-axis shows time (ms). The preview time (shaded area) is the time available for parafoveal processing of word n+1 (covert attention) prior to eye fixation on that word. From Sheridan & Reichle (2016).]

Several findings support the model (Reingold et al., 2012). First, common words are fixated for less time than rare ones. Second, a word following a rare word is fixated longer than one following a common word (the spillover effect) because it receives less parafoveal processing when word n is rare (Luke, 2018). Third, word n+1 is skipped when its lexical access has been completed during fixation on word n. This typically occurs when word n+1 is common, short or predictable.

The E-Z Reader model (which emphasises serial processing) can be contrasted with parallel processing models such as the SWIFT (Saccade-Generation With Inhibition by Foveal Targets) model (e.g., Engbert et al., 2005; Schad & Engbert, 2012). This model assumes the durations of eye fixations in reading are often influenced by parallel processing of the previous and next words as well as the current one.

Attentional processes are of central importance to both models. The E-Z Reader model assumes an attentional spotlight moves from one word to the next. Within the SWIFT model, in contrast, attention is more like a zoom lens because its scope can change (see Chapter 5). Attention is widely distributed when foveal processing is easy but more narrowly focused when it is hard.

The two models both account for most findings. However, they differ with respect to lexical parafoveal-on-foveal effects. This sounds complicated but simply means lexical properties of the next word (e.g., its frequency and/or predictability) influence the fixation duration on the current word. Such effects should not occur according to the serial processing E-Z Reader model, but they can occur according to the parallel processing SWIFT model.

Findings

According to the E-Z Reader model, there are two stages of lexical processing for words: (1) checking word familiarity; and (2) lexical access (accessing semantic and phonological information about the word). Sheridan and Reingold (2013) argued that presenting words faintly disrupts stage (1) but not stage (2). In contrast, case alternation (e.g., tAbLe) disrupts only stage (2). Their findings were as predicted, thus supporting the notion that lexical processing occurs in two stages.

KEY TERM

Lexical parafoveal-on-foveal effects: The finding that fixation duration on the current word (word n) is influenced by lexical properties of the next word (word n+1).


According to the E-Z Reader model, readers use parafoveal processing to extract limited information from the next word (n+1) before it is fixated (this occurs during the preview time shown in Figure 9.20). As a result, fixation time on word n+1 is reduced when parafoveal processing is possible. Vasilev and Angele (2017) found in a meta-analysis that parafoveal preview reduced gaze duration on word n+1 by an average of 40 ms.

What information is extracted from word n+1 during parafoveal preview? Schotter et al. (2012) found that orthographic (word spelling), phonological (word sound) and morphological (word structure) information can all be processed parafoveally. As mentioned earlier, readers sometimes skip word n+1 (i.e., do not fixate it) when reading. This suggests complete identification of word n+1 can occur during parafoveal preview. Consistent with these findings, Angele et al. (2016) found evidence for two stages of parafoveal processing: (1) early orthography-based processing; and (2) late attentionally dependent lexical access.

Most research has focused only on the English language. Rayner et al. (2007) studied eye movements in Chinese individuals reading Chinese text. Chinese differs from English in that it is written without spaces between characters. Nevertheless, the pattern of eye movements resembled that found in readers of English.

We turn now to lexical parafoveal-on-foveal effects, where lexical properties (e.g., frequency) of the next word influence the processing of the current word. Remember the SWIFT model predicts such effects whereas the E-Z Reader model does not. There would be evidence for parafoveal-on-foveal effects if gaze duration on word n were greater when word n+1 was a low-frequency rather than a high-frequency word.

Brothers et al. (2017) conducted a meta-analytic review of research where the frequency of word n+1 was manipulated. There was no evidence for parafoveal-on-foveal effects across several languages (e.g., English; Finnish; Spanish; Chinese). Brothers et al. reported a similar absence of lexical parafoveal-on-foveal effects in further meta-analyses focusing on other features of word n+1 related to lexical access (i.e., semantic plausibility; lexical predictability). Degno et al. (2019) pointed out that most previous research had used very artificial reading conditions. They used a more natural reading task but also failed to find any evidence of lexical parafoveal-on-foveal effects.

Evaluation

The E-Z Reader model is very successful in several ways. First, there is ample justification for its emphasis on eye-movement patterns, because "the control of eye movements is part and parcel of the dynamics of information processing within the task of reading itself" (Radach & Kennedy, 2013, p. 429). Second, it identifies several factors (e.g., word frequency; word predictability) determining eye fixations in reading. Third, there is support for the assumption that lexical processing occurs in two separate stages (i.e., familiarity checking and lexical access). Fourth, parafoveal preview of the next word typically facilitates its subsequent processing when fixated.


Fifth, the assumption that there are close connections between eye (fixations) and mind (cognitive processes) has received support (e.g., Bixler & D'Mello, 2016). Sixth, the absence of lexical parafoveal-on-foveal effects supports serial models (e.g., E-Z Reader) over parallel models (e.g., SWIFT).

What are the model's limitations?

(1) The E-Z Reader and SWIFT models both explain where and when the eyes move during reading. Such approaches have proceeded independently of approaches (e.g., McClelland & Rumelhart's, 1981, interactive activation model) designed to identify the processes involved in word recognition. However, Snell et al. (2018) produced a computational model of reading (OB1-reader) integrating insights from eye-movement and word-recognition models.
(2) The role of higher-level processes is de-emphasised. For example, the processes involved in inference drawing, integration of information within sentences, and the use of schematic and other knowledge in text comprehension are outside the model's scope (see Chapter 10).
(3) We do not know in detail how readers perform the familiarity check. Reingold et al. (2015) argued it depends on the fluency of orthographic processing (processing the pattern of letters), but the evidence is inconclusive.
(4) There may be more parallel processing during reading than acknowledged by the E-Z Reader model. For example, Snell et al.'s (2018) OB1-reader model (which assumes extensive parallel processing) successfully accounts for many aspects of reading behaviour.

CHAPTER SUMMARY

• Speech (and music) perception: introduction. Speech and music perception both involve categorical perception. However, there are typically substantial differences in the brain areas activated during speech and music perception. In addition, some patients have selective impairment of speech or music perception. Speech perception involves various stages starting with selection of speech from the acoustic background and including word recognition and utterance interpretation. Speech perception is often more variable than implied by the notion of sequential stages.



• Listening to speech. Among the problems faced by listeners are the speed of spoken language, the segmentation problem, co-articulation, individual differences in speech patterns, and degraded speech. Listeners prefer to use lexical (e.g., syntax) information to achieve word segmentation but can also use co-articulation, allophony and syllable stress. Listeners often cope with variations between speakers by forming a speaker model. The McGurk effect shows listeners often make use of visual information (i.e., lip-reading) during speech perception.




• Context effects. Context influences speech perception in several ways (e.g., phonemic restoration effect; Ganong effect). There is much controversy concerning how context influences speech perception. The main divide is between those arguing such effects occur late in processing (autonomous position) and those arguing they can also occur early in processing (interactionist position). The interactionist position has received much support recently. However, it is more applicable when speech is degraded than when it is clear and unambiguous.

• Theories of speech perception. Spoken word recognition is often influenced by orthography (word spellings). According to the motor theory, motor processes can facilitate speech perception. In support, brain areas involved in speech production are typically involved in speech perception. Impaired speech perception following damage to speech-production systems also supports the theory. However, many brain areas are involved in speech perception but not speech production. The TRACE model assumes bottom-up and top-down processes interact flexibly in spoken word recognition. The model accounts for several phenomena (e.g., the word superiority effect, the Ganong effect, categorical perception and the phonemic restoration effect). However, it has a narrow focus on word recognition, it exaggerates the importance of top-down processes and it de-emphasises the role of conceptual meaning. The cohort model assumes spoken word recognition involves rejecting competitors in a sequential process. It also assumes that context effects occur during the integration stage following word identification. However, context sometimes influences word processing prior to the integration stage. The model also de-emphasises the role of word meanings in spoken word recognition. There is support for the more recent assumption that there is continuous integration of information from the speech input and context.



• Cognitive neuropsychology. Brain-damaged patients exhibit various patterns of impairment in speech perception. Some of these patterns can be explained by assuming the existence of three routes between sound and speech. Support has been obtained by studying patients with pure word deafness, word meaning deafness and transcortical sensory aphasia. The three-route approach provides a sketch map rather than a detailed theoretical account.



• Reading: introduction. It is harder to read English than most other languages because the relationship between spelling and sound is very inconsistent. Lexical decision, naming and priming tasks are used to assess word identification. Studies of masked phonological priming suggest that visual word recognition typically depends on prior phonological processing. The finding that word recognition depends on the number of phonologically similar words also indicates the importance of phonological processing.



• Word recognition. According to the interactive activation model, bottom-up and top-down processes interact during word recognition. The model accounts for the word-superiority effect and the effects of orthographic neighbours on word recognition but not for the roles of meaning and sound. Semantic priming can facilitate word recognition "automatically" or in a more controlled fashion. More generally, semantic priming reduces the amount of visual information required for word recognition. Words predictable within the sentence context are processed faster than those less predictable. Readers' predictions are typically general or graded rather than specific, which minimises prediction-error costs.



• Reading aloud. According to the dual-route model, reading involves lexical and non-lexical routes (the latter involving grapheme–phoneme conversion rules). Surface dyslexics rely mainly on the non-lexical route whereas phonological dyslexics use mostly the lexical route. The triangle model consists of orthographic, phonological and semantic systems. The reading of regular or consistent words involves a pathway from orthography to phonology, whereas the reading of irregular/inconsistent words involves a pathway from orthography to phonology via semantics. Surface dyslexia is attributed to damage within the semantic system, whereas phonological dyslexia stems from a general phonological impairment. The triangle model emphasises general processes not specific to reading whereas the dual-route model focuses on reading-specific processes. The triangle model provides a more adequate account (e.g., the importance it attaches to semantic processing is well supported).





• Reading: eye-movement research. According to the E-Z Reader model, the completion of familiarity checking of the current word is the signal to initiate an eye-movement programme, and the completion of lexical access is the signal to shift attention covertly to the next word. It is a serial processing model in contrast to parallel processing models (e.g., SWIFT). The absence of lexical parafoveal-on-foveal effects (lexical effects of the next word on processing of the current word) supports serial models. The E-Z Reader model is limited because it de-emphasises the processes involved in word recognition and higher-level reading processes (e.g., use of knowledge in text comprehension).


FURTHER READING

Cai, Z.G. & Vigliocco, G. (2018). Word processing. In S. Thompson-Schill (ed.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 3: Language and Thought (4th edn; pp. 75–110). New York: Wiley. The processes involved in processing words presented in text and in speech are discussed in detail.

Eisner, F. & McQueen, J.M. (2018). Speech perception. In S. Thompson-Schill (ed.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 3: Language and Thought (4th edn; pp. 1–46). New York: Wiley. This chapter contains a comprehensive account of basic processes involved in speech perception.

Grainger, J. (2018). Orthographic processing: A "mid-level" vision of reading: The 44th Sir Frederic Bartlett lecture. Quarterly Journal of Experimental Psychology, 71, 335–359. Basic processes involved in word recognition and reading are discussed in detail in this article by Jonathan Grainger.

Nieuwland, M.S. (2019). Do "early" brain responses reveal word form prediction during language comprehension? A critical review. Neuroscience and Biobehavioral Reviews, 96, 367–400. Mante Nieuwland discusses how contextual information is used by readers and listeners.

Pickering, M.J. & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychological Bulletin, 144, 1002–1044. Martin Pickering and Chiara Gambi provide a thorough discussion of research supporting the theoretical assumption that listeners' speech perception often depends on their speech-production system.

Schwanenflugel, P.J. & Knapp, N.F. (2016). The Psychology of Reading: Theory and Applications. New York: Guilford Press. Paula Schwanenflugel and Nancy Knapp provide a comprehensive account of theory and research on reading.

Snell, J., van Leipsig, S., Grainger, J. & Meeter, M. (2018b). OB1-reader: A model of word recognition and eye movements in text reading. Psychological Review, 125, 969–984. Joshua Snell and colleagues provide a theoretical model of reading based on word-recognition and eye-movement research.


Chapter 10

Language comprehension

INTRODUCTION

Basic processes involved in the identification of individual words during the initial stages of reading and listening to speech were discussed in Chapter 9. In this chapter, we discuss how phrases, sentences and entire texts (e.g., stories) are processed and understood during reading and listening.

Sentence comprehension is complex. Neural activity within the brain increases steadily during the reading of a sentence but not with non-word lists or meaningless sentences (Fedorenko et al., 2016). This progressive increase "reflects the increasing complexity of the evolving representation of the meaning of the sentence" (Fedorenko et al., 2016, p. E6262).

The previous chapter dealt mainly with aspects of language processing differing between reading and listening to speech. In contrast, higher-level comprehension processes are similar whether a story is listened to or read. The research focus has been on comprehension processes in reading rather than listening, and so our emphasis will be on reading. However, what is true of reading is mostly also true of comprehending speech.

What is the structure of this chapter? We start by considering comprehension at the sentence level and finish by focusing on comprehension processes with larger language units (e.g., complete texts). More detail is given below.

There are two main levels of analysis in sentence comprehension. First, there is an analysis of the syntactical structure of each sentence. Syntax involves a study of the rules of word order. Grammar is somewhat broader in meaning. It focuses on the structure of a language (especially syntax and inflections). Inflections involve modifying nouns or verbs to indicate grammatical changes (e.g., adding -ed to a verb to indicate the past tense).

Second, there is an analysis of sentence meaning. The intended meaning of a sentence often differs from its literal meaning, as in irony, sarcasm or metaphor. For example, someone may say "Don't overdo it!" when talking to a notoriously lazy colleague. The study of intended meaning is pragmatics. The context in which a sentence is spoken can also influence its intended meaning. Issues concerning pragmatics are discussed immediately after the section on parsing.

In the third section, we consider processes involved when individuals are presented with a text or speech consisting of several or numerous sentences.


KEY TERMS

Syntax: The set of rules concerning word order to create well-formed sentences.

Grammar: The set of rules governing the structure of a language (especially syntax and inflections).

Inflections: Grammatical changes to nouns or verbs (e.g., adding -s to a noun to indicate the plural; adding -ed to a verb to indicate the past tense).


KEY TERMS

Parsing: Analysing the syntactical or grammatical structure of sentences.

Morphology: The study of words and how they are formed from morphemes.

Our focus will mainly be on the inferences readers and listeners draw during comprehension. The major theoretical issue is the following: what determines which inferences are (or are not) drawn during language comprehension?

In the fourth section, we consider processing involving larger units of language. When we read a text, we typically try to integrate the information within it. Such integration often involves drawing inferences, identifying the main themes in the text, and so on. These integrative processes (and theories put forward to explain them) are discussed in this section.

PARSING: OVERVIEW

This section is devoted to parsing (analysis of the syntactical or grammatical structure of sentences) plus the processes readers and listeners use to comprehend sentences. Parsing allows readers and listeners "to say who did what to whom (or how, when, and where)" (Traxler, 2014, p. 605).

Most parsing research has focused only on the English language. Does this matter? The short answer is "Yes". Information about grammar can be provided by word order or by inflection (see Glossary). Many languages (e.g., Arabic; German; French) are more inflectional than English and thus have a richer morphology (analysis of the morphemes or basic units of meaning within words). Such languages permit greater flexibility of word order than English. As a consequence, it has proved easier to develop computational models of parsing for English than for most other languages (Tsarfaty et al., 2013).

Syntax and grammar

We can produce an infinite number of grammatically correct sentences in any language (this is known as productivity). Linguists (e.g., Chomsky, 1957) have produced rules explaining the productivity and regularity of language. A set of rules focusing on syntax or word order and inflections forms a grammar. Ideally, we can use a grammar to decide whether any given sentence is permissible or unacceptable.

Numerous sentences are ambiguous. Some are globally ambiguous, meaning the entire sentence has two interpretations (e.g., "Kids make nutritious snacks"). Others are locally ambiguous – various interpretations are possible during parsing. Why are so many sentences ambiguous? Language users apply a least-effort principle because it would be very demanding for speakers and writers to produce only unambiguous sentences (Solé & Seoane, 2015). Piantadosi et al. (2012) argued listeners and readers can usually deal with some ambiguity. The context typically provides useful information about sentence meaning. In addition, it would be inefficient (and extremely boring for listeners and readers!) if spoken or written language duplicated that information.

Why does much research on parsing use ambiguous sentences? Parsing generally occurs very rapidly, which makes it hard to study the processes involved. Assessing the problems encountered by listeners and readers struggling with ambiguous sentences is revealing about parsing processes.



Prosodic cues

One way that listeners work out the syntactic or grammatical structure of spoken language is by using prosodic cues (e.g., stress; intonation; rhythm; word duration). When each syllable is spoken in a monotone lacking prosodic cues, listeners struggle to understand the speaker (Duffy & Pisoni, 1992). The use of prosodic cues by speakers and writers is discussed in Chapter 11.

Suppose a spoken sentence contains a prosodic cue (a pause) that occurs misleadingly at a place conflicting with its syntactic structure. Pauker et al. (2011) found this made the sentence much harder to understand, thus showing the impact of prosodic cues (this study is discussed in more detail later, see p. 466).

Prosodic cues are most valuable with ambiguous spoken sentences. Consider the ambiguous sentence, "The old men and women sat on the bench". If the women are not old, the spoken duration of men will be relatively long and the stressed syllable in women will have a steep rise in pitch contour.

Listeners often use prosodic cues very rapidly to facilitate the understanding of ambiguous sentences. Holzgrefe et al. (2013) presented word strings such as Mona oder Lena und Lola [Mona or Lena and Lola] auditorily, with a pause and other prosodic cues occurring after the word Mona (early pause) or Lena (late pause). When the pause came after the word Lena to indicate it was Mona or Lena as well as Lola, listeners immediately integrated the prosodic information into their parsing. In a similar study, Petrone et al. (2017) found parsing was more influenced by pauses at phrase boundaries than by other prosodic cues (e.g., phrase-final lengthening: a longer sound at the end of a phrase).

As Drury et al. (2016, p. 1) pointed out, "Unlike speech, written language does not provide the same wealth of prosodic information". How do readers cope? According to Fodor's (1998) implicit prosody hypothesis, readers activate prosodic patterns during silent reading using their "inner voice".

Support for Fodor's (1998) hypothesis was reported by Steinhauer and Friederici (2001), who asked participants to listen to sentences containing pauses and read sentences containing commas. Event-related potentials (ERPs; see Glossary) were similar in both cases, suggesting participants used their "inner voice" while reading. In a similar reading study, Drury et al. (2016) manipulated the plausibility of sentences via the presence or absence of commas (e.g., John, said Mary, was the nicest boy at the party vs John said Mary was the nicest boy at the party). The impact of this manipulation on ERPs closely resembled the impact of manipulating pauses with similar spoken sentences in previous research. These findings are also consistent with the implicit prosody hypothesis.

Direct evidence that implicit prosody benefits reading was reported by Calet et al. (2017): prosody training (reading with an emphasis on sensitivity to prosody) increased reading fluency and comprehension in primary-school children.


The effects of prosody are more complex than indicated so far in three ways. First, we must consider the overall pattern of prosodic phrasing within a sentence rather than simply what happens at one particular point. Consider the following ambiguous sentence:

I met the daughter (#1) of the colonel (#2) who was on the balcony.

Frazier et al. (2006) found the interpretation of this sentence depended on the relationship between the phrase boundaries at #1 and #2. Listeners were much more likely to assume the colonel was on the balcony when the first boundary was greater than the second one than when the first boundary was smaller than the second.

Second, there is much evidence that individual speakers differ considerably in their production of prosody (Cole, 2015). This variability makes it harder for listeners to understand what any given speaker is saying.

Third, Fodor (1998) assumed implicit prosody in reading (based on inner speech) is very similar to explicit or spoken prosody. Research findings supporting this assumption were discussed earlier. However, it is not always supported. Jun (2010) found systematic differences between prosody generated in silent reading and prosody generated in reading aloud when the text had not been skimmed in advance. Thus, the implicit prosody hypothesis may have limited applicability.

THEORETICAL APPROACHES: PARSING AND PREDICTION

An important theoretical issue is to work out when different kinds of information are used during sentence comprehension. Much research on parsing concerns the relationship between syntactic and semantic analysis. There are at least four major possibilities:

(1) Syntactic analysis generally precedes (and influences) semantic analysis.
(2) Semantic analysis usually occurs prior to syntactic analysis.
(3) Syntactic and semantic analysis occur at the same time.
(4) Syntax and semantics are very closely associated and have a hand-in-glove relationship (Altmann, personal communication).

At the risk of oversimplification, early theories of parsing tended to favour possibility (1) above, whereas later ones focus more on the remaining possibilities (Traxler, 2014).

There are more models of parsing than you can shake a stick at. However, many belong to two categories: (1) two-stage, serial processing models; and (2) one-stage, parallel processing models. The garden-path model (Frazier & Rayner, 1982) has been the most influential one in the first category. Its key assumption is that the initial attempt to parse a sentence uses only syntactic information. MacDonald et al.'s (1994) constraint-based model has been the most influential example of a one-stage, parallel processing model.


Its key assumption is that all information sources (syntactic; semantic; contextual) are used from the outset to construct a syntactic model of each sentence.

We initially consider the above two models. After that, we turn to alternative models. One of these is the unrestricted race model, which combines aspects of the garden-path and constraint-based models.

We will discover many apparent inconsistencies in research findings on parsing. Why is that? Most people are very sensitive to language subtleties. As a result, sentence parsing often varies because of relatively minor differences in the sentences presented.

It has often been assumed in linguistics and cognitive psychology that nearly all adult native speakers have fully mastered the grammar of their language (Chomsky, 1965), and this assumption is implicit in many theories and much research. It follows from the above assumption that inaccurate sentence parsing reflects deficient processing rather than deficient grammatical knowledge and competence. In fact, non-verbal IQ correlates +.46 with grammatical knowledge, meaning that many individuals with low non-verbal IQ have severely deficient grammatical knowledge (Dąbrowska, 2018). However, researchers rarely consider deficient grammatical knowledge as a potential explanation for poor parsing performance.

Garden-path model

Frazier and Rayner's (1982) garden-path model was an early theory of parsing. It is so-called because readers (and listeners) can be misled or "led up the garden path" by ambiguous sentences. A famous (or notorious!) example of such a sentence is "The horse raced past the barn fell". It is notorious because it is very hard to understand (partly because such sentences occur exceptionally rarely in naturally produced sentences). The model incorporates the following assumptions (a toy sketch of the two key principles follows the list):

● Only one syntactical structure is initially considered for any sentence.
● Meaning is not involved in the selection of the initial syntactical structure.
● Two general principles influence the initial syntactical structure: minimal attachment and late closure.
● According to the principle of minimal attachment, the structure producing the fewest nodes (major parts of a sentence such as noun phrase and verb phrase) is preferred.
● The principle of late closure is that new words encountered in a sentence are attached to the current phrase or clause if grammatically permissible.
● Conflict between the above two principles is resolved in favour of the minimal attachment principle.
● If the initial syntactic structure is incompatible with additional information (e.g., semantic), it is revised during a second processing stage.
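To make the two principles concrete, here is a deliberately simple sketch of how an incremental parser might rank candidate attachment sites for an incoming word. The representation of candidates (node counts and a recency index) and the tie-breaking scheme are invented for illustration; real implementations of the garden-path model are far more elaborate. The example sentence is discussed in the Findings section below.

```python
from dataclasses import dataclass

@dataclass
class Attachment:
    """A candidate syntactic attachment for the incoming word."""
    description: str
    new_nodes: int   # nodes the attachment adds to the parse tree
    recency: int     # 0 = current phrase/clause; higher = earlier material

def choose_attachment(candidates):
    # Minimal attachment: prefer the structure creating the fewest nodes.
    # Late closure breaks ties: prefer attaching to the current phrase.
    return min(candidates, key=lambda c: (c.new_nodes, c.recency))

# "The girl knew the answer ..." - is "the answer" the direct object of
# "knew" (fewer new nodes) or the subject of an embedded clause (more)?
candidates = [
    Attachment("direct object of 'knew'", new_nodes=1, recency=0),
    Attachment("subject of embedded clause", new_nodes=3, recency=0),
]
print(choose_attachment(candidates).description)
# -> direct object of 'knew' (hence the difficulty when the sentence
#    continues "... was wrong")
```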


Why do readers use the minimal attachment and late closure principles? According to Clifton et al. (2016, p. 8), they arise "out of the pressure to interpret a sentence as quickly as possible . . . relating new words to phrases currently in active memory is faster than building new or more complex structures".

Findings

The relevance of the principle of minimal attachment was shown by Frazier and Rayner (1982). Consider the following sentences:

(1) The girl knew the answer by heart.
(2) The girl knew the answer was wrong.

The minimal attachment principle produces a grammatical structure in which the answer is treated as the direct object of the verb knew. This is appropriate only for the first sentence. As predicted, Frazier and Rayner found eye fixations were longer with the second sentence.

Frazier and Rayner (1982) also showed the importance of the principle of late closure. Consider the following sentences:

(1) Since Jay always jogs a mile it seems like a short distance to him.
(2) Since Jay always jogs a mile seems like a short distance to him.

Use of the principle of late closure leads a mile to be included in the first clause as the object of jogs. This is appropriate only for the first sentence. Readers had very long fixations on the word seems in the second sentence when it became clear the principle of late closure was not applicable. However, the second sentence is much easier to read with a comma (a prosodic cue) after jogs (Rayner et al., 2012).

According to the garden-path model, the syntactic structure of sentences can often be worked out in the almost complete absence of semantic information. Supporting evidence comes from patients with semantic dementia (a condition involving loss of word meanings; see Glossary and Chapter 7). Such patients sometimes show essentially intact performance on tasks involving grammaticality judgements (e.g., Garrard et al., 2004).

In a study mentioned earlier, Pauker et al. (2011) presented listeners with sentences such as the following, including prosodic cues (pauses indicated by #):

(1) When a bear is approaching the people # the dogs come running.
(2) When a bear is approaching # the people # the dogs came running.

According to the model, listeners should apply the principle of late closure and so find it easy to identify the correct syntactical structure. This was found with sentences such as (1), where the pause's location coincided with the syntactic structure. In contrast, performance was very poor with sentences such as (2) because of the misleading pause (e.g., between approaching and people). Thus, listeners' adherence to the principle of late closure can be greatly disrupted by misleading prosodic cues.


Drury et al. (2016), in a study discussed earlier (p. 463), obtained similar findings when readers were presented with ambiguous sentences and commas which indicated (or failed to indicate) the appropriate syntactic structure.

According to the garden-path model, the initial parsing of an ambiguous sentence should be uninfluenced by visual context providing information relevant to the sentence's interpretation. Coco and Keller (2015) presented listeners with ambiguous sentences and with relevant visual context. Ambiguity resolution within the sentence depended mostly on linguistic information, which is broadly consistent with the model. However, visual context had more influence on sentence processing than expected by the model.

Finally, when readers initially construct an incorrect syntactic structure for a garden-path sentence, the model predicts they should revise it in the light of additional information and so typically produce the correct syntactic structure. Findings reported by Qian et al. (2018) are inconsistent with this prediction. Readers were presented with garden-path sentences such as the following:

While the man hunted, the deer that was brown and graceful ran into the woods.

This was followed by the question, Did the man hunt the deer? Readers produced numerous incorrect "Yes" responses with such sentences (approximately 50% errors). Thus, they often failed to work out the correct syntactic structure.

Evaluation

The model provides a simple and coherent account of parsing. The principles of minimal attachment and late closure often influence the selection of an initial syntactic structure for sentences. The model is plausible in that these two principles reduce processing demands on the reader or listener.

What are the model's limitations? First, it assumes parsers who discover that their initial preferred syntactic structure is incorrect go back to square one and form an alternative structure. As Kuperberg and Jaeger (2016) pointed out, this "all-or-nothing" assumption is simply incorrect. More generally, the model mistakenly assumes initial attempts at parsing are inflexible (see below, pp. 468–470).

Second, contrary to the model's assumptions, den Ouden et al. (2016) found syntactic processing typically does not occur in the absence of other forms of processing. This conclusion was based on patterns of brain activation during the processing of garden-path sentences. Behavioural evidence discussed shortly indicates that several factors (including misleading prosody and prior context) prevent readers and listeners from adhering to the principles of minimal attachment and late closure.

Third, the model assumes readers will ultimately generate a correct syntactic structure even for complex sentences. However, this is often not the case when sentences are complex and hard to comprehend (e.g., Qian et al., 2018).


Fourth, the model is hard to test. For example, evidence that non-syntactic information is used early in sentence processing seems inconsistent with the model. However, the second stage of parsing (following the first, syntactic stage) may simply start very rapidly.

Fifth, the model is more applicable to English than to other languages. For example, there is a preference for early (rather than late) closure in several languages (e.g., Spanish; Russian; French) (Harley, 2013). Mandarin differs from most European languages in having fewer reliable cues to syntactic structure and a more flexible word order (Huang et al., 2016). Thus, principles such as those of minimal attachment and late closure are not directly relevant to Mandarin.

Constraint-based model

According to MacDonald et al.'s (1994) constraint-based model, initial sentence interpretation depends on multiple information sources (e.g., syntactic; semantic; general world knowledge) called constraints. These constraints limit (or constrain) the number of possible interpretations. The model is based on a connectionist architecture (see Chapter 1) which exhibits learning through experience.

It is assumed all relevant sources of information are immediately available to the parser. Competing analyses of the current sentence are activated at the same time. The syntactic structure receiving most support from the various constraints is more activated than other syntactic structures. Confusion occurs if the correct syntactic structure is less activated than one or more incorrect structures. (A toy sketch of this competitive activation process is given below.)

The processing system uses four language characteristics to resolve sentence ambiguities:

(1) Grammatical knowledge constrains possible sentence interpretations.
(2) The various forms of information associated with any given word are typically not independent of each other.
(3) A word may be less ambiguous in some ways than in others (e.g., ambiguous tense but not grammatical category).
(4) The various interpretations permissible according to grammatical rules generally differ considerably in frequency and probability based on past experience. The syntactic interpretation most consistent with such experience is typically selected.

MacDonald (2013) developed her constraint-based model further. She started by assuming speakers use various strategies to reduce processing demands on them (see also Chapter 11). Here is one strategy: the speaker can start with common words and syntactically simple phrases while planning the rest of the utterance. Another strategy is for the speaker to re-use sentence plans – that is, to favour practised and easy sentence plans.

MacDonald's (2013) key assumption is that listeners' comprehension processes are sensitive to these strategies, which increases their ability to predict the speaker's next utterance. In sum, "Rarer patterns [produced by speakers] are more difficult to comprehend than frequent patterns" (Momma & Phillips, 2018, p. 236).
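The following sketch illustrates the competitive activation idea in miniature, using the temporarily ambiguous string "The professor read the newspaper . . ." discussed in the Findings section below. Each candidate analysis gathers graded support from all constraints in parallel. The particular constraints, their weights and the normalisation scheme are invented for illustration; they are not MacDonald et al.'s (1994) actual parameters.

```python
CANDIDATES = ("direct-object analysis", "embedded-clause analysis")

# Each constraint gives graded support to the two candidate analyses
# of "The professor read the newspaper ..." (all values are invented).
CONSTRAINTS = [
    # (name, weight, support for candidate 0, support for candidate 1)
    ("verb bias of 'read'",   1.0, 0.8, 0.2),
    ("semantic plausibility", 0.8, 0.6, 0.4),
    ("discourse context",     0.5, 0.5, 0.5),
]

def activation_levels(constraints):
    """All constraints are applied in parallel from the outset."""
    raw = [
        sum(weight * support[i] for _, weight, *support in constraints)
        for i in range(len(CANDIDATES))
    ]
    total = sum(raw)
    return [r / total for r in raw]  # normalise so the candidates compete

for candidate, level in zip(CANDIDATES, activation_levels(CONSTRAINTS)):
    print(f"{candidate}: {level:.2f}")
# The direct-object analysis wins, matching the verb bias of 'read';
# disambiguation towards the embedded clause ("... had been destroyed")
# would then require the weaker analysis to overtake the stronger one.
```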


approach, arguing that a single mechanism underlies both parsing by listeners and the production of utterances by speakers.
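The parallel, graded competition at the heart of the constraint-based model can be illustrated with a toy simulation. The sketch below is purely illustrative: the model itself is a trained connectionist network, and the constraint names, weights and support values here are invented for the example.

```python
# Toy illustration of constraint-based competition between two parses.
# All constraints and weights are hypothetical; in the real model these
# values emerge from connectionist learning over linguistic experience.

def normalise(activations):
    total = sum(activations.values())
    return {parse: a / total for parse, a in activations.items()}

# Candidate analyses of an ambiguous fragment, e.g. "The professor read the newspaper ..."
parses = ["direct-object", "embedded-clause"]

# Each constraint lends weighted support (0-1) to each candidate parse.
constraints = {
    # constraint: (weight, {parse: support})
    "verb bias (read + NP)": (0.5, {"direct-object": 0.9, "embedded-clause": 0.1}),
    "plausibility":          (0.3, {"direct-object": 0.7, "embedded-clause": 0.3}),
    "structural frequency":  (0.2, {"direct-object": 0.8, "embedded-clause": 0.2}),
}

activation = {parse: 0.0 for parse in parses}
for weight, support in constraints.values():
    for parse in parses:
        activation[parse] += weight * support[parse]

for parse, a in sorted(normalise(activation).items(), key=lambda x: -x[1]):
    print(f"{parse}: {a:.2f}")
# The direct-object analysis dominates. Processing difficulty ("competition")
# is greatest when the two activations are close, i.e. when constraints conflict.
```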

Findings
According to the constraint-based model, several kinds of non-syntactic information are used very early in sentence processing. In contrast, this occurs only after an initial stage of syntactic processing within the garden-path model. Much research is more consistent with the constraint-based model. For example, researchers (e.g., Hagoort et al., 2004) using event-related potentials have found sentence processing is influenced very rapidly by semantic factors (discussed further later, pp. 475–476).

The constraint-based model assumes sentence processing is parallel whereas the garden-path model assumes it is serial. Cai et al. (2012) compared the models’ predictions using ambiguous sentences such as the following:

Because it was John that Ralph threatened the neighbour recorded their conversation.

This sentence is initially ambiguous because it is unclear whether the neighbour is the subject of the main clause (recorded their conversation: subject analysis) or the object of the preceding verb (threatened: object analysis). Readers interpreted the sentence in line with the subject analysis. However, the object analysis disrupted sentence processing even though it was not adopted. This finding suggests there was parallel processing of the two analyses, as predicted by the constraint-based model.

KEY TERM
Verb bias: An imbalance in the frequency with which a verb is associated with different syntactic structures.

According to the model, verbs are an important constraint that often strongly influences initial attempts at parsing. The focus has been especially on verb bias – some verbs (e.g., read) are associated with two different syntactic structures (but more frequently with one). Consider the following two sentences:

(1) The professor read the newspaper had been destroyed.
(2) The professor read the newspaper during his break.

The second sentence is easier to understand because the verb read is generally followed by a direct object, as in (2). However, it can also be followed by an embedded clause, as in (1). According to the constraint-based model, readers should find it easier to resolve ambiguities (and identify the correct syntactic structure) when the sentence structure is consistent with the verb bias. According to the garden-path model, in contrast, verb bias should have no initial effect.

Wilson and Garnsey (2009) studied verb bias in ambiguous sentences. As predicted by the constraint-based model, it took longer to resolve the ambiguity when the sentence structure was inconsistent with the verb bias. Thus, readers’ previous experience with verbs immediately influenced sentence processing. Fine et al. (2013) asked participants to read sentences such as the following:


(1) The experienced soldiers warned about the dangers before the midnight raid.
(2) The experienced soldiers warned about the dangers conducted the midnight raid.

Both sentences are temporarily ambiguous. However, verbs such as warned are far more likely to occur as a main verb, as in sentence (1), than as the verb in a relative clause, as in sentence (2). Accordingly, readers find it much easier to process sentences such as (1) than those such as (2). Fine et al. (2013) used a condition in which 50% of sentences resembled sentence (1) and 50% resembled sentence (2). Readers rapidly adapted their syntactic expectations so they increasingly read sentences such as (2) with relative ease. The take-home message is that the initial syntactic structure considered by readers is flexible and influenced by recent past experience. This flexibility is entirely consistent with the constraint-based model but not the garden-path model.
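Fine et al.’s adaptation result can be thought of as incremental probability updating. The sketch below is an assumption-laden simplification (not the authors’ published model): it tracks a reader’s expectation that a verb such as warned introduces a relative clause, using invented prior counts and simple count-based updating.

```python
# Minimal sketch of syntactic adaptation as count-based probability updating.
# The prior counts are invented; Fine et al. (2013) did not publish this model.

prior_main, prior_rc = 18.0, 2.0   # strong prior that "warned" is a main verb

def p_relative_clause(main, rc):
    return rc / (main + rc)

main, rc = prior_main, prior_rc
print(f"Initial P(relative clause) = {p_relative_clause(main, rc):.2f}")

# Experiment-like exposure: alternating main-verb and relative-clause sentences.
for trial, structure in enumerate(["MV", "RC"] * 20, start=1):
    if structure == "RC":
        rc += 1
    else:
        main += 1
    if trial % 10 == 0:
        print(f"After {trial} sentences: P(relative clause) = "
              f"{p_relative_clause(main, rc):.2f}")
# The estimate rises towards the 50% rate experienced in the experiment,
# mirroring readers' growing ease with sentences like "The experienced
# soldiers warned about the dangers conducted the midnight raid."
```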

Evaluation
What are the constraint-based model’s strengths? First, it seems efficient that readers and listeners should use all relevant information from the outset when working out a sentence’s syntactic structure. As we have seen, non-syntactic factors (e.g., word meaning; verb bias) are often used very rapidly. Second, the model predicts much flexibility in parsing because it is influenced by our past linguistic experience. There is strong support for this prediction (e.g., Fine et al., 2013). Brysbaert and Mitchell (1996) found substantial individual differences among Dutch readers in their parsing decisions, providing further evidence of flexibility.

What are the model’s limitations? First, its predictions are often imprecise. As Rayner et al. (2012, p. 229) pointed out, “It is difficult . . . to falsify the general claim that parsing is interactive and constraint-based . . . it does not by itself make any clear predictions about which things actually matter, or how and when they have their influence.”

Second, much experimental support for the model consists of findings showing that non-syntactic factors influence early sentence processing. Such findings are clearly consistent with the model. However, some can be accounted for by the garden-path model by assuming the second, non-syntactic, stage of parsing starts very rapidly.

Unrestricted race model
Van Gompel et al. (2000) proposed the unrestricted race model combining aspects of the garden-path and constraint-based models. Here are its main assumptions:

(1) All information sources (semantic + syntactic) are used to identify a syntactic structure (consistent with the constraint-based model).
(2) All other syntactic structures are ignored unless the favoured syntactic structure is disconfirmed by subsequent information.


(3) If the initial syntactic structure is discarded, there is an extensive re-analysis to form a new one. This resembles the garden-path model in that parsing often involves two distinct stages.
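The control structure implied by these assumptions can be sketched as a race followed by optional re-analysis. This is an illustrative toy, not Van Gompel et al.’s implementation: the candidate structures, construction times and re-analysis penalty are all invented.

```python
# Schematic sketch of the unrestricted race model's control structure.
# Candidate structures, timings and the disambiguation test are invented.

import random

REANALYSIS_COST = 300  # arbitrary ms penalty for extensive re-analysis

def race(candidates):
    """All information sources feed each candidate; the fastest one wins
    and all others are discarded (no parallel competition is maintained)."""
    return min(candidates, key=lambda c: c["construction_time"])

def parse(candidates, is_consistent):
    structure = race(candidates)
    cost = structure["construction_time"]
    if not is_consistent(structure):          # later evidence disconfirms it
        cost += REANALYSIS_COST               # second-stage re-analysis
        structure = race([c for c in candidates if is_consistent(c)])
    return structure, cost

candidates = [
    {"name": "verb-phrase attachment", "construction_time": random.randint(180, 260)},
    {"name": "noun-phrase attachment", "construction_time": random.randint(180, 260)},
]

# Ambiguous sentence: either structure is acceptable, so re-analysis never occurs.
structure, cost = parse(candidates, lambda c: True)
print("ambiguous:    ", structure["name"], cost, "ms")

# Disambiguated sentence: only noun-phrase attachment survives, so re-analysis
# is needed whenever the verb-phrase analysis happens to win the race.
structure, cost = parse(candidates, lambda c: c["name"] == "noun-phrase attachment")
print("disambiguated:", structure["name"], cost, "ms")
```

Because ambiguous sentences never incur the re-analysis penalty in this scheme, the sketch reproduces the ambiguity advantage discussed next.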

Findings
Van Gompel et al. (2001) compared the unrestricted race model against other models. Participants read three kinds of sentences (examples provided):

(1) Ambiguous sentences: The burglar stabbed only the guy with the dagger during the night. (It could be the burglar or the guy who had the dagger.)
(2) Verb-phrase attachment: The burglar stabbed only the dog with the dagger during the night. (Here the burglar stabbed with the dagger.)
(3) Noun-phrase attachment: The burglar stabbed only the dog with the collar during the night.

According to the garden-path model, the principle of minimal attachment means readers should always adopt the verb-phrase analysis. This produces rapid processing of sentences such as (2) but slow processing of sentences such as (3). Ambiguous sentences are processed rapidly because the verb-phrase analysis is acceptable. According to the constraint-based theory, sentences such as (2) and (3) are processed rapidly because the word meanings support only the correct interpretation. However, there will be competition between the two possible interpretations of sentence (1) and so processing will be slow.

What actually happened? There was an ambiguity advantage: ambiguous sentences were processed faster than either of the other sentence types (see Figure 10.1). According to the unrestricted race model, readers rapidly use syntactic and semantic information in ambiguous sentences to form a syntactic structure, and no re-analysis is necessary. In contrast, re-analysis is sometimes required with noun-phrase and verb-phrase sentences.

Figure 10.1 Total sentence processing time as a function of sentence type (ambiguous; verb-phrase attachment; noun-phrase attachment). Data from van Gompel et al. (2001). Reprinted with permission of Elsevier.

Mohamed and Clifton (2011) compared the same three models. Participants read temporarily ambiguous sentences (e.g., The second wife will claim the entire family inheritance for herself). This sentence has ambiguous (the entire family inheritance) and disambiguating (for herself) regions. The sentence was sometimes preceded by a context biasing the incorrect syntactic structure.


What do the three models predict? Since the actual syntactic structure is the simplest possible, the garden-path model predicts readers will not be slowed down in the ambiguous or disambiguating regions. According to the constraint-based theory, both syntactic structures are activated in the ambiguous region, which slows down reading. Readers then select one syntactic structure in the disambiguating region, which also slows reading time. According to the unrestricted race model, reading is not slowed in the ambiguous region because only one syntactic structure is produced. However, it will often be the incorrect syntactic structure, which slows reading in the disambiguating region.

Which model was the winner? Reading times in the ambiguous and disambiguating regions were most consistent with the predictions of the unrestricted race model.

According to the unrestricted race model, parsing terminates when a permissible syntactic structure is produced. Logačev and Vasishth (2016) argued that this assumption is too limited because it takes no account of task demands. When they asked readers to construct all possible syntactic structures, there was an ambiguity disadvantage (i.e., ambiguous sentences were processed more slowly than unambiguous ones). This finding is contrary to the unrestricted race model’s prediction.

Evaluation
The unrestricted race model combines successful features of the garden-path and constraint-based models. Its assumptions that all information sources are used from the outset, and that the initial syntactic structure is retained unless subsequent evidence is inconsistent with it, are both reasonable. It differs from most other models in predicting the surprising finding that there can be an ambiguity advantage in sentence processing.

What are the model’s limitations? First, it assumes readers and listeners typically identify a sentence’s correct syntactic structure. That is by no means always the case (see below). Second, the model assumes an ambiguity advantage will typically be found. In fact, task conditions determine whether there is an ambiguity advantage or disadvantage.

Good-enough representations
Until recently, nearly all theories of sentence processing (including those discussed above) assumed the language processor “generates representations of the linguistic input that are complete, detailed, and accurate” (Ferreira et al., 2002, p. 11). There are two reasons why this assumption is incorrect. First, as discussed earlier, many individuals with low non-verbal IQs have very limited grammatical knowledge (Dąbrowska, 2018). Second, the good-enough processing approach “emphasises the tendency of the comprehension system to perform superficial analyses of linguistic input, which sometimes result in inaccurate interpretations” (Ferreira & Lowder, 2016, p. 218).

Karimi and Ferreira (2016) proposed a model of comprehension based on the notion of good-enough representations (see Figure 10.2). It assumes two routes are used in language processing, both starting at the same time.


Figure 10.2 A model of language processing involving heuristic and algorithmic routes: both routes start at the same time; the heuristic route produces an interim output, which the algorithmic route refines if necessary to yield the final output. From Karimi & Ferreira (2016).

First, the heuristic route uses simple, error-prone heuristics (rules of thumb) and typically produces a rapid output. This route is “quick and dirty” but has the advantage of involving minimal effort. Second, the algorithmic route is more demanding on resources – it uses strict and well-defined syntactic rules “to compute precise representations for the given linguistic input” (Karimi & Ferreira, 2016, p. 1014).

What are the model’s implications? First, listeners and readers generally accept the output of the heuristic route as correct. Comprehenders emphasise heuristic processing because they have limited cognitive resources and processing time is limited. Second, if the output of the heuristic route is not accepted as correct (or listeners and readers strive for high levels of comprehension accuracy), algorithmic processing continues and typically determines the outcome of the comprehension process. Third, individuals with poor comprehension skills are less likely than those with good comprehension skills to make effective use of algorithmic processing.
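The contrast between the two routes can be made concrete with a toy example based on the noun-verb-noun (NVN) strategy discussed in the Findings below. This is a deliberate simplification: the routes are not literally these two functions, and the hand-tagged sentence representation is invented for the example.

```python
# Toy contrast between a "quick and dirty" heuristic route and a slower
# algorithmic route, in the spirit of Karimi and Ferreira (2016).

def heuristic_route(words):
    """NVN heuristic: treat the first noun as agent and the last as patient,
    ignoring syntactic structure entirely."""
    nouns = [w for w, tag in words if tag == "N"]
    return {"agent": nouns[0], "patient": nouns[-1]}

def algorithmic_route(words, passive):
    """Full syntactic analysis: in a passive sentence, thematic roles
    are the reverse of the surface noun order."""
    nouns = [w for w, tag in words if tag == "N"]
    if passive:
        return {"agent": nouns[-1], "patient": nouns[0]}
    return {"agent": nouns[0], "patient": nouns[-1]}

# "The mouse was eaten by the cheese" (tagged by hand for the example)
sentence = [("mouse", "N"), ("was", "V"), ("eaten", "V"),
            ("by", "P"), ("cheese", "N")]

print("heuristic:  ", heuristic_route(sentence))                  # mouse as agent (wrong)
print("algorithmic:", algorithmic_route(sentence, passive=True))  # cheese as agent
# Comprehenders often accept the rapid heuristic output, yielding the
# misinterpretations of implausible passives described below.
```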

Findings
As predicted by the model, comprehension processes are often superficial and inaccurate. For example, consider the Moses illusion. When asked “How many animals of each sort did Moses put on the ark?”, approximately 50% of people reply “Two”. In fact, the correct answer is “None” (think about it!). The Moses illusion occurs because of superficial or heuristic processing. Successful avoidance of the Moses illusion requires more thorough processing to inhibit the outcome of heuristic processing (Raposo & Marques, 2013).

In similar fashion, Ferreira (2003) found listeners who heard “The mouse was eaten by the cheese” sometimes misinterpreted it as meaning the mouse ate the cheese! Ferreira argued this was due to a common heuristic (the noun-verb-noun or NVN strategy). This involves the assumption that the subject of a sentence is the agent of some action whereas the object is


the recipient. We use this heuristic because most English sentences conform to this pattern. Christianson et al. (2010) argued that listeners in the Ferreira (2003) study faced a conflict between the syntactic structure of the passive sentences and their semantic knowledge of what is typically the case. They found listeners hearing implausible passive sentences (e.g., “The angler was caught by the fish”) paid little attention to their syntactic structure.

Swets et al. (2008) argued readers would engage in increased algorithmic processing if they anticipated detailed (rather than superficial) comprehension questions. Participants read sentences more slowly in the former case (see Figure 10.3). Ambiguous sentences were read more rapidly than unambiguous ones when superficial questions were asked (suggesting heuristic processing). However, this ambiguity advantage disappeared when more challenging comprehension questions were anticipated (suggesting more algorithmic processing).

Reliance on heuristic processing can be so great that readers fail to repair their preferred syntactic structure of a sentence even when it is inadequate. Ferreira and Lowder (2016) discussed studies where readers received sentences such as “While Anna bathed the baby played in the crib”. Many readers mistakenly understood this sentence to mean Anna bathed the baby and the baby played in the crib. What happened was that many readers initially assumed Anna bathed the baby and maintained this assumption even though this structure breaks down when they reach the verb played, which then has no subject.

Finally, we consider individual differences. Individuals high in working memory capacity (high intelligence and attentional control; see Glossary) answered comprehension questions about garden-path sentences 70%

of the time. In contrast, the comparable figure for those low in working memory capacity was 50% (chance performance) (MacDonald et al., 1992).

Figure 10.3 Sentence reading times as a function of the way in which comprehension was assessed: detailed (relative clause) questions; superficial questions on all trials; or occasional superficial questions. Sample sentence: The maid of the princess who scratched herself in public was terribly humiliated. From Swets et al. (2008). With kind permission from Springer Science+Business Media.

Evaluation
Sentence comprehension can depend on precise algorithmic processes or imprecise heuristic ones. As predicted by the model, language processing often uses good-enough representations and so is error-prone. It also follows from the model that language processing should be flexible. As predicted, there is more evidence of algorithmic processing when readers or listeners have high IQ or working memory capacity or when they are expecting detailed comprehension questions.

What are the model’s limitations? First, as Karimi and Ferreira (2016, p. 1019) admitted, “The nature of the simple rules that guide heuristic processing is unclear”. Second, heuristic processing can involve very limited processing or top-down semantic processing (e.g., the Moses illusion). It is not clear that all forms of heuristic processing (especially top-down semantic processing) are relatively effortless (Koornneef & Reuland, 2016).

Third, it is often assumed theoretically that re-analysis of ambiguous sentences using precise or algorithmic processes reduces misinterpretations. According to this viewpoint, readers spending the most time processing the disambiguating region of ambiguous sentences should be less likely to misinterpret them. However, Qian et al. (2018; discussed above, p. 467) found that spending extra time processing the disambiguating region (and so presumably engaging in re-analysis) was ineffective when events described in the misinterpretation seemed highly probable.

Fourth, the model assumes that misinterpretations of sentences such as “The mouse was eaten by the cheese” occur because heuristic processing produces incorrect syntactic representations. However, there is another possibility. Perhaps listeners/readers typically form correct syntactic representations, with misinterpretations due to memory limitations (i.e., incomplete retrieval of relevant information) (Bader & Meng, 2018). The crucial point is that misinterpretation errors may reflect processes (e.g., involving memory) occurring after an initial correct sentence interpretation.

Cognitive neuroscience: event-related potentials
Cognitive neuroscience has enhanced our understanding of parsing and sentence comprehension. Since the precise timing of different processes is so important, much use has been made of event-related potentials (ERPs; see Glossary). As we will see, semantic information of various kinds is actively processed very early on, which is broadly consistent with predictions from the constraint-based and unrestricted race models. The literature is reviewed by Beres (2017).

The N400 component in the ERP waveform is of special relevance. It is a negative wave with an onset at 250 ms and a peak at 400 ms. The N400 to a sentence word is smaller when its meaning matches the sentence context. Other factors influencing the N400 during sentence processing mostly relate to semantic processing. As a result, the N400 has often been assumed to reflect difficulty with achieving semantic access. However, it is more likely


that N400 reflects “the input-driven update of a representation of sentence meaning” (Rabovsky et al., 2018, p. 693). As we will see, research within cognitive neuroscience has provided evidence for top-down predictive processes in sentence processing. There is further discussion of such predictive processes with respect to reading and speech perception in Chapter 9.

Findings
How does meaning influence initial sentence processing? The traditional view was that initially we process only word meanings, with aspects of meaning going beyond the sentence itself (e.g., our world knowledge) processed subsequently. Hagoort et al. (2004) reported contrary evidence. Dutch participants read sentences such as the following (critical words are in italics):

(1) The Dutch trains are yellow and very crowded. (This sentence is true.)
(2) The Dutch trains are sour and very crowded. (This sentence is false because of the meaning of the word “sour”.)
(3) The Dutch trains are white and very crowded. (This sentence is false because of world knowledge – Dutch trains are yellow.)

According to the traditional view, the semantic mismatch in a sentence such as (3) should have taken longer to detect than the mismatch in a sentence such as (2). However, the effects of these different kinds of semantic mismatch on N400 were very similar (see Figure 10.4).

Figure 10.4 The N400 response to the critical word in a correct sentence (“The Dutch trains are yellow”: green line), a sentence incorrect on the basis of world knowledge (“The Dutch trains are white”: orange line) and a sentence incorrect on the basis of word meanings (“The Dutch trains are sour”: purple line). The N400 response was very similar with both incorrect sentences. From Hagoort et al. (2004). Reprinted with permission from AAAS.

What do the above findings mean? First, “While reading a sentence, the brain retrieves and integrates word meanings and world knowledge at the same time” (Hagoort et al., 2004, p. 440). Thus, the traditional view that we process word meaning before information about world knowledge may be wrong. Second, word meaning and world knowledge are both integrated into the reader’s sentence comprehension within about 400 ms. This suggests sentence processing involves making almost immediate use of all relevant information, consistent with MacDonald et al.’s (1994) constraint-based theory.
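One common way of operationalising contextual fit in computational work is surprisal (negative log probability of a word given its context), which tends to track N400 size. The sketch below is only an illustration of that general modelling assumption, not Hagoort et al.’s analysis or Rabovsky et al.’s model; the cloze probabilities are invented.

```python
# Illustrative link between contextual fit and N400 size: N400 amplitude
# is often modelled as rising with a word's surprisal in context.
# The cloze probabilities below are invented for demonstration.

import math

# Mock cloze probabilities of critical words given "The Dutch trains are ..."
cloze = {"yellow": 0.60, "white": 0.05, "sour": 0.04}

def surprisal(p):
    return -math.log2(p)

for word, p in cloze.items():
    print(f"{word:>6}: cloze = {p:.2f}, surprisal = {surprisal(p):.2f} bits")
# Both kinds of violation ("white" via world knowledge, "sour" via word
# meaning) have similarly low cloze values, so a surprisal-based account
# predicts similarly large N400s for the two incorrect sentences.
```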


The traditional view also assumed contextual information is processed after information about word meanings. Contrary evidence was reported by Nieuwland and van Berkum (2006a, p. 1106) using scenarios such as this one:

A woman saw a dancing peanut who had a big smile on his face. The peanut was singing about a girl he had just met. And judging from the song, the peanut was totally crazy about her. The woman thought it was really cute to see the peanut singing and dancing like that. The peanut was salted/in love, and by the sound of it, this was definitely mutual.

Some listeners heard “salted”, which was appropriate in terms of word meanings but inappropriate within the story context. Others heard “in love”, which was appropriate within the story context but not word meanings. The N400 was greater for “salted” than “in love” because it did not fit the story context. Thus, contextual information can have a very rapid major impact.

Van den Brink et al. (2012) argued that listeners take rapid account of stereotyped inferences about the speaker. For example, suppose you heard a woman say “I have a large tattoo on my back”. This would conflict with stereotypical views if she had an upper-class accent but not if she had a working-class accent. As predicted, there was a larger N400 to the word “tattoo” when spoken in an upper-class accent.

Evaluation
Behavioural measures (e.g., time to read a sentence) provide only indirect evidence concerning the nature and timing of underlying language processes. In contrast, research using event-related potentials indicates listeners make use of several kinds of information (e.g., context; world knowledge; knowledge of the speaker; syntax) very early in processing, before the end of each spoken word. Such findings are more supportive of constraint-based theories than the garden-path model. How can we explain the above findings? According to Hagoort (2017, p. 200),

Very likely, lexical, semantic and syntactic cues conspire to predict characteristics of the next anticipated word, including its syntactic and semantic make-up. A mismatch between contextual prediction and the output of bottom-up analysis results in an immediate brain response recruiting additional processing resources for the sake of salvaging the on-line interpretation process.

More research using event-related potentials to assess the extent to which readers/listeners predict upcoming text is discussed in the section entitled “Discourse processes: inferences” (see pp. 490–498).

What are the limitations of research in this area? First, most research is artificial because sentences are presented word-by-word to stop eye


movements contaminating the ERPs. Findings are generally similar in word-by-word and free reading. However, comprehension is better in free reading because it permits regressions (eyes moving backwards in the text) (Metzner et al., 2017). These regressions are associated with a P600 effect (an ERP component produced by syntactic and semantic violations in the text).

Second, much research differs from naturalistic language comprehension in important ways. The latter more often involves processes not specific to language (e.g., relating text to pre-existing knowledge and to context) whereas the former is generally concerned primarily with language processing (Hasson et al., 2018).

Third, a small N400 to a predictable word in a sentence may indicate successful prediction. Alternatively, however, it might also indicate easy integration of that word into the developing sentence meaning.

KEY TERMS
Pragmatics: The study of the ways language is used and understood in the real world, including a consideration of its intended meaning; in general, the impact of contextual factors on meaning.
Figurative language: Language that is not intended to be taken literally; examples include metaphor, irony and idiom.
Autism spectrum disorder (ASD): A disorder involving difficulties in social interaction and communication and repetitive patterns of behaviour and thinking.
Central coherence: The ability to make use of all the information when interpreting an utterance or situation.
Asperger syndrome: An autism spectrum disorder involving problems with social communication in spite of at least average intelligence and no delays in language development.

PRAGMATICS
Pragmatics is concerned with practical language use and comprehension.

It relates to the intended rather than literal meaning expressed by speakers and understood by listeners and often involves drawing inferences. For example, we assume someone who says “The weather’s really great!”, when it has been raining non-stop for several days, actually thinks the weather is terrible. Pragmatics is also important when readers comprehend text.

Pragmatics is “meaning minus semantics”. Suppose someone says something in an unfamiliar language. Using a dictionary would partly clarify what the speaker intended to communicate. Most of what the dictionary (plus knowledge of the language’s grammatical structure) fails to tell you about the speaker’s intended meaning lies within the field of pragmatics. A full understanding of intended meaning generally requires taking account of contextual information (e.g., the speaker’s tone; the speaker’s relevant behaviour; the current environment).

An important area within pragmatics is figurative language (language not intended to be taken literally). Metaphor is figurative language where a word or phrase is used figuratively to mean something it resembles (e.g., “Time is a thief”). There is also irony, where the intended meaning differs substantially from the literal meaning. Here is an example from the film Dr Strangelove: “Gentlemen, you can’t fight in here! This is the War Room.” There are also idioms, which are common figurative expressions (e.g., “kick the bucket”).

Bohrn et al. (2012) carried out a meta-analysis (see Glossary) comparing brain activation with figurative and literal language processing. There were two main findings:

(1) Figurative language processing involves essentially the same brain network as literal processing.
(2) Several areas in (and close to) the inferior frontal gyrus (BA45/36/47/13) (especially in the left hemisphere) were more activated during figurative than literal language processing.

Häuser et al. (2016) applied repetitive transcranial magnetic stimulation (rTMS; see Glossary) to part of this network (BA45), which they hypothesised provides


cognitive control to resolve semantic conflicts. As predicted, rTMS impaired the processing of idioms involving maximal semantic conflict between literal and idiomatic meanings.

IN THE REAL WORLD: AUTISTIC SPECTRUM DISORDERS AND PRAGMATICS

We can see the importance of pragmatics by studying individuals who have difficulty in distinguishing between literal and intended meanings. For example, individuals with autism spectrum disorder (ASD) are poor at understanding others’ intentions and beliefs and so find social communication very hard. They also have weak central coherence (the ability to integrate information from different sources). It follows that individuals with autism spectrum disorder should have severe problems understanding the intended meanings of figurative language.

Much evidence has been obtained from individuals with Asperger syndrome (relatively mild ASD). Children with Asperger’s often develop language normally but have impaired pragmatic language comprehension (see Volden, 2017, for a review). For example, Kaland et al. (2005) found they were deficient at drawing inferences when presented with jokes, white lies, figurative language or irony. Here is an example involving irony:

Ann’s mother has spent a long time cooking Ann’s favourite meal: fish and chips. But when she brings it to Ann, she is watching TV, and she doesn’t even say thank you. Ann’s mother is cross and says, “Well, that’s very nice, isn’t it! That is what I call politeness!”

Individuals with Asperger syndrome were less able than healthy controls to explain why Ann’s mother said what she did. This illustrates their general inability to understand what other people are thinking. Of importance, Asperger’s individuals were comparable to controls when drawing inferences not requiring social understanding.

Loukusa et al. (2018) extended the findings of Kaland et al. (2005). Two factors were jointly responsible for the impaired ability of children with ASD to draw correct pragmatic inferences during comprehension. First, they had problems taking account of the context. Second, they found it hard to infer someone’s thoughts and feelings from what that person said. The deficit in correct pragmatic inferences by ASD children went from 25% when only context was important to 48% when someone else’s thoughts and feelings were also important.

As mentioned already, deficient pragmatic language comprehension in individuals with Asperger syndrome is partly due to weak central coherence. Zalla et al. (2014) asked participants to decide whether a speaker’s compliments to another person were literal or ironic. Healthy controls correctly recognised a speaker was being ironic if they had a sarcastic/ironic occupation (e.g., comedian; chat show host). In contrast, individuals with Asperger’s typically ignored information about the speaker’s occupation.

Language impairments in autism spectrum disorder are not always specific to pragmatic language. Whyte and Nelson (2015) found children with ASD also had poorer knowledge of syntax and vocabulary. These language deficits mostly explained their impaired pragmatic language comprehension.

In sum, individuals with ASD have impaired pragmatic language comprehension especially when they need to take account of context and someone else’s thoughts and feelings. Their great difficulty in inferring others’ intentions and motivations from what they say and how they behave plays a significant role in restricting their social horizons.


KEY TERM
Metaphor interference effect: The finding that it takes longer to judge whether metaphorical sentences are literally true or false than control sentences.

Figurative language: metaphors
The central problem readers (and listeners) have with metaphors is that they have separate literal and non-literal or metaphorical meanings. For example, consider the unfamiliar metaphor “My mother says envy is rust” (George & Wiley, 2016). The reader (or listener) has to ignore the relatively meaningless literal meaning and identify the metaphorical meaning (i.e., envy is like rust because it is corrosive). Olkoniemi et al. (2016) obtained evidence suggesting that metaphor comprehension can be relatively demanding: readers low in working memory capacity (associated with low intelligence; see Glossary) required more processing time to comprehend metaphors.

Theoretical approaches to metaphor comprehension are reviewed in detail by Holyoak and Stamenković (2018). According to the traditional standard pragmatic model (e.g., Grice, 1975), three sequential stages are involved in processing metaphorical and other figurative statements:

(1) the literal meaning is accessed;
(2) the reader or listener decides whether the literal meaning makes sense in the current context;
(3) if the literal meaning is inadequate, there is a search for a suitable non-literal meaning.

This model is oversimplified. Suppose we ask people to decide whether sentences are literally true or false. According to the model, they should not access the figurative meanings of metaphors on this task and so should respond rapidly. However, that is not the case. Chouinard et al. (2018) found participants took longer to decide whether metaphorical sentences were literally true or false than when judging control sentences (literally false; scrambled metaphor, e.g., “Some cats are ribbons”; see Figure 10.5). This is the metaphor interference effect.

Why is the metaphor interference effect important? As Chouinard et al. (2018, p. 14) concluded, it shows “metaphorical and literal meanings are generated automatically and simultaneously during comprehension”.

Figure 10.5 Response times (ms) for literally false (LF), scrambled metaphor (SM) and metaphor (M) sentences in (a) written and (b) spoken conditions. From Chouinard et al. (2018).


Several theorists (e.g., Barsalou, 2012) have argued that sensory experience is relevant to the processing of metaphors and other forms of language (see Chapter 7). Lacey et al. (2017) tested this viewpoint. Participants were presented with metaphorical (e.g., “He had to foot the bill”) and literal (e.g., “He had to pay the bill”) sentences. All the metaphorical sentences referred to body parts. The key finding was that brain areas responsive to images of body parts were activated only by the metaphorical sentences. These findings indicate that comprehension of metaphors can be perceptually grounded.

Predication model
Kintsch (2000) proposed a predication model of metaphor comprehension involving two components:

(1) The latent semantic analysis component: this represents word meanings based on their relations with other words. Kintsch (2000) speculated that metaphor comprehension is facilitated when both nouns in a metaphor (e.g., “Lawyers are sharks”) have strong semantic relationships to numerous other words because that facilitates the task of establishing connections between them.
(2) The construction-integration component: this uses information from the first component to construct interpretations of statements. Consider the statement “Lawyers are sharks”. It has an argument (lawyers) and a predicate or assertion (sharks). This component selects predicate features relevant to the argument (e.g., vicious; aggressive) and inhibits irrelevant predicate features (e.g., have fins; swim).

Wolff and Gentner (2011) agreed with Kintsch (2000) that metaphors involve a directional process with information from the argument (e.g., lawyers) being projected on to the predicate (e.g., sharks). However, they also argued this directional process is preceded by a non-directional process identifying commonalities in meaning between the argument and predicate.
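The selection-and-inhibition step of the construction-integration component can be sketched in a few lines. This is a toy illustration: the feature lists and relevance scores below are invented and stand in for what, in the model itself, would be similarities computed over latent semantic analysis vectors.

```python
# Toy sketch of the construction-integration step in Kintsch's (2000)
# predication model: predicate features ("shark") relevant to the argument
# ("lawyer") are selected; irrelevant features are inhibited.
# Feature relevance scores are invented; the model derives them from
# similarity to the argument in latent semantic analysis space.

shark_features = {
    "vicious": 0.8, "aggressive": 0.7, "predatory": 0.7,   # metaphor-relevant
    "has fins": 0.1, "swims": 0.1, "lives in sea": 0.05,   # metaphor-irrelevant
}

THRESHOLD = 0.5

def interpret(predicate_features):
    selected = {f: s for f, s in predicate_features.items() if s >= THRESHOLD}
    inhibited = [f for f, s in predicate_features.items() if s < THRESHOLD]
    return selected, inhibited

selected, inhibited = interpret(shark_features)
print("Lawyers are sharks ->", sorted(selected))   # vicious, aggressive, predatory
print("inhibited:", sorted(inhibited))             # has fins, lives in sea, swims
# Swapping argument and predicate changes which features count as relevant,
# which is one way of seeing why metaphors are not reversible.
```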

Findings
The non-reversibility of metaphors is an important phenomenon. For example, “My surgeon is a butcher” means something very different to “My butcher is a surgeon”. Kintsch’s (2000) model explains non-reversibility by assuming only those features of the predicate (second noun) relevant to the argument (first noun) are selected. Thus, changing the argument changes the features selected.

Suppose we try to understand a metaphor such as “My lawyer was a shark”. According to Kintsch’s model, this should be harder to understand when literal properties of sharks (e.g., can swim) irrelevant to its metaphorical meaning have recently been activated. McGlone and Manfredi (2001) found (as predicted by the model) that the above metaphor took longer to understand when preceded by a contextual sentence emphasising the literal meaning of shark (e.g., “Sharks can swim”).


Figure 10.6 Mean reaction times (property-verification RT, ms) to verify metaphor-relevant (REL) and metaphor-irrelevant (IRR) properties following literal (LIT) and metaphorical (MET) sentences. From Solomon & Thompson-Schill (2017). Reprinted with permission of Elsevier.

According to the predication model, understanding metaphors involves inhibiting the semantic properties of the predicate irrelevant to the argument. Solomon and Thompson-Schill (2017) tested this assumption. Participants saw metaphors (e.g., “The prisoners are sardines”) and literal sentences (e.g., “The fish are sardines”). After that, participants decided whether a metaphor-relevant property (e.g., canned) or metaphor-irrelevant property (e.g., salty) was true of the last word in the sentence (e.g., sardines). Participants verified object properties more slowly following a metaphorical sentence (the MET condition in Figure 10.6) compared to a literal one (the LIT condition in Figure 10.6). Thus, participants inhibited metaphor-irrelevant information while reading metaphorical sentences. Carriedo et al. (2016) investigated the effects of individual differences in inhibitory processes on metaphor comprehension. As predicted, individuals having superior inhibitory processes exhibited the best metaphor comprehension.

According to Kintsch (2000), metaphor comprehension should be greater when both nouns in a metaphor are similar in meaning to numerous other words. However, Al-Azary and Buchanan (2017) obtained the opposite finding. They speculated that the activation of numerous semantically similar words might make it harder to find shared meanings between the two nouns.

According to Wolff and Gentner (2011), initial processing of metaphors involves a non-directional process focusing on finding overlapping meanings between the argument and predicate. This process is the same whether participants see forward metaphors (e.g., “Some giraffes are skyscrapers”) or reversed metaphors (e.g., “Some skyscrapers are giraffes”). It follows that rated comprehensibility should be the same for forward and reversed metaphors if participants must respond rapidly. In contrast, comprehensibility ratings should be much higher for forward than reversed metaphors if participants have sufficient time for thorough processing. The predicted findings were obtained (see Figure 10.7).


Figure 10.7 Mean proportion of statements rated comprehensible with a response deadline of 500 or 1,600 ms. There were four statement types: literal; forward metaphors; reversed metaphors; and scrambled metaphors. From Wolff and Gentner (2011).

Evaluation
What are the strengths of research in this area? First, it has been established that metaphor processing depends on many factors, including the listener’s language ability, the familiarity of the metaphor, and the listener’s goal (e.g., understanding a metaphor; judging its appropriateness in context) (Gibbs, 2013). Second, findings indicate that literal and metaphorical meanings are processed simultaneously. Third, inhibitory processes play a key role in diminishing the impact of irrelevant information. Fourth, metaphor comprehension involves a non-directional process followed by a directional one.

What are the limitations of research in this area? First, insufficient attention has been paid to possible processing differences between different types of metaphors. For example, we can distinguish between “A is B” metaphors and correlation metaphors (Gibbs, 2013). “Lawyers are sharks” is an example of the former whereas “My research is off to a great start” is an example of the latter. Kintsch’s (2000) predication model is more applicable to the former type of metaphor than the latter.

Second, most research has not distinguished clearly between novel and familiar metaphors. George and Wiley (2016) found participants took longer to think of interpretations of novel than familiar metaphors (7.6 seconds vs 4.9 seconds, respectively). Of importance, inhibitory processes were used less often with familiar metaphors, perhaps because overall processing demands were much less.


KEY TERMS
Common ground: Shared knowledge and beliefs possessed by a speaker and a listener; its use facilitates communication.
Egocentric heuristic: A strategy used by listeners in which they interpret what they hear based on their own knowledge rather than knowledge shared with the speaker.

Common ground
Grice (1975) argued that speakers and listeners generally conform to the cooperative principle – they work together to ensure mutual understanding. Of direct relevance, speakers and listeners need to take account of the common ground, which “describes a body of information that people allegedly share” (Cowley & Harvey, 2016, p. 56). Listeners expect speakers to refer mostly to information and knowledge that falls in the common ground and often experience difficulties if that is not the case. The extent to which that expectation is correct is discussed in Chapter 11 (see pp. 544–547). Note that a major goal of conversation is to extend the common ground between those involved (Brown-Schmidt & Heller, 2014).

Keysar et al. (2000) accepted that listeners would benefit from using the common ground existing between them and the speaker. However, this can be very effortful for listeners, and so they generally resort to a rapid and non-effortful egocentric heuristic. The egocentric heuristic is “a tendency to consider as potential referents objects that are not in the common ground, but are potential referents from one’s own perspective” (Keysar et al., 2000, p. 32). Use of the egocentric heuristic will often cause listeners to misunderstand the speaker’s message. Accordingly, Keysar argued that listeners sometimes follow use of the egocentric heuristic with an effortful process of trying to adopt the speaker’s perspective.

Several theorists (e.g., Bezuidenhout, 2014) have disagreed that listeners typically make use of the egocentric heuristic. Instead, they argue listeners generally take account of the common ground very early in processing. Heller et al. (2016) claimed it is simplistic to assume listeners adopt a single perspective (egocentric or that of the common ground). Instead, listeners use both perspectives simultaneously.
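Heller et al.’s simultaneous-perspectives idea can be given a rough quantitative reading: candidate referents are scored under both perspectives and the scores are mixed. The sketch below is my own toy formalisation, not a formula published by Heller et al.; the object sets and the mixing weight are invented.

```python
# Toy sketch of weighting an egocentric perspective and the common ground
# simultaneously when resolving a referent. Objects and weight are invented.

def referents(description, objects):
    return {o for o in objects if description(o)}

objects_listener_sees = {"big candle", "small candle", "big funnel"}
objects_in_common_ground = {"big candle", "small candle"}  # funnel hidden from speaker

description = lambda o: o.startswith("big")

egocentric = referents(description, objects_listener_sees)   # {big candle, big funnel}
common = referents(description, objects_in_common_ground)    # {big candle}

W_COMMON = 0.8  # hypothetical weight on the common-ground perspective

scores = {o: W_COMMON * (o in common) + (1 - W_COMMON) * (o in egocentric)
          for o in objects_listener_sees}
print(max(scores, key=scores.get), scores)
# The big candle dominates, but the hidden big funnel retains some weight -
# qualitatively matching the residual "egocentric" fixations reported below.
```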

Findings
In Keysar’s research (e.g., Keysar et al., 2000), listeners often used the egocentric heuristic and ignored the common ground. However, many studies (e.g., Heller et al., 2016, discussed shortly) have found the opposite. How can we resolve these inconsistencies? Dębska and Rączaszek-Leonardi (2018) argued that Keysar’s approach was biased. Suppose listeners were instructed to “Put the small candle . . .” from an array containing three candles (big, medium and small). The candle hidden from the speaker was always the smallest one. Thus, one reason why listeners used the egocentric heuristic by selecting the smallest candle was because it was the one best described by the instructions.

Dębska and Rączaszek-Leonardi (2018) tested the above ideas using a set-up resembling that of Keysar. Listeners showed less evidence of the egocentric heuristic when the object hidden from the speaker was not the one best described by the instructions.

Heller et al. (2016) tested the various theories mentioned earlier. Figure 10.8 illustrates their four conditions viewing the display from the listener’s perspective when the task was to move the big candle (inside the white oval). In the baseline conditions, all four objects were in common ground. In the crucial privileged conditions, one object was visible only to


the listener. In the pairs conditions, two pairs of objects differed in size, and in the triplet conditions, three similar objects differed in size and there was also a completely different object. Eye-tracking assessed listeners’ attentional focus.

Figure 10.8 Sample displays seen from the listener’s perspective; instructions were to “pick up the big candle”; the target is within the white oval. From Heller et al. (2016). Reprinted with permission of Elsevier.

What would we predict for the crucial privileged conditions? We start with the privileged triplet condition. If listeners used the egocentric heuristic, they would mistakenly focus on the candle the speaker could not see (bottom left). If they used common ground information, in contrast, they would focus on the larger of the two candles the speaker could see (bottom right). There was some evidence for the egocentric heuristic because listeners focused to some extent on the candle the speaker could not see (privileged big candle; see Figure 10.9). However, common ground information was also used – listeners consistently fixated the target rapidly and to a much greater extent than any other object.

The privileged triplet condition is biased to elicit the egocentric heuristic in that the object only the listener could see fitted the speaker’s instructions better than the intended target. This is not the case in the privileged pairs condition. In this condition, use of the egocentric heuristic would lead to equal fixations on the big funnel and the big candle (the target) when the speaker has said “Pick up the big . . .”, but has not yet said “candle”. In contrast, use of the common ground would cause fixations to be allocated to the big candle rather than the big funnel before the speaker says “candle”.

In the privileged pairs condition, listeners used the common ground – they attended more to the big candle than the big funnel faster than in the


baseline conditions (200 ms after the adjective “big” was presented versus 350 ms). However, there was some evidence of the egocentric heuristic – listeners had some fixations on the object only they could see (i.e., the small funnel) and also on the irrelevant big funnel.

Figure 10.9 Proportion of fixations on the four objects over time in the baseline and privileged versions of the pairs and triplet conditions; 0 ms = onset of the adjective “big”, and the shaded area covers processing of the adjective. From Heller et al. (2016). Reprinted with permission of Elsevier.

In sum, Heller et al.’s (2016) findings indicate that listeners rapidly use common ground. Of greatest importance, their findings are most consistent with the theory that listeners make simultaneous use of an egocentric perspective and common ground.

Suppose listeners who were given a task resembling the privileged triplet condition in Heller et al.’s (2016) study performed a demanding second task at the same time. According to Keysar et al.’s (2000) theory, this should increase their use of the egocentric heuristic (compared to a control condition with an undemanding second task) because they would lack the processing resources to make use of the common ground. Lin et al. (2010) carried out a study along those lines and obtained the predicted findings. Cane et al. (2018) obtained similar findings using the demanding task of remembering a sequence of five digits.

Luk et al. (2012) studied cultural differences in use of the egocentric heuristic. Chinese–English bilinguals were primed to focus on the Chinese or American culture. Only 5% of those focusing on the Chinese culture used the egocentric heuristic on a listening task compared to 45% of those focusing on the American culture. These findings are consistent with the common


assumption that Western cultures are more individualistic and self-focused than Eastern cultures (which are more collectivistic and group-centred).

Most research has focused only on whether specific pieces of information are in common ground. This ignores the potential richness of common ground representations, which can include cultural and community information shared by speaker and listener. Brown-Schmidt (2012) used a task where two individuals worked together to move various game pieces. Their interactive discussions led to the formation and maintenance of rich common ground representations. Brown-Schmidt also found that the assumption that a given piece of information is or is not in common ground between a speaker and listener is oversimplified. In fact, a given piece of information can be in common ground to a greater or lesser extent.

Nearly all research in this area is limited because speakers and listeners are strangers to each other. We might assume friends share more common ground than strangers and so rely less on the egocentric heuristic. However, Savitsky et al. (2011) obtained the opposite finding because friends overestimated how well they communicated with each other.

Evaluation
The evidence suggests listeners make simultaneous use of both their egocentric perspective and common ground. However, several factors influence the relative importance of these two perspectives. First, the egocentric perspective is used more often when the object hidden from the speaker is the one best described by the instructions. Second, the egocentric perspective is more frequent when listeners have limited processing resources available. Third, it is used more often in Western cultures than in Eastern ones. Fourth, the egocentric perspective may be used more often by listeners when the speaker is a friend of theirs rather than a stranger.

What are the limitations of research in this area? First, most research has focused on very specific aspects of common ground. Second, findings from listener–speaker pairs who are strangers may not generalise to pairs who are friends. Third, many studies lack ecological validity (see Glossary): it is rare in everyday life for an object between two individuals to be visible to the listener but not to the speaker. Fourth, in most research, the participants act only as listeners. In contrast, real-life conversations involve rapid switching between listening and speaking. In such situations, it is often useful for listeners to focus on information available only to them (and thus not in the common ground) so they can communicate it to the other person (Mozuraitis et al., 2015).

INDIVIDUAL DIFFERENCES: WORKING MEMORY CAPACITY
There are considerable individual differences in almost all complex cognitive activities. Accordingly, theories based on the assumption (explicit or implicit) that everyone comprehends text similarly are oversimplified. What are the most important individual differences influencing reading performance? Just and Carpenter (1992) emphasised individual differences in


IN THE REAL WORLD: UNDERSTANDING NON-NATIVE SPEAKERS

As Ryskin et al. (2018, p. 141) pointed out, “Everyday language use occurs amid myriad sources of noise”. For example, non-native speakers may make errors because of deficient knowledge of the language, including numerous mispronunciations when engaged in conversation (Levis & Barriuso, 2011). In such circumstances, listeners have to infer the intended meaning from what is actually said.

How do we cope when trying to understand what a non-native speaker is saying? A crucial part of the answer was provided by Lev-Ari (2014) in a study where a native speaker of Mandarin or of English gave instructions in English to native English speakers. Listeners to the non-native speaker increased their reliance on top-down processes (e.g., predicting what the speaker would say next) and reduced their reliance on what the speaker said. This strategy is entirely appropriate given the lower language competence of the non-native speaker.

Gibson et al. (2017) also found listeners relied less on the actual words spoken by non-native speakers and focused more on the intended meaning. Native and non-native speakers produced many utterances, some of which were implausible (e.g., “The tax law benefited from the businessman”). Listeners were more likely to interpret such implausible utterances as plausible (e.g., “The businessman benefited from the tax law”) when spoken by a non-native speaker. This makes sense given the assumption that non-native speakers are more likely to put words in the wrong order.

Suppose a listener is exposed to the utterances of a non-native speaker whose errors consist mainly of deletions (e.g., “We had nice time at the beach”) or insertions (e.g., “The earthquake shattered from the house”). Listeners might simply assume in both cases that the speaker makes many errors across the board. Alternatively, they might assume the speaker only has a high probability of making specific speech errors (e.g., deletions or insertions). Ryskin et al. (2018) found that listeners’ inferences about the speaker’s intended meaning were influenced by the specific errors they had heard previously. Thus, listeners are sensitive to fine-grained information about the types of errors made by speakers.
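This kind of inference is often framed as a noisy-channel computation: the listener weighs the prior plausibility of candidate intended meanings against the probability that the speaker would produce the perceived utterance given each meaning. The Bayes-rule toy below is only an illustration of that framing, not Gibson et al.’s actual model; all probabilities are invented.

```python
# Noisy-channel sketch of understanding a non-native speaker: infer the
# intended sentence from the perceived one via Bayes' rule.
# All probabilities are invented for illustration.

perceived = "The tax law benefited from the businessman"

candidates = {
    # intended sentence: (prior plausibility, P(perceived | intended))
    "The tax law benefited from the businessman": (0.01, 0.98),  # verbatim, implausible
    "The businessman benefited from the tax law": (0.99, 0.10),  # role swap by speaker
}

posterior = {s: prior * likelihood for s, (prior, likelihood) in candidates.items()}
total = sum(posterior.values())
for sentence, p in posterior.items():
    print(f"{p / total:.2f}  {sentence}")
# With a native speaker, P(role-swap error) would be far lower, so the
# verbatim reading would win; with a non-native speaker the plausible
# reading dominates, as Gibson et al. (2017) observed.
```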

working memory capacity (the ability to process and store information at the same time) (see Glossary and Chapter 6).

Engle and Kane (2004) proposed an influential theory according to which individuals with high working memory capacity have greater executive attention or attentional control than low-capacity individuals. This manifests itself in superior monitoring of task goals and a greater ability to resolve response competition. It follows that high-capacity individuals should engage in less mind-wandering (task-unrelated thoughts) than low-capacity ones while engaged in reading comprehension. As predicted, Unsworth and McMillan (2013) found high-capacity individuals had superior reading comprehension partly because of their reduced mind-wandering.

There are two key theoretical issues relating to the effects of working memory capacity on reading comprehension. First, we can focus on relatively specific individual differences in working memory capacity (e.g., verbal working memory involving simultaneous processing and storage of verbal information). An example is reading span (see Glossary). Alternatively, we can focus on general individual differences in working memory capacity (working memory involving simultaneous processing and storage of different kinds of information). An example is operation span


Are specific or general aspects of working memory capacity more important in predicting reading comprehension?

Second, there are two possible types of explanation for positive correlations between working memory capacity and reading comprehension. Just and Carpenter (1992) assumed there was a direct relationship: individuals with low working memory capacity have more limited processing resources than high-capacity individuals and this directly impairs their reading comprehension. Alternatively, there may be an indirect relationship: the effects of working memory capacity on reading comprehension may occur because it correlates with other reading-relevant factors (e.g., vocabulary; reading experience). Why does it matter whether the relationship is direct or indirect? In essence, if the relationship between working memory capacity and reading is indirect, it implies factors other than working memory capacity itself are primarily responsible for its effects on reading performance.

Findings

Peng et al. (2018) reported a meta-analytic review based on 197 studies. Overall, they reported a correlation of +.29 between working memory capacity and reading. Measures of general working memory capacity correlated +.26 with reading comprehension. Among specific measures, the correlation between verbal working memory capacity and reading was somewhat higher (+.32). Thus, general and specific individual differences in working memory capacity are both important predictors of reading performance.

Peng et al. (2018) also addressed the issue of whether the effects of working memory capacity on reading comprehension are direct or indirect. More specifically, two factors they considered were vocabulary size and decoding (“the ability to translate written language into speech with accuracy and/or fluency”, p. 52). They obtained evidence for indirect effects: working memory capacity influenced reading comprehension via its effects on vocabulary and decoding.

Freed et al. (2017) also considered whether the effects of working memory capacity on reading comprehension are direct or indirect. They too found the relationship was indirect: it depended on two factors, language experience (e.g., reading habits) and general reasoning ability or fluid intelligence (see Glossary).
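The logic of the direct vs indirect distinction can be made concrete with a toy mediation analysis. The following sketch is purely illustrative (it is not Peng et al.’s or Freed et al.’s analysis): the data are synthetic and the variable names hypothetical. It shows how the total effect of working memory capacity (WMC) on reading decomposes into a direct effect plus an indirect effect routed through a mediator such as vocabulary.

# Illustrative mediation sketch with synthetic data (hypothetical variables).
import numpy as np

rng = np.random.default_rng(0)
n = 500
wmc = rng.normal(size=n)                                  # working memory capacity
vocab = 0.5 * wmc + rng.normal(size=n)                    # mediator: vocabulary
reading = 0.6 * vocab + 0.1 * wmc + rng.normal(size=n)    # reading comprehension

def slopes(y, *predictors):
    """Least-squares coefficients of y on the predictors (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a = slopes(vocab, wmc)[0]                 # WMC -> vocabulary
b, c_prime = slopes(reading, vocab, wmc)  # vocab -> reading; direct WMC effect
total = slopes(reading, wmc)[0]           # total WMC -> reading effect

print(f"total effect:    {total:.2f}")
print(f"direct effect:   {c_prime:.2f}")  # small: most of the effect is indirect
print(f"indirect effect: {a * b:.2f}")    # routed through vocabulary

On this toy set-up, the direct effect is small and most of the total effect is carried by the indirect (vocabulary) path, which is the pattern Peng et al. (2018) and Freed et al. (2017) reported for real data.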

Evaluation

Theoretical approaches such as that of Just and Carpenter (1992) have the advantage over most language theories of emphasising the importance of individual differences. In contrast, as Kidd et al. (2018, p. 154) pointed out, most theorists regard large individual differences in language comprehension as “an inconvenient truth” which they ignore or de-emphasise.

Individual differences in working memory capacity correlate moderately highly with measures of reading comprehension.


KEY TERMS

Discourse: language that is a minimum of several sentences in length; it includes written text and connected speech.

Logical inferences: inferences that follow necessarily from the meanings of words (e.g., a bachelor is a man who is unmarried).

Bridging inferences: inferences or conclusions drawn to increase coherence between the current and preceding parts of a text; also known as backward inferences.

This is so whether working memory capacity is assessed by relatively specific measures (e.g., verbal working memory) or more general ones. Other research suggesting the importance of working memory capacity is discussed in the next section (see p. 494).

What are the limitations of research in this area? First:

“We have a huge literature . . . that has focused on the role of WMC [working memory capacity] in language processing, based on the assumption that WMC has a unique and direct effect on comprehension. However, only one major study has found such an effect.”
(Freed et al., 2017, p. 137)

Second, and related to the first point, much more research is required to clarify the interrelationships between the numerous individual difference variables correlating with reading comprehension. For example, Van Dyke et al. (2014) found IQ correlated +.61 with working memory capacity, and that much of the relationship between working memory capacity and reading comprehension depended on IQ.

DISCOURSE PROCESSING: INFERENCES

So far we have focused primarily on comprehension of single sentences. In real life, however, we mostly encounter connected discourse (speech or written text at least several sentences long). Single sentences and discourse differ in various ways. First, single sentences are more likely to be ambiguous because they lack the context provided by previous sentences within discourse. Second, discourse processing typically requires inference drawing for full comprehension.

We draw numerous inferences when exposed to discourse (even though we are generally unaware of doing so). Why is so much inference drawing required? Readers and listeners would be bored to tears if writers and speakers spelled everything out in incredible detail. Test your skill at inference drawing with this example taken from Rumelhart and Ortony (1977):

Mary heard the ice-cream van coming. She remembered the pocket money. She rushed into the house.


You probably inferred that Mary wanted to buy some ice cream, that buying ice cream costs money, that Mary had some pocket money in the house, and that Mary had only limited time to get hold of some money before the ice-cream van appeared. None of these inferences is explicitly stated.

There are several types of inference. First, logical inferences depend only on the meanings of words. For example, we infer that a widow is female. Second, bridging inferences establish coherence between the current part of the text and the preceding text and so are also known as backward inferences.


Third, elaborative inferences embellish or add details to the text by using world knowledge to expand on textual information. Predictive inferences (or forward inferences) are an important form of elaborative inference. Predictive inferences “allow readers to generate expectations about what will happen next in a text” (Virtue et al., 2017, p. 456). It is hard to work out how we typically access relevant information from our huge store of world knowledge when forming elaborative inferences.

The differences between bridging and elaborative inferences are not always clear-cut. Consider the following scenario (Kuperberg et al., 2011):

Jill had very fair skin. She forgot to put sunscreen on. She had sunburn on Monday.

When readers read the second sentence, they could draw the elaborative inference that Jill had sunburn. When they read the third sentence, they could draw the bridging or backward inference that the sunburn resulted from forgetting to put on sunscreen.

KEY TERMS

Elaborative inferences: inferences based on our knowledge of the world that involve adding details to a text that is being read (or speech being listened to).

Predictive inferences: expectations concerning what will happen next (e.g., a new event) when reading text or listening to someone.

Mental model: an internal representation of some possible situation or event in the world having the same structure as that situation or event.

Theoretical perspectives

Readers (and listeners) typically draw logical and bridging inferences, which are generally required for full comprehension. However, the number and nature of elaborative inferences (including predictive inferences) drawn remain controversial.

Bransford et al. (1972) in their constructionist approach argued readers typically construct a fairly complete “mental model” of the situation described in a text. They assumed numerous elaborative inferences are drawn during reading even when not essential for comprehension. Several theories of discourse comprehension (including the construction-integration model, the event-indexing model, and event-segmentation theory) involve very similar assumptions (see the later section entitled “Discourse comprehension: theoretical approaches”, pp. 498–510).

McKoon and Ratcliff’s (1992) minimalist hypothesis (developed by Gerrig and O’Brien, 2005) assumes far fewer inferences are drawn than does Bransford et al.’s (1972) constructionist approach. This hypothesis is based on the following assumptions:

● Inferences are automatic or strategic (goal-directed).
● Some automatic inferences establish local coherence (two or three sentences making sense on their own or in combination with easily available general knowledge). These inferences involve parts of the text in working memory at the same time.
● Other automatic inferences rely on information readily available because it is explicitly stated in the text.
● Strategic inferences are formed in pursuit of the reader’s goals; they sometimes serve to produce local coherence.
● Most elaborative inferences are made at recall rather than during reading.


In sum, memory-based theories (e.g., the minimalist hypothesis) “rely on a passive and dumb [memory] activation mechanism” (Cook & O’Brien, 2017, p. 2). In contrast, explanation-based theories (e.g., the constructionist approach) “assume more interaction between basic memory mechanisms and reader goals and strategies” (Cook & O’Brien, 2017, p. 2).

Van den Broek and Helder (2017) provided a theoretical framework combining elements of previous theories (see Figure 10.10). First, there are passive “automatic” processes outside the reader’s conscious control which always occur. These processes resemble those assumed within the minimalist hypothesis. Second, there are effortful reader-initiated processes. The extent of such processes depends on the reader’s standards of coherence: “the criteria that a reader has for what constitutes adequate comprehension and coherence in a particular reading situation” (p. 364). For example, if a reader’s goal includes a search after meaning, they will use more reader-initiated processes and draw more inferences than if that goal is missing (Graesser et al., 1994). These processes correspond to those assumed within the constructionist approach.

The central prediction from the above theoretical framework is as follows:

When the passive processes alone yield adequate comprehension by attaining the reader’s standards of coherence, then no further processing is necessary. However, if passive processes alone lead to comprehension falling short of satisfying the reader’s standards, then reader-initiated, coherence-building processes are likely.
(van den Broek & Helder, 2017, p. 364)

Research relevant to these theoretical approaches is discussed later (pp. 494–497).

Figure 10.10 A theoretical framework for reading comprehension involving interacting passive and reader-initiated processes: passive processes always feed a mental representation of the text; if the reader’s standards of coherence are met, reading simply continues; if not, reader-initiated processes are engaged. From van den Broek and Helder (2017).
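In computational terms, the framework’s central prediction is a simple piece of control flow: passive processes always run, and reader-initiated processes are engaged only when the resulting coherence falls short of the reader’s standards. Here is a minimal sketch under invented assumptions (coherence is collapsed into a single toy number, and the two process functions are stand-ins, not an implemented psychological model):

def passive_processes(segment: str) -> float:
    # Stand-in: pretend longer segments happen to yield higher initial coherence.
    return min(1.0, len(segment.split()) / 10)

def reader_initiated_processes(coherence: float) -> float:
    # Effortful coherence-building (e.g., extra inferences) boosts coherence.
    return min(1.0, coherence + 0.4)

def comprehend(segment: str, standards_of_coherence: float) -> float:
    coherence = passive_processes(segment)        # always occurs, automatically
    if coherence >= standards_of_coherence:
        return coherence                          # adequate: continue reading
    return reader_initiated_processes(coherence)  # engaged only when standards unmet

print(comprehend("Ken drove to London yesterday.", 0.2))  # passive processes suffice
print(comprehend("Ken drove to London yesterday.", 0.8))  # extra processing triggered

In this sketch, a reader with a strong search-after-meaning goal corresponds to a high standards_of_coherence value.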


Bridging inferences: anaphors

An anaphor is a word (e.g., pronoun) referring back to a person or object previously mentioned in a text or speech. Anaphor resolution is a very common form of bridging inference. Here is an example:

Fred sold John his lawn mower, and then he sold him his garden hose.

KEY TERM

Anaphor: a word or phrase that refers back to a previous word or phrase (e.g., a pronoun may refer back to a given individual mentioned earlier).

It requires a bridging inference to realise the referent for “he” is Fred rather than John. How do readers/listeners draw appropriate anaphoric inferences? Gender information can be very helpful. Compare the ease of anaphor resolution in the following sentence with that in the one above:

Juliet sold John her lawn mower, and then she sold him her garden hose.

Anaphor resolution is also facilitated by having pronouns in the expected order. Harley (2013) provided the following example:

(1) Vlad sold Dirk his broomstick because he hated it.
(2) Vlad sold Dirk his broomstick because he needed it.

The first sentence is easy to understand because “he” refers to the first-named man (i.e., Vlad). The second sentence is harder to understand because “he” refers to the second-named man (i.e., Dirk).

Another factor influencing anaphor resolution is working memory capacity (see Glossary). Nieuwland and van Berkum (2006b) presented sentences containing pronouns whose referents were ambiguous. Readers high in working memory capacity were more likely to take account of both possible referents. When pronouns have only a single possible referent, it has often been assumed readers “automatically” identify the correct one. Love and McKoon (2011) obtained support for this assumption only when readers were highly engaged with the text.

Most findings are consistent with Kaiser et al.’s (2009) assumption that anaphor resolution involves multiple constraints (e.g., gender; meaning) operating interactively in parallel. Itzhak and Baum (2015) studied one such constraint (i.e., verb bias) as in the following example:

(1) John envied Bill because he was rich.
(2) John envied Bill because he was poor.

Sentence (1) is easier to comprehend than sentence (2) because we expect the pronoun “he” to refer to Bill. Itzhak and Baum (2015) argued anaphor resolution would be easier with sentence (2) if the referent of “he” (i.e., John) were emphasised when the sentence was spoken. That is what they found, thus showing an interaction between verb bias and noun emphasis.
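On the multiple-constraints view, cues such as gender, verb bias and prosodic emphasis act as weighted evidence combined in parallel across candidate referents. The sketch below is illustrative only: the cues, weights and scores are invented, and real constraint-based accounts are interactive and probabilistic rather than a single weighted sum. It shows how emphasis on “John” could outweigh the verb bias towards “Bill”, in the spirit of Itzhak and Baum’s (2015) finding.

# Toy parallel-constraint resolution for "John envied Bill because HE was poor."
CANDIDATES = {
    "John": {"gender_match": 1.0, "verb_bias": 0.2, "emphasis": 0.9},
    "Bill": {"gender_match": 1.0, "verb_bias": 0.8, "emphasis": 0.1},
}
WEIGHTS = {"gender_match": 2.0, "verb_bias": 1.0, "emphasis": 1.0}  # invented

def resolve(candidates, weights):
    scores = {name: sum(weights[cue] * value for cue, value in cues.items())
              for name, cues in candidates.items()}
    return max(scores, key=scores.get), scores

referent, scores = resolve(CANDIDATES, WEIGHTS)
print(scores)    # {'John': 3.1, 'Bill': 2.9}
print(referent)  # 'John': spoken emphasis outweighs the verb bias towards Bill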


Bridging inferences: more complex inferences

Causal inferences are a common form of bridging inference. They require readers to work out the causal relationship between the current sentence and a previous one. Consider the following two sentences:

Ken drove to London yesterday. The car kept overheating.

You had no trouble (hopefully!) in linking these sentences based on the assumption that Ken drove to London in a car that kept overheating. The above bridging inference may occur because the verb “drove” in the first sentence activated concepts relating to driving (especially car). Alternatively, readers may form a representation of the situation described in the first sentence and then relate information in the second sentence to it. The crucial difference is that the sentential context is only relevant with the second explanation.

Garrod and Terras (2000) identified two stages in forming bridging inferences. The first stage is bonding, a low-level process involving the automatic activation of words from the preceding sentence (explanation one). The second stage is resolution, which ensures the overall interpretation is consistent with the contextual information (explanation two). Resolution is influenced by context but bonding is not.

According to the minimalist hypothesis and van den Broek and Helder’s (2017) theoretical framework, the reader’s goals influence which inferences are drawn. Calvo et al. (2006) gave some participants the goal of reading sentences for comprehension whereas others were explicitly told to anticipate what might happen next. Participants in the latter condition drew more predictive inferences. Even when participants in the former condition drew predictive inferences, they did so more slowly than those in the anticipation condition.

Earlier (see pp. 488–490), we discussed how individual differences in working memory capacity influence language comprehension. Such individual differences also influence inference drawing. Barreyro et al. (2012) found readers with high working memory capacity drew more elaborative causal inferences than did low-capacity readers. Of relevance, there is a moderately high correlation between working memory capacity and IQ (intelligence). However, Christopher et al. (2012) found working memory capacity still predicted comprehension performance after controlling for intelligence.

Murray and Burke (2003) focused on predictive inferences (e.g., inferring “break” when presented with a sentence such as “The angry husband threw the fragile vase against the wall”). Only participants with high reading skill drew such inferences “automatically”. In general, individuals with poor reading skills draw fewer inferences than those with good reading skills (McKoon & Ratcliff, 2017).

In sum, research on individual differences in inference drawing and comprehension ability is important. Any adequate theory of language comprehension (or inference drawing) must provide an explanation for such individual differences.


IN THE REAL WORLD: ANXIETY AND INFERENCE DRAWING

So far we have focused on factors determining whether readers draw inferences. It is also important (but relatively neglected) to consider which inferences are drawn when we read or listen to discourse. For example, suppose we present individuals high and low in trait anxiety (see Glossary) with ambiguous sentences such as the following:

With hardly any visibility, the plane quickly approached the dangerous mountain and, at the same time, the passengers began to shout in panic. The plane . . .

We might expect that high-anxious individuals would be more likely than low-anxious ones to be biased towards the negative or threatening predictive inference (i.e., the plane crashed). Calvo and Castillo (2001) tested this expectation. After reading the sentence above, participants were presented with the word “crashed” or “swerved” and named it rapidly.

What did Calvo and Castillo (2001) find? When the time interval between “The plane . . .” and the word was 1,500 ms, high-anxious individuals named the word “crashed” faster than low-anxious ones and named the word “swerved” more slowly. Thus, high-anxious individuals were more likely to draw the threatening inference and less likely to draw the non-threatening one. This group difference disappeared when the time interval was less than 1,500 ms, suggesting the bias in predictive inferencing shown by high-anxious individuals did not depend on rapid “automatic” processes.

Moser et al. (2012) obtained similar findings among individuals meeting criteria for social anxiety disorder (involving extreme fear and avoidance of social situations). They heard ambiguous sentence stems resolved by a negative or positive final word. Event-related potentials indicated that socially anxious listeners expected (or predicted) the negative completion more than non-anxious listeners.

Do anxious individuals draw negative inferences from all ambiguous situations? Walsh et al. (2015; see Chapter 15) addressed that issue using four kinds of ambiguous text scenarios: (1) social (potential threat of social embarrassment); (2) intellectual (potential threat of appearing unintelligent); (3) health (potential threat of severe illness); and (4) physical (potential threat of physical danger). High-anxious individuals drew more negative inferences than low-anxious ones only with the social and intellectual scenarios. This pattern of inference drawing indicates that high-anxious individuals are especially sensitive to situations involving social evaluation.

Findings: underlying processes

The minimalist hypothesis and van den Broek and Helder’s (2017) theoretical framework are consistent with the assumption that predictive inferences can be drawn automatically. This issue was addressed by Gras et al. (2012) using short texts such as the following:

Charlotte was having her breakfast on the terrace when the bees started flying about the pot of jam. She made a movement to brush them away but one of them succeeded in landing on her arm.

The predictive inference is that Charlotte felt a sting. Gras et al. (2012) followed the text with the word “sting” presented in blue, red or green 350, 750 or 1,000 milliseconds after the text, with instructions to name the colour.


Figure 10.11 Reaction times to name colours when the word presented in colour was predictable from the preceding text compared to a control condition (scores below 0 ms indicate a slowing effect of predictive inferences). Performance in the explicit condition is not relevant here. From Gras et al. (2012). © American Psychological Association.

The speed of colour naming was slowed only at 1,000 milliseconds (see Figure 10.11). This finding suggests it took approximately 1 second for the predictive inference to be drawn. The fact that participants could not prevent it from interfering with colour naming suggests it was drawn automatically.

Kuperberg et al. (2011) also investigated the “automaticity” of inference drawing using short scenarios such as one discussed earlier:

Jill had very fair skin. She forgot to put sunscreen on. She had sunburn on Monday.

Kuperberg et al. recorded event-related potentials to assess readers’ processing of these scenarios. Of particular interest was the N400 component, which is larger when the meaning of the word currently being processed does not match its context.

What did Kuperberg et al. (2011) find? Consider the above scenario where the word “sunburn” in the third sentence is highly causally related to its context. There was only a small N400 to this word. Thus, processing of the causal inference explaining Jill’s sunburn in terms of her fair skin and failure to use sunscreen started very rapidly and probably fairly “automatically”.

Kuperberg et al. (2011) also focused on complex causal inferences using short scenarios such as the following:

Jill had very fair skin. She usually remembered to wear sunscreen. She had sunburn on Monday.

There was a small N400 to the word “sunburn”, but it was not as small as in the previous case. Thus, some inference processing is initiated very rapidly (and probably “automatically”) even with complex causal inferences.


In spite of the above findings, there are circumstances where few inferences are drawn via “automatic” or passive processes. For example, Collins and Daniel (2018) studied trained speed readers whose reading rate was 35% faster than that of untrained readers. These speed readers did not appear to draw bridging or predictive inferences even when such inferences were strongly implied by the text.

Finally, listeners’ stereotypical inferences about the speaker can influence sentence processing. For example, there was a large N400 when the sentence “I have a large tattoo on my back” was spoken in an upper-class accent (Van den Brink et al., 2012; discussed earlier, p. 477). Thus, listeners rapidly draw inferences about the kinds of statement a given speaker is likely (or unlikely) to make.

Overall evaluation

There is an increasing consensus on several issues:

(1) Readers (and listeners) typically form bridging inferences (including causal inferences) to make coherent sense of text or speech.
(2) Readers and listeners rapidly use contextual information and their world knowledge to draw inferences.
(3) Many inferences (including causal and predictive ones) are often drawn relatively “automatically”. However, the extent to which this happens depends on various factors (e.g., working memory capacity; engagement with the text; reading speed).
(4) Readers’ goals influence whether predictive inferences are drawn.
(5) Readers with superior reading skills (including those having high working memory capacity) draw more inferences than other readers.
(6) The major theories contribute to our understanding of inference drawing:

The minimalist hypothesis is probably correct when the reader is very quickly reading the text, when the text lacks global coherence, and when the reader has very little background knowledge. The constructionist theory is on the mark when the reader is attempting to comprehend the text for enjoyment or mastery at a more leisurely pace.
(Graesser et al., 1997, p. 183)

Thus, inference drawing is very flexible. This flexibility is captured by van den Broek and Helder’s (2017) theoretical framework allowing for both passive and reader-initiated processes.

What are the limitations of theory and research in this area? First, it is often hard to predict which inferences will be drawn because inference drawing depends on several interacting factors (e.g., readers’ goals and reading ability). Second, it is also hard to predict which inferences will be drawn because of theoretical imprecision. For example, it is assumed within the minimalist hypothesis that automatic inferences are drawn if the necessary information is “readily available”. How do we establish the precise degree of availability of some piece of information?


Third, the notion that inference drawing depends on two processes (passive and reader-initiated; van den Broek & Helder, 2017) is oversimplified. Fourth, we need more research on individual differences in which inferences are drawn with ambiguous material.

DISCOURSE COMPREHENSION: THEORETICAL APPROACHES

If someone asks us to describe a story or book we have read recently, we discuss the main events and themes, omitting the minor details. Thus, our description is highly selective, based on the meaning extracted from the story while reading it and on selective processes operating at retrieval. Imagine our questioner’s reaction if our description was not selective but simply involved recalling random sentences from the story!

Gomulicki (1956) demonstrated the selectivity of story comprehension and memory. Some participants wrote a précis (summary) of a story visible in front of them whereas others recalled the story from memory. Still other participants, who were provided only with each précis and recall, had great difficulty in telling them apart. Thus, story memory resembles a précis in focusing primarily on important information.

Several factors determine the importance of story information. For example, statements causally connected to several other statements are judged as more important than those lacking such causal connections (Trabasso & Sperry, 1985). Other factors are discussed later.

Nearly all comprehension research has presented readers with paper-based texts. In the real world, however, readers increasingly use e-readers or computers (e.g., when accessing information from the internet). Margolin et al. (2013) found comprehension was comparable for paper, e-reader and computer presentation for both narrative texts (telling a story) and expository texts (conveying facts and information). However, comprehension and learning are often reduced when texts are presented on a computer screen rather than on paper (Sidi et al., 2016). Why is this? First, readers engage in more multi-tasking and discontinuous reading on screen. Second, screen readers tend to be more confident than paper readers about their levels of comprehension and learning. Third, in spite of this overconfidence, screen readers perform comparably to paper readers in conditions emphasising the importance of deep processing (e.g., high perceived task importance) (Sidi et al., 2017).

Below we discuss several theories or models of discourse comprehension, starting with Bartlett’s (1932) influential schema-based approach. Numerous theories have been proposed over the past 35 years or so (see McNamara and Magliano, 2009, for a review) and a few of the most prominent ones will be considered.

Schema theory: Bartlett

Our processing of texts involves relating textual information to relevant structured knowledge stored in long-term memory. What we process in texts, how we process textual information, and what we remember about texts we have read all depend heavily on such previously stored information.


Much stored knowledge consists of schemas (well-integrated packets of knowledge about the world, events, people and actions; see Chapter 7). Schemas include scripts and frames. Scripts (see Glossary) deal with knowledge about events and consequences of events whereas frames are knowledge structures referring to some aspect of the world (e.g., buildings).

Ghosh and Gilboa (2014) argued schemas possess four necessary and sufficient features:

(1) associative structure: schemas consist of interconnected units;
(2) basis in multiple episodes: schemas consist of integrated information based on several similar events;
(3) lack of unit detail: this follows from the variability of events from which any given schema is formed;
(4) adaptability: schemas change and adapt as they are updated in the light of new information.

KEY TERM

Rationalisation: in Bartlett’s theory, errors in story recall that conform to the rememberer’s cultural expectations.


Several definitions of “schema” have been proposed. For example, Bartlett (1932) attached great importance to adaptability. Of interest, this is probably the feature least often found in recent definitions.

Why are schemas important? First, they contain relevant information needed to understand what we hear and read. Second, schemas allow us to form expectations (e.g., of the sequence of events in a restaurant) that are generally confirmed, which makes the world relatively predictable. Third, schemas contain higher-level information (based on commonalities across events) making it easier to disregard trivial details during comprehension.

Bartlett (1932) claimed persuasively that schemas strongly influence how we remember texts. More specifically, comprehension of (and memory for) texts depends on top-down processes triggered by schemas. He tested this hypothesis by presenting people with stories from a different culture to produce a conflict between the story itself and their prior knowledge. He found that what was remembered might be inaccurate because it included schematic knowledge not present in the story. Bartlett identified three main error types:

(1) rationalisation (making recall more consistent with the reader’s cultural expectations);
(2) levelling (omitting unfamiliar details);
(3) sharpening (elaborating on certain details).

The above errors might result from processes occurring during comprehension or retrieval. Bartlett (1932) favoured the latter explanation but others (e.g., Bransford & Johnson, 1972) emphasise comprehension processes. Note that Henderson (1903) anticipated many of Bartlett’s theoretical ideas (Davis, 2018).

Findings

Bartlett (1932) used stories (e.g., “The War of the Ghosts”) from the North American Indian culture. Unfortunately, his studies were poorly controlled (Roediger, 2010). For example, he did not provide specific instructions:


“I thought it best . . . to try to influence the subjects’ procedure as little as possible” (Bartlett, 1932, p. 78). As a result, many distortions observed by Bartlett were due to conscious guessing rather than deficient memory. Gauld and Stephenson (1967) found instructions stressing the need for accurate recall (designed to reduce deliberate guessing) eliminated almost half the errors obtained using Bartlett’s original instructions.

Bartlett (1932) claimed discourse or text information shows more rapid forgetting than schematic knowledge. Thus, the tendency for schematic knowledge to produce memory distortions should increase over time. Sulin and Dooling (1974) obtained support for this prediction. Participants received a story about a ruthless dictator identified as Gerald Martin or Adolf Hitler. It was assumed those told it concerned Hitler would activate their schematic knowledge of him. There was a recognition memory test at a short or long retention interval including the sentence “He hated the Jews particularly and so persecuted them”. As predicted, participants told the story was about Hitler were much more likely to falsely recognise this Hitler-relevant sentence at the long rather than the short retention interval.

Strong evidence that memory distortions increase over time was reported by Bergman and Roediger (1999). Their participants read “The War of the Ghosts” and then recalled it three times. The proportion of recall involving major distortions increased from 27% at the shortest retention interval (15 minutes) to 59% at the longest (6 months).

Bartlett (1932) argued that schemas influence retrieval as well as comprehension. Supporting evidence was reported by Anderson and Pichert (1978). Participants read a story from the perspective of a burglar or a potential homebuyer. After story recall, they recalled the story again from the alternative perspective (or schema). This time, participants recalled more information important only to the second perspective than on the first recall. Anderson et al. (1983) found that manipulating the reader’s perspective while reading selectively enhanced encoding and comprehension of schema-relevant story information.

Bransford and Johnson (1972) also found schemas influence story comprehension. Here is part of the story they used:

The procedure is quite simple. First, you arrange items into different groups. Of course one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities that is the next step; otherwise, you are pretty well set. It is important not to overdo things. That is, it is better to do too few things at once than too many.

What on earth was that all about? Listeners hearing the passage in the absence of a title rated it as incomprehensible and recalled only 2.8 idea units on average. However, listeners supplied beforehand with the title “Washing clothes” found it easy to understand and recalled 5.8 idea units on average. Having relevant schema information (i.e., the title) helped passage comprehension rather than simply acting as a retrieval cue: participants receiving the title after hearing the passage but before recall recalled only 2.6 idea units on average.


Research in cognitive neuroscience has established that the ventromedial prefrontal cortex plays a key role in schema processing (Gilboa & Marlatte, 2017; see Chapter 7). Van Kesteren et  al. (2010) studied the involvement of the ventromedial prefrontal cortex during comprehension. Viewers watched a film with the first half providing a schema for the second half. Activation in the ventromedial prefrontal cortex during viewing of the second half depended on whether the first half was presented in the typical sequential order (providing a strong schema) or out of order (providing a weak schema). Activation was greater when there was a weak schema because it was harder to integrate new information with schematic information.

KEY TERM

Proposition: a statement making an assertion or denial which can be true or false.

Evaluation

Schematic knowledge assists text comprehension and memory. In addition, many distortions in memory for stories and other texts reflect the influence of schematic information. More generally, schema theory emphasises the role of top-down processes in discourse comprehension and memory (Wagoner, 2013).

What are the limitations of schema theories? First, “schema” has many definitions (Ghosh & Gilboa, 2014) and it is hard to ascertain the precise information contained within any given schema.

Second, schema-based explanations require independent evidence of the existence of relevant schemas, but this is usually lacking. As Harley (2013) pointed out, “The primary accusation against schema and script-based approaches is that they are nothing more than re-descriptions of the data.”

Third, it is unclear when a given schema will be activated. Theoretically, schemas facilitate inference drawing during text comprehension, but many inferences are not drawn. In contrast, the phrase “the five-hour journey from London to New York” activates the “plane flight schema” even though no words in the phrase have strong associations with flying by plane (Harley, 2013).

Fourth, schema theories exaggerate how error-prone we are in everyday life. For example, Wynn and Logie (1998) found students recalled “real-life” events experienced during their first week at university reasonably accurately up to six months later.

Fifth, Bartlett (1932) argued that schemas exert their influence at retrieval rather than during comprehension. In fact, schemas often also influence comprehension processes.

Kintsch’s construction-integration model

Kintsch’s views on language comprehension have been very influential. In his well-known construction-integration model, Kintsch (1988, 1998) combined elements of schema-based theories and Johnson-Laird’s mental model approach (see Chapter 14). Here are the model’s main assumptions (see Figure 10.12):

(1) Readers turn sentences in the text into propositions (true or false statements) representing their meaning.


(2) The propositions constructed from the text are stored briefly along with associatively related propositions (e.g., inferences). At this stage, many irrelevant propositions are stored.
(3) Spreading activation (see Glossary) selects propositions for the text representation. In this integration process, clusters of highly interconnected propositions attract most activation and have the greatest probability of inclusion in the text representation. Within the text representation, it is hard to distinguish between propositions based directly on the text and those based on inferences.
(4) As a result of the above processes, three levels of text representation can be constructed:
(i) surface representation (the text itself);
(ii) propositional representation or textbase (propositions formed from the text);
(iii) situation representation (a mental model describing the situation referred to in the text) – this is the only representation depending mostly on the integration process.

The construction-integration model sounds rather (very?) complex. However, its major assumptions are straightforward. The initial construction of many propositions involves relatively inefficient processes with many irrelevant propositions being included. At this stage, context provided by the overall theme of the text is ignored. After that, the integration process uses contextual information from the text to weed out irrelevant propositions.

What is the relationship between schemas (as proposed by Bartlett, 1932) and situation models? Schemas are abstract and very general whereas situation models are more specific. However, schemas are often used as the building blocks from which situation models are formed.

Figure 10.12 The construction-integration model. Adapted from Kintsch (1992).
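Assumption (3) can be given a concrete, if deliberately simplified, form. The sketch below is not Kintsch’s implementation: the propositions and connection strengths are invented. It simply shows how repeatedly passing activation around a small proposition network leaves densely interconnected propositions highly active while weakly connected (irrelevant) ones end up with little activation.

# Toy spreading activation over an invented proposition network.
import numpy as np

props = ["P1: Ken drove to London", "P2: the car kept overheating",
         "P3: Ken was in a car", "P4: London has museums (irrelevant)"]
# Symmetric connection strengths between the propositions above (invented).
W = np.array([[0.0, 0.8, 0.7, 0.1],
              [0.8, 0.0, 0.6, 0.0],
              [0.7, 0.6, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0]])

activation = np.ones(4)               # every constructed proposition starts active
for _ in range(50):                   # spread activation until the pattern settles
    activation = W @ activation
    activation /= activation.max()    # normalise to keep values bounded

for prop, act in zip(props, activation):
    print(f"{act:.2f}  {prop}")       # the weakly connected P4 ends up least active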


How does the construction-integration model differ from schema theory? Schema theory emphasises top-down processes in discourse comprehension and memory. This differs substantially from the construction-integration model: “During the construction phase, the text input launches a dumb bottom-up process in the reader’s knowledge base . . . top-down factors, such as reading perspective or reading goal, exert their influence at the integration phase” (Kaakinen & Hyönä, 2007, p. 1323).

Findings

Kintsch et al. (1990) tested the assumption that text processing produces three levels of representation. Participants read brief descriptions of various situations and their recognition memory was tested immediately or at times ranging up to four days later. As predicted, forgetting was fastest for the least complete representation (i.e., the surface representation) and there was no forgetting for the most complete representation (i.e., the situation model) (see Figure 10.13).

More evidence for the existence of three levels of representation was reported by Karlsson et al. (2018). They asked children aged 9 to 11 to think aloud while reading texts. Some children (literal readers) stayed close to the text and produced a surface-level understanding. Other children (paraphrasing readers) focused on the meaning of the text and produced a textbase understanding. Finally, some children (elaborating readers) made use of background knowledge and produced a situation model of the text. As predicted, comprehension ability was greatest in elaborating readers and least in literal readers.

Nguyen and McDaniel (2016) increased the extent to which some readers formed a situation model from a text by giving them instructions to reduce gaps in their situation model. Readers given those instructions had higher comprehension levels than those not given them.

Another prediction is that readers should often find it hard to discriminate between text information and inferences drawn from the text. As we saw earlier, that prediction has received much support.

Figure 10.13 Forgetting functions for situation, proposition and surface information over a 4-day period. Adapted from Kintsch et al. (1990).


Kaakinen and Hyönä (2007) disputed the model’s assumption that the reader’s goal in reading influences the integration stage rather than the construction stage. In their study, participants read a text discussing four rare diseases. They were asked to assume a close friend had been diagnosed with one of them and that they had to inform the friends they had in common about that disease. These instructions influenced the construction stage of comprehension: readers focused primarily on sentences relating to their friend’s disease.

According to the model, text information is linked with general world or semantic knowledge before contextual information from the rest of the text. Cook and Myers (2004) tested this assumption using various passages. Here is an excerpt from one passage:

The movie was a small independent film with a low budget and small staff, so everyone involved had to take on extra jobs and responsibilities. On the first day of filming “Action!” was called by the actress so that shooting could begin . . .

The model predicts that readers’ knowledge that actresses do not direct films should have caused them to fixate the word “actress” for a long time. In fact, however, that word was not fixated for long because readers immediately used the contextual justification for someone other than the director being in charge (the first sentence’s mention of everyone taking on extra jobs and responsibilities). Thus, in opposition to the model, contextual information can be accessed before general world knowledge.

The precise processes involved in integration and leading to a situation model are not spelled out within the model. However, it is probable that various executive functions (see Glossary) are involved, such as inhibitory processes (suppressing irrelevant propositions), attention shifting (cognitive flexibility), updating information in working memory, and planning. Follmer (2018) reported in a meta-analysis that individuals high in each of these executive functions had superior comprehension ability to those low in these functions.

Evaluation

The key notion that propositions for the text representation are selected by spreading activation operating on propositions drawn from the text and stored knowledge is plausible and consistent with most of the evidence. There is also reasonable evidence for the model’s three levels of representation. The model predicts accurately that readers often find it hard to discriminate between text information and related inferences. The model has influenced the development of several subsequent theories (especially the RI-Val model below).

What are the model’s limitations?

(1) The model is less applicable when texts are easy to process (McNamara & Magliano, 2009). With easy texts, there is often no need to generate a situation model.
(2) The assumption that only bottom-up processes are used during the construction phase of text processing is dubious. The finding that readers’ goals can lead them to allocate attention selectively very early in text processing (Kaakinen & Hyönä, 2007) suggests text processing is more flexible than assumed theoretically.


(3) It is assumed only general world and semantic knowledge is used in addition to text information during the construction phase. In fact, other sources of information (e.g., context) can also be used during this phase (e.g., Cook & Myers, 2004).
(4) The model is oversimplified. For example, O’Brien and Cook (2016) argued persuasively that language comprehension involves a validation stage continuing after the completion of the integration stage (see below).
(5) The cognitive processes involved in the integration stage of text comprehension are not specified clearly in the model. As we have seen, inhibitory processes, attention shifting, updating and planning are all involved (Follmer, 2018).
(6) The model accounts for the relatively “automatic” inferences drawn during reading but not more effortful ones (Reichle, 2015).
(7) Individual differences (e.g., in working memory capacity) are de-emphasised.
(8) There is an exaggerated emphasis on the role played by abstract propositions in forming situation models. More recent theories (e.g., the event-indexing model discussed shortly) assume situation models include more concrete information (e.g., perceptual details).

RI-Val model

O’Brien and Cook (2016) developed Kintsch’s construction-integration model. Their model assumes there are three stages in language comprehension:

(1) The activation or resonance (R) stage: there is a “dumb and unrestricted process” (p. 329) in which any discourse-relevant information in long-term memory can be activated and influence initial comprehension.
(2) The integration (I) stage: activated concepts are linked to (or integrated with) the contents of working memory. Integration is based on conceptual overlap, making it possible that it results in “the connection of related, but contradictory pieces of information” (Williams et al., 2018, p. 1415).
(3) The validation (Val) stage: linkages formed during the integration stage are validated against relevant information (e.g., general knowledge) stored in long-term memory.

O’Brien and Cook’s (2016) RI-Val model is shown in Figure 10.14. The three processing stages overlap in time but start in the order described above. It is assumed all three processes are passive (i.e., relatively automatic) and always continue to completion. At some point in processing, the reader (or listener) decides on the basis of the validation process that they have an adequate comprehension of the discourse: the coherence threshold has been reached.


Figure 10.14 The RI-Val model, showing the degree of influence on comprehension of the resonance, integration and validation processes over time. Note that these processes continue even after the coherence threshold has been reached. From O’Brien and Cook (2016).
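The time course in Figure 10.14 can be sketched as three staggered growth curves read against a coherence threshold. The sketch below is illustrative only: the model specifies no particular equations, so the curves, the onsets, the multiplicative combination and the threshold value are all invented here.

# Toy RI-Val time course: three staggered, saturating processes.
import math

def influence(t, onset, rate=1.0):
    """No influence before onset; saturating growth afterwards (invented form)."""
    return 0.0 if t < onset else 1 - math.exp(-rate * (t - onset))

COHERENCE_THRESHOLD = 0.15   # invented value

for step in range(31):
    t = step / 10
    r = influence(t, onset=0.0)   # resonance starts first
    i = influence(t, onset=0.5)   # integration lags behind resonance
    v = influence(t, onset=1.0)   # validation starts last
    if r * i * v >= COHERENCE_THRESHOLD:
        print(f"coherence threshold reached at t = {t:.1f}")
        break
# All three processes nevertheless run to completion, which is why an
# inconsistency can still be detected after the threshold has been reached.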

How does the RI-Val model compare with Kintsch’s construction-integration model? The two models are broadly similar: the resonance and integration stages resemble the construction stage in Kintsch’s model. However, there are two important differences:

(1) The RI-Val model explicitly identifies separate integration and validation processes whereas validation is only implicitly incorporated within the construction-integration model.
(2) In contrast to other models, it is assumed within the RI-Val model that the validation process often continues even after readers (or listeners) have understood the discourse (see Figure 10.14). As a consequence, they may detect an inconsistency in what they are reading or hearing after reaching the coherence threshold.

Cook and O’Brien (2014) obtained support for the above prediction. Participants read a passage about Mary, who had been a strict vegetarian for many years. The crucial (target) sentence in the passage was either “Mary decided to order a cheeseburger” or “Mary decided to order a tuna salad”. The pattern of eye movements indicated that participants rapidly detected the inconsistency between Mary being a vegetarian and ordering a cheeseburger. However, the inconsistency is less obvious when Mary orders a tuna salad. With that inconsistency, it was only on the sentence following the target one that the participants’ eye movements were disrupted. Williams et al. (2018) obtained similar findings. Readers often detected that sentences such as “Moses brought two animals of each kind on the ark” were incorrect when reading the sentence following the incorrect one.

In sum, it makes sense to divide Kintsch’s integration stage into somewhat separate integration and validation stages. The prediction that detection of an inconsistency in discourse can be delayed when it is not immediately obvious based on participants’ general knowledge has received support (Cook & O’Brien, 2014; Williams et al., 2018).

What are the model’s limitations?


First, it does not explicitly consider the effects of individual differences (e.g., in working memory capacity) on the three major processes involved in language comprehension or on the setting of the coherence threshold. Second, the model focuses too much on passive or automatic processes and has very little to say about active strategic processes (discussed later).

Event-indexing model and event-segmentation theory

Kintsch’s construction-integration model is a leading example of a situation model. As Zwaan (2016, p. 1028) pointed out, “The basic idea behind situation models is that comprehension of a stretch of discourse involves the construction of the state of affairs denoted by the text rather than only a mental representation of the text itself.” In this section, we consider two more situation models representing developments of Kintsch’s approach.

The theoretical approaches discussed here both emphasise the importance of events. According to Radvansky and Zacks (2011, p. 608), “Events are fundamental to human experience. They [are] the elements that constitute the stream of experience.” One theoretical approach we will discuss is the event-indexing model (Zwaan et al., 1995). The other is event-segmentation theory (Zacks et al., 2007), which represents a development and extension of the event-indexing model. Both are situation models, but they differ from Kintsch’s model in the nature of the representations formed during discourse comprehension. He argued that mental models consist of abstract propositions. In contrast, the event-indexing and event-segmentation theories assume representations are often grounded in perception and action (Zwaan, 2014). Zwaan (2016, p. 1029) used the sentence “The egg is in the carton” to illustrate the difference between these theoretical approaches. A concrete representation of the sentence might include the shape of a whole egg whereas an abstract representation would not.

The event-indexing model (Zwaan et al., 1995) focuses on comprehension processes when someone reads a narrative text (e.g., a story or novel). Thus, its scope differs from that of the construction-integration model where the emphasis is on comprehension of expository texts designed to describe and/or inform. However, there are some similarities (e.g., the emphasis on constructing situation models during reading). As McNamara and Magliano (2009, p. 321) pointed out, a fundamental assumption of the event-indexing model is that “The cognitive system is more attuned to perceive dynamic events (changes in states) rather than static information”.

According to the event-indexing model, readers monitor five situational aspects to decide whether their situation model requires updating:


(1) protagonist: the central character or actor in the present event compared to the previous one;
(2) temporality: the relationship between the times at which the present and previous events occurred;
(3) causality: the causal relationship of the current event to the previous one;
(4) spatiality: the relationship between the spatial setting of the current and previous events;
(5) intentionality: the relationship between the character’s goals and the present event.

What happens to outdated information when we update a situation model? There are two possibilities (Zwaan & Madden, 2004). First, such information continues to influence the comprehension process (the resonance view). Second, outdated information is mostly or totally discarded in favour of new information in the text (the here-and-now view).

According to event-segmentation theory (Zacks et al., 2007), updating of a situation model can take two main forms:

(1) incremental updating of individual situational dimensions (the “brick-by-brick” approach emphasised by the event-indexing model);
(2) global updating in which the current situation model is replaced by a new one (the “from scratch” approach emphasised by event-segmentation theory); such updating is most likely to occur when we reach the boundary between one event and the next.

When do readers engage in global updating? It is assumed we try to predict the near future when reading a text or observing a scene. Such predictions become harder to make as we approach the boundary between one event and the next, which can trigger construction of a new model.
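The updating decision can be summarised as simple bookkeeping over the five situational indices. The sketch below is illustrative only: the representation, the mismatch test and the boundary threshold are invented rather than part of either theory’s formal specification. A mismatch on a dimension triggers incremental updating of that dimension; a change on most dimensions at once is treated here as an event boundary prompting global updating.

# Toy situation-model updating over the five situational indices.
from dataclasses import dataclass, asdict

DIMENSIONS = ("protagonist", "time", "causality", "space", "intentionality")

@dataclass
class SituationModel:
    protagonist: str
    time: str
    causality: str
    space: str
    intentionality: str

def update(model: SituationModel, event: dict, boundary_threshold: int = 3):
    changed = [d for d in DIMENSIONS if asdict(model)[d] != event[d]]
    if len(changed) >= boundary_threshold:       # event boundary: rebuild
        return SituationModel(**event), "global updating (new model)"
    for d in changed:                            # otherwise update brick by brick
        setattr(model, d, event[d])
    return model, f"incremental updating of {changed or 'nothing'}"

model = SituationModel("Mary", "morning", "none", "house", "buy ice cream")
model, note = update(model, {"protagonist": "Mary", "time": "morning",
                             "causality": "heard the van", "space": "street",
                             "intentionality": "buy ice cream"})
print(note)   # only two indices changed, so updating is incremental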

Findings

According to the event-indexing model, updating is effortful and incurs a processing load. As predicted, Swets and Kurby (2016) found reading time (indexed by eye movements) was greater when updating was required at the boundary between the end of one event and the start of the next.

If each aspect of the situation model is processed independently or separately, we would predict updating time to be greater when two aspects require updating rather than only one. Curiel and Radvansky (2014) obtained the predicted finding.

The probability of updating occurring varies across the situational aspects. Readers generally update information on intentionality, time and protagonist, but are less likely to do so with spatial information (Smith & O’Brien, 2012).

Most research has involved only short texts. McNerney et al. (2011) studied reading times while participants read a novel. In contrast to findings with short texts, reading times were reduced when spatial or temporal information required updating. Perhaps readers’ engagement with the novel facilitated the task of updating such information.

As discussed earlier, the theoretical approach discussed here assumes situation models often contain concrete perceptual and/or motor information. As Kaup et al. (2007, p. 978) argued, “Comprehension is tied to the creation of representations . . . similar in nature to the representations created when directly experiencing or re-experiencing the respective situations and events.”

Evidence supporting the above theoretical assumption has been obtained using the sentence-picture verification task.


Zwaan et al. (2002) asked participants to read sentences such as “The ranger saw an eagle in the sky” or “The ranger saw an eagle in the nest”. They were then presented with a picture and decided rapidly whether the object in the picture had been mentioned in the preceding sentence. Verification times were faster when the object’s shape in the picture (e.g., an eagle with outstretched or folded wings) matched the shape implied in the sentence, indicating readers’ use of perceptual information. Zwaan and Pecher (2012) also used the sentence-picture verification task in several experiments. Verification times were faster to pictures matching the implied orientation, shape or colour of sentence objects than to non-matching pictures.

Moore and Schwitzgebel (2018) reported evidence consistent with the above assumptions and research. Participants read various kinds of text (e.g., stage dialogue; poems; descriptive text). When they heard a beep, they indicated whether they had just been engaged in visual imagery. Across the different kinds of text, readers reported visual imagery 70% of the time.

Is it cognitively demanding for readers to create detailed visual simulations of objects referred to in texts? Gao and Jiang (2018) increased processing demands by presenting text in a hard-to-read font. However, this did not impair readers’ ability to infer the physical shapes of objects referred to in texts. The implication is that relatively little processing capacity is required for readers to create visual simulations while comprehending texts.

Does outdated information disrupt current text processing and formation of a situation model? Kendeou et al. (2013) argued it would be disadvantageous if outdated information was always disruptive. They discovered that the provision of causal explanations supporting the updating process eliminated disruption. One story involved Mary, who had been a vegetarian for 10 years. This sentence was presented late in the story: “Mary ordered a cheeseburger and fries.” There was no disruption from outdated information for readers given a causal explanation of why Mary was no longer vegetarian (she had insufficient vitamins and so her doctor told her to eat meat).

Is situation-model updating incremental (as claimed within the event-indexing model) or global (as claimed within event-segmentation theory)? Kurby and Zacks (2012) asked readers to think aloud while reading an extended narrative. Readers showed incremental updating: mentions of the character, object, space, time and goal increased when the relevant situational aspect changed. They also showed global updating: the presence of an event boundary was associated with increased mentions of the character, cause, goal and time. Huff et al. (2018) found that updating was also incremental when listeners were presented with an audio drama.

Evaluation

The greatest strength of the event-indexing model and event-segmentation theory is that they identify key aspects of situation models that were de-emphasised within other theoretical approaches. For example, they focus on incremental and global updating of situation models in response to changes within and between events.

The assumption that situation models are not limited to abstract propositions (as assumed within the construction-integration model) but can include perceptual and other concrete types of information has also received much empirical support.

What are the limitations of this theoretical approach? First, it is fully applicable only to narrative texts describing event sequences and is of little relevance to expository texts providing information and/or explanations. Even with narrative texts, situation models are less likely to be formed when the text is complicated. For example, most readers failed to form a situation model when reading a complex account of a murder scene (Zwaan & van Oostendorp, 1993).

Second, the theoretical approach de-emphasises important determinants of comprehension (e.g., the reader's goals and reading skills; McNamara & Magliano, 2009). Reasonable reading skills and adequate motivation are required for successful monitoring of the five dimensions of protagonist, temporality, causality, spatial relationships and intentionality.

Third, relatively little is known about the underlying mechanisms leading text comprehension to produce representations containing perceptual and/or motor information (Dijkstra & Post, 2015). We also lack a detailed understanding of how such concrete information and the abstract propositional information emphasised by Kintsch are combined during text comprehension.

Fourth, most research has involved relatively short texts. However, preliminary evidence suggests some comprehension processes may differ between long and short texts (McNerney et al., 2011).

CHAPTER SUMMARY

• Parsing: overview. Listeners often make use of prosodic cues (e.g., pauses) provided by the speaker to interpret sentences (especially ambiguous ones). There is much evidence readers use their "inner voice" to produce implicit prosody that closely resembles the prosody found in spoken sentences. Readers' use of implicit prosody is greatly aided by the presence of commas in text.

• Theoretical approaches: parsing and prediction. The garden-path model is a two-stage model in which only syntactic information is used at the first stage. In fact, various kinds of non-syntactic information are sometimes used earlier in sentence processing than the model predicts. The model erroneously predicts that most readers will ultimately generate a correct syntactic structure even for complex sentences. According to the constraint-based model, all sources of information (e.g., context) are available immediately to someone processing a sentence. Competing sentence analyses are activated in parallel, with several language characteristics (e.g., verb bias) being used to resolve ambiguities. There is much support for this model. However, its predictions are sometimes imprecise.

According to the unrestricted race model, all information sources are used to identify a single syntactic structure for a sentence. If this structure is disconfirmed, there is extensive re-analysis. This model de-emphasises the importance of task demands in influencing parsing. According to the good-enough language processing account, we often process sentences rather superficially using various heuristics and so are prone to error. It is not clear how readers decide a proposed sentence structure is good enough. ERP studies indicate several sources of information (including word meanings and context) influence sentence processing at an early stage. Top-down processes generate predictions as to what will be read next.

• Pragmatics. Pragmatics is concerned with intended rather than literal meanings. Understanding figurative language (e.g., metaphor; irony) is often relatively complex because it involves simultaneous processing of metaphorical and literal meanings. Understanding metaphors involves selecting predicate features relevant to the argument and inhibiting irrelevant predicate features. There are processing differences between different types of metaphors and between novel and familiar metaphors. Listeners generally understand better what speakers are saying if they make use of the common ground (shared knowledge and beliefs). When it is effortful to use the common ground, listeners often rely on the egocentric heuristic. Sometimes listeners make simultaneous use of an egocentric perspective and common ground.



• Individual differences: working memory capacity. Individuals high in working memory capacity outperform low-capacity individuals with respect to language comprehension. This superiority depends on both specific individual differences (e.g., verbal working memory) and more general ones. It has generally been assumed theoretically that there is a direct relationship between working memory capacity and reading comprehension. However, there is increasing evidence that the relationship is indirect and depends on factors such as vocabulary size, language experience and general reasoning ability.



• Discourse processing: inferences. Readers typically make logical and bridging inferences (e.g., anaphor resolution) but the extent to which they make elaborative inferences is variable. Inference drawing depends on two types of processes: (1) passive or "automatic" processes; and (2) effortful reader-initiated processes. Reader-initiated processes are most likely to be used when required for readers to attain adequate comprehension and coherence while reading a text. Anaphor resolution is a very common form of bridging inference. It involves multiple constraints operating interactively in parallel.

Most research focuses on whether inferences are drawn rather than which inferences are drawn. However, there is evidence that anxious individuals are more likely than non-anxious ones to draw threatening predictive inferences.

• Discourse comprehension: theoretical approaches. According to schema theory, schemas or organised packets of knowledge influence our comprehension of (and memory for) discourse in a top-down fashion. The theory lacks explanatory power, and comprehension and memory are less error-prone than assumed theoretically. According to Kintsch's construction-integration model, three levels of text representation are constructed. It is assumed bottom-up processes (construction stage) are followed by top-down processes (integration stage). However, top-down processes occur earlier than assumed theoretically. The model is less applicable when texts are easily processed. According to the RI-Val model (a development of the construction-integration model), a long-lasting validation process can detect inconsistencies in a text even after readers believe they have adequate text comprehension. The event-indexing model and event-segmentation theory focus on how readers update their situation models in response to changes within and between events. This general approach works well with simple narrative texts but is less applicable to complex and/or expository texts. The approach de-emphasises the role played by the reader's goals and reading skills.

FURTHER READING

Carreiras, M., Armstrong, B.C. & Duñabeitia, J.A. (2018). Reading. In S.L. Thompson-Schill (ed.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 3: Language and Thought (4th edn; pp. 207–244). New York: Wiley. Manuel Carreiras and colleagues provide a thorough account of comprehension processes in reading.

Cook, A.E. & O'Brien, E.J. (2017). Fundamentals of inferencing during reading. Language and Linguistics Compass, 11 (Article e12246). This article evaluates several major theoretical approaches to inference drawing while comprehending text.

Garnham, A. (2018). Pragmatics and inference. In S.-A. Rueschemeyer and M.G. Gaskell (eds), The Oxford Handbook of Psycholinguistics (2nd edn). Oxford: Oxford University Press. Alan Garnham provides a thorough review of theory and research on pragmatics.

Karimi, H. & Ferreira, F. (2016). Good-enough linguistic representations and online cognitive equilibrium in language processing. Quarterly Journal of Experimental Psychology, 69, 1013–1040. Hossein Karimi and Fernanda Ferreira propose an interesting theoretical model of language comprehension based on the good-enough processing approach.

Kim, A.E. (2018). Sentence processing. In S.L. Thompson-Schill (ed.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 3: Language and Thought (4th edn; pp. 111–148). New York: Wiley. The many processes involved in understanding sentences are discussed in detail in this chapter.

Peng, P., Barnes, M., Wang, C., Wang, W., Li, S., Swanson, H.L. et al. (2018). A meta-analysis on the relation between reading and working memory. Psychological Bulletin, 144, 48–76. This article provides a detailed account of how reading performance is related to individual differences in working memory.

Pickering, M.J. & Gambi, C. (2018). Predicting while comprehending language. Psychological Bulletin, 144, 1002–1044. Martin Pickering and Chiara Gambi discuss how listeners' and readers' language comprehension is enhanced by predictive processes.

Zwaan, R.A. (2016). Situation models, mental simulations, and abstract concepts in discourse comprehension. Psychonomic Bulletin and Review, 23, 1028–1034. Rolf Zwaan discusses key theoretical issues relating to approaches to language comprehension based on situation models (e.g., Kintsch's construction-integration model; event-indexing model; event-segmentation theory).

Chapter 11

Language production

INTRODUCTION

We know much more about language comprehension than language production. Why is this? We can easily control the material to be comprehended, but it is harder to constrain an individual's language production. In addition, to account for language production, we need more than simply a theory of language. Language production is basically a goal-directed activity having communication as its main goal. People speak and write to impart information, to be friendly and so on. Thus, motivational and social factors must be considered in addition to purely linguistic ones.

This chapter focuses on speech production and writing (including the effects of brain damage on these language processes). More is known about speech production than about writing, and nearly everyone spends more time talking than writing. Thus, it is more important to understand the processes involved in talking. Nevertheless, writing is an important skill.

How similar are the processes involved in spoken and written language? Both have as their central function the communication of information about the writer or speaker, other people and the world. In addition, both depend on the same knowledge base. However, children (and nearly all adults) find writing much harder than speaking, which suggests there are important differences between them. The main similarities and differences are discussed below.

Similarities

The view that speaking and writing are similar receives support from theoretical approaches to these language activities. For example, it is assumed both start with planning, in order to decide on the overall meaning to be communicated (e.g., Dell et al., 1997, on speech production; Hayes, 2012, on writing). At this stage, the actual words to be spoken or written are not considered. The planning stage is followed by language production (often on a clause-by-clause basis).

Miozzo et al. (2018) identified several other similarities. First, children typically only learn how to write after they have developed good spoken-language skills.

Second, the teaching of writing often focuses on knowledge of spoken language (e.g., emphasising speech–print correspondences). Third, adults' word spellings are influenced to some extent by word pronunciation.

Some forms of written communication closely resemble spoken forms. For example, consider instant messaging, which involves a rapid exchange of typed messages. Choe (2018) studied five Korean friends exchanging instant messages. The messages were mostly typed in a spontaneous and informal fashion resembling casual speech. Those receiving these messages often replied with simplified messages (e.g., yeah) or indicated their involvement by responding with "machine-gun" questions (produced very rapidly without hesitation). In sum, the exchanges of typed messages were very similar to spoken conversation.

Differences

There are several important differences between speaking and writing. Written language typically uses longer and more complex constructions as well as longer words and a larger vocabulary. Writers make more use than speakers of words or phrases signalling what comes next (e.g., but; on the other hand). This helps to compensate for the relative lack of prosody (rhythm; intonation; and so on; discussed shortly) in writing compared to speech.

Here are five major differences between speaking and writing (Crystal, 2005):

(1) Speech is time-bound and transient whereas writing is space-bound and permanent.
(2) Speakers typically have much less time available for planning than writers and so spoken sentences are typically shorter.
(3) Speakers mostly receive immediate verbal and non-verbal feedback (e.g., expressions of bewilderment) from their listeners.
(4) Speech is well suited to social functions (e.g., casual chatting), whereas writing is well suited to communicating facts and ideas.
(5) Writers have direct access to what they have produced so far whereas speakers do not.

What are the consequences of the above differences? Speech is often informal and simple in structure whereas writing is more formal and complex. Writers need to write clearly because they do not receive immediate feedback.

Some brain-damaged patients have largely intact writing skills despite an almost total inability to speak and a lack of inner speech (e.g., EB, studied by Levine et al., 1982). Other brain-damaged patients can speak fluently but find writing very hard (Ellis & Young, 1988). Rapp et al. (2015) studied patients following a left-hemisphere stroke. Aspects of grammar were impaired in writing but not speech for some patients whereas others showed the opposite pattern. These findings suggest partial independence of the processes underlying writing and speech. However, the higher-level processes of language production (e.g., planning; use of knowledge) are probably very similar in speech and writing.

We must not exaggerate the differences between speech and writing. As Crystal (2005, p. 8) noted, “There are few . . . absolute differences between speech and writing, and there is no single parameter of linguistic variation which can distinguish all spoken from all written genres.” For example, emails often contain features associated with speech (e.g., informality; rapid feedback).

BASIC ASPECTS OF SPEECH PRODUCTION

We start by introducing broad issues of direct relevance to speech production. First, there is the important (but complex) issue of the extent to which speech production utilises processes involved in speech comprehension. Second, we argue speech production is much harder than it may appear subjectively. Strategies used by speakers to cope with the complexities of speech production are discussed. Third, we provide a preliminary account of the notion that speech production involves a series of processing stages.

Speech production vs speech comprehension

We saw in Chapter 9 that speech perception (especially under difficult listening conditions) often involves brain regions associated with speech production (e.g., Adank, 2012). In similar fashion, it is increasingly argued that speech production often involves processes and brain regions associated with speech perception. For example, Chater et al. (2016, p. 244) argued that "Language comprehension and production are facets of a unitary skill". They discussed their computational model (the Chunk-Based Learner), which simulates aspects of children's language acquisition and is equally applicable to production and comprehension (see discussion later).

Strong evidence that speech production involves processes overlapping with those used in speech perception was reported by Silbert et al. (2014). They identified the brain areas activated as speakers produced a 15-minute narrative, and also those activated as listeners comprehended the same narrative. They argued there are two possible reasons why a given brain area is activated during both speech comprehension and production:

(1) The same processes are occurring in both cases.
(2) Different processes occur in comprehension and production within the same brain area.

In their analyses, they assumed (1) was the case only when there were similar patterns over time during comprehension and production: they called this comprehension-production coupling (the sketch below makes this criterion concrete).

Silbert et al.'s (2014) findings are shown in Figure 11.1. Several brain areas exhibited comprehension-production coupling (in blue). Thus, speech production shares several processes with speech comprehension. Unsurprisingly, other brain areas were specifically associated with speech comprehension or speech production.
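As a rough illustration of the coupling criterion, here is a minimal sketch. The correlation measure, the 0.5 threshold and the toy time series are all assumptions made for illustration; this is not Silbert et al.'s (2014) actual analysis pipeline.

# Sketch of the coupling criterion (illustrative assumptions only).

from statistics import correlation  # available in Python 3.10+

def is_coupled(production_ts: list[float], comprehension_ts: list[float],
               threshold: float = 0.5) -> bool:
    """Count a brain area as coupled only when its activity over time is
    similar (here: correlated above an assumed threshold) in both tasks."""
    return correlation(production_ts, comprehension_ts) > threshold

production = [0.1, 0.4, 0.8, 0.5, 0.2]     # activity over time while speaking
comprehension = [0.0, 0.3, 0.9, 0.4, 0.1]  # activity over time while listening
print(is_coupled(production, comprehension))  # -> True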


[Figure 11.1 appears here: a schematic summary of activation across the left hemisphere, medial surface and right hemisphere, marking regions including SPM, PM, IFG, IPS, MC, TPJ, STG, MTG, IT, TP, AG, RS, rpPFC, precuneus, PCC, thalamus, caudate and putamen. Legend: production; comprehension; overlap without coupling; comprehension-production coupling (CPC). See the figure caption below.]

In approximate terms, the findings overall suggested high-level language processing is more likely than low-level processing to be shared between comprehension and production.

Pickering and Gambi (2018, p. 1002) focused on the use of speech-production processes in speech comprehension: "[Comprehenders] covertly imitate the linguistic form of the speaker's utterance and construct a representation of the underlying communicative intention . . . [they] run this intention through their own production system to prepare the predicted utterance." This hypothesis has much support (see Chapter 10). For example, the average gap between turns during a conversation between two individuals is only 250 ms (Stivers et al., 2009) even though it takes about 600 ms to produce a single word (Pickering & Gambi, 2018).

Figure 11.1 Areas activated (and coupled) during speech comprehension and production are in blue (STG = superior temporal gyrus; MTG = middle temporal gyrus; AG = angular gyrus; TPJ = temporo-parietal junction; IFG = inferior frontal gyrus); areas activated (not coupled) during comprehension and production are in orange; areas activated only during production are in red; areas activated only during comprehension are in yellow. From Silbert et al. (2014).

How easy is speech production?

On the face of it (by the sound of it?), speech production seems straightforward. Indeed, it seems almost effortless when we chat with friends and acquaintances. We typically speak at 2–3 words per second, or about 150 words a minute, and this rapid speech rate suggests that speaking requires relatively few processing resources.

The reality of speech production is very different from what is implied above. Consider Christiansen and Chater's (2016) theoretical approach (discussed in Chapter 10 and in more detail later, p. 535). According to this approach, our short-term memory has very limited capacity and new information rapidly eliminates old information. As a result, "Once detailed [language] production information has been assembled, it must be executed straight away, before it is obliterated by the on-coming stream of later low-level decisions" (Christiansen & Chater, 2016, p. 5).

Evidence that speech production is often more cognitively demanding than speech comprehension was reported by Boiteau et al. (2014). Participants tracked a moving target while engaged in speech production or comprehension. Speech production (especially speech planning) was associated with a greater impairment of tracking performance than speech comprehension. These findings suggest speech production is more attentionally demanding than comprehension.


KEY TERMS

Syntactic priming: The tendency for a speaker's utterances to have the same syntactic structure as those they have heard shortly beforehand.

Preformulation: The production by speakers of phrases used frequently before; it reduces the demands of speech production.

Underspecification: A strategy used to reduce processing costs in speech production by using simplified expressions.

How do speakers cope with the cognitive demands of producing speech? One way is by re-using aspects of what they have just heard. An important example is syntactic priming, in which speakers re-use a given syntactic structure. If, for example, you have just heard a passive sentence (e.g., "The man was bitten by the dog"), this increases the probability you will produce a passive sentence. Convincing evidence for syntactic priming was reported in a meta-analysis (see Glossary) by Mahowald et al. (2016). This priming effect was especially strong when speakers used the same (or similar) words to those they had heard.

Another way speakers reduce processing demands is via preformulation, which involves producing phrases used before. Approximately 70% of our speech consists of word combinations we use repeatedly (Altenberg, 1990; Liu, 2014). Horseracing commentators (who speak very rapidly) make extensive use of preformulations (e.g., "They are off and racing now").

Another strategy used to facilitate speech production is underspecification, which involves using simplified expressions where the full meaning is not explicit. Underspecification and preformulation often go together. "Or something" and "and things like that" are examples.

IN THE REAL WORLD: MILD COGNITIVE IMPAIRMENT

It is cognitively demanding to produce coherent spontaneous speech. Unsurprisingly, patients with Alzheimer's disease exhibit clear signs of impaired spontaneous speech. Alzheimer's disease is often preceded by mild cognitive impairment, a condition involving minor problems with memory and thinking. Here we consider which aspects of speech production are impaired in individuals with mild cognitive impairment.

Berisha et al. (2015) compared the press conferences of two American presidents: Ronald Reagan and George Herbert Walker Bush. Reagan was diagnosed with Alzheimer's disease six years after leaving office. He showed a substantial reduction in the use of unique words during his time as president, coupled with a large increase in conversational fillers (e.g., "well"; "um"; "ah") and non-specific nouns (e.g., "something"; "anything"). Thus, his speech became simpler and less informative due to mild cognitive impairment. In contrast, President Bush showed no systematic changes in vocabulary use over time.

Mueller et al. (2018) studied individuals with early mild cognitive impairment (subtle cognitive deficits not meeting the criteria for mild cognitive impairment). Over a two-year period, these individuals showed greater decline than cognitively healthy controls in two aspects of speech: (1) fluency (e.g., few filled pauses; few false starts); and (2) semantic content (proportion of words providing meaningful content). These findings resemble those found with President Reagan.

In sum, reductions in speech-production quality are found even during the early stages of mild cognitive impairment, suggesting that "normal" quality of speech production requires intact cognitive processes. Such findings also suggest that speech-production deficits may serve as an "early warning" of future, more severe, cognitive impairment. Of relevance, Berisha et al. (2017) found professional American football players had lower spoken language complexity (e.g., a low ratio of content words (e.g., nouns; verbs) to total words spoken) than controls. In addition, those football players who had been tackled the most had the lowest spoken language complexity. It might be a useful precaution to carry out more detailed cognitive testing on those American football players with the least spoken language complexity.
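The lexical measures used in this kind of work are simple to compute. The following sketch is an illustration, not the analysis code from Berisha et al.; in particular, the tiny hand-made content-word set is a stand-in for the part-of-speech tagging a real analysis would use.

# Illustrative sketch of two lexical measures used in studies of this kind.

def type_token_ratio(words: list[str]) -> float:
    """Proportion of unique words; falling values suggest a shrinking vocabulary."""
    return len(set(words)) / len(words)

def content_word_ratio(words: list[str], content_words: set[str]) -> float:
    """Ratio of content words (nouns, verbs, etc.) to total words spoken."""
    return sum(1 for w in words if w in content_words) / len(words)

sample = "well um the thing is something happened at the game".split()
content = {"thing", "happened", "game"}  # hypothetical content-word set

print(round(type_token_ratio(sample), 2))             # -> 0.9
print(round(content_word_ratio(sample, content), 2))  # -> 0.3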


Stages in speech production

KEY TERMS

Alzheimer's disease: A disease in which general deterioration of the brain leads to progressive mental deterioration.

Morphemes: The basic units of meaning; words consist of one or more morphemes.

Clause: A group of words within a sentence that contains a subject and a verb.

Phrase: A group of words within a sentence expressing a single idea.

Speech production involves several general stages or processes. Dell (1986), in his spreading-activation theory (discussed later, pp. 526–530), argued speech production consists of four levels:

●● semantic level: the meaning of what is to be said (or the message to be communicated); this is the planning level;
●● syntactic level: the grammatical structure of the words in the planned utterance;
●● morphological level: the morphemes (basic units of meaning);
●● phonological level: the phonemes (basic units of sound).

It makes sense to assume the above four levels or stages occur in the order described. Thus, we engage in planning, followed by working out the grammatical structure of the sentence and the basic units of meaning, and finally work out the sounds to be produced. In fact, speech production is much less neat and tidy: "later" processes can occur at the same time as (or even ahead of) "earlier" processes.

SPEECH PLANNING

We typically plan what we are going to say before speaking (the first stage in speech production). In other words, we engage our brain before speaking. Is speech planning influenced by the syntactic structure of planned utterances? Supporting evidence was reported by Lee et al. (2013). Consider the following sentence: The student of the teacher who is raising her hand. This sentence is ambiguous: is the person raising her hand the teacher (simpler syntactic structure) or the student (more complex structure)? The time to initiate speech was longer when speakers produced the more complex syntactic structure, indicating that speech planning included aspects of syntactic structure.

What is the scope of speakers' planning? Planning might occur at the level of the clause (a part of a sentence containing a subject and a verb). Alternatively, it might occur at the level of the phrase (a group of words expressing a single idea). In the sentence "Failing the exam was a major disappointment to him", the first three words form a phrase. Holmes (1988) found speakers talking spontaneously about various topics had hesitations and pauses immediately before the start of a clause. This suggests they were planning the forthcoming clause. Martin et al. (2004) found speakers describing moving pictures took longer to initiate speech when the initial phrase was complex rather than simple. This suggests they planned the initial phrase before speaking.

The extent of advance speech planning often differs at the semantic, syntactic and phonological levels. Garrett (1980) analysed various types of speech errors (discussed further later, pp. 521–523).


Word-exchange errors (e.g., "My chair seems empty without my room") often involved words belonging to the same syntactic or grammatical category and generally spanned different phrases. These errors occur during grammatical encoding. In contrast, sound-exchange errors (e.g., burst of beaden instead of beast of burden) typically involved nearby elements within a phrase. These errors occur during phonological encoding. Garrett concluded that grammatical or syntactic planning occurs over greater distances within a sentence than does phonological planning.

Recent research reveals a more complex picture. In a review, Klaus et al. (2017) concluded planning at the syntactic and phonological levels can both extend "beyond the initial phrase and may even span over a whole simple sentence" (p. 813). In their own research, Klaus et al. found advance planning was comparable at the syntactic and phonological levels when speakers performed a visuo-spatial task at the same time. However, performing a verbal task (remembering digits) at the same time only reduced the extent of phonological planning, suggesting the verbal task and phonological planning required the same processing resources. More generally, the findings suggest that the scope of planning at any given level (e.g., syntactic; phonological) depends on the precise demands of any non-task processing occurring at the same time.

Flexibility

How can we account for the variable findings discussed above? Speakers generally want to start communicating rapidly (implying little forward planning). However, they also want to talk fluently (implying much forward planning). Speakers resolve this conflict flexibly depending on their immediate goals and the situational demands.

Ferreira and Swets (2002) reported evidence that speakers' planning varies flexibly. Speakers answered mathematical problems. When there was no time pressure, speakers planned their responses before starting to speak and so the time taken to start speaking was influenced by task difficulty. When there was time pressure, speakers engaged in limited planning before starting to speak, with additional planning occurring during speaking. Thus, they did as much forward planning as was feasible before speaking.

Wagner et al. (2010) agreed speech planning is flexible and identified several factors influencing advance planning. First, individuals speaking slowly engaged in more planning than fast speakers. Second, there was more planning before speakers produced simple rather than complex sentences. Third, speakers showed more planning when under low (rather than high) cognitive load.

How can we explain speakers' flexible advance planning? As mentioned earlier, they face a trade-off between communicating effectively and avoiding errors on the one hand and minimising cognitive demands on the other. If they focus on avoiding errors, the cognitive demands are substantial. However, if they try to minimise cognitive demands, their speech will contain many errors. In practice, speakers mostly engage in extensive planning only when it is relatively undemanding. In sum, "Speech planning processes flexibly adapt to external task goals" (Swets et al., 2013, p. 23).


There is a final important point. Since speakers’ advance planning is generally limited in scope, it would seem that they must often engage in incremental planning. In other words, speakers plan only part of the next sentence or utterance and gradually extend and change their plan over time. Brown-Schmidt and Konopka (2015) obtained evidence of incremental planning when participants were required to add new information to their spoken utterance after they had started to speak. Brown-Schmidt and Konopka also found that speakers’ fluency was not impaired by adding new information, suggesting that their initial plan was so flexible it could easily accommodate message revisions.

SPEECH ERRORS

KEY TERMS

Spoonerism: A speech error in which the initial letter or letters of two words (typically close together) are switched to form two different words.

Freudian slip: A speech error that reveals the speaker's (often unconscious) sexual desires.

Our speech is generally accurate and coherent. However, we all make occasional speech errors. There are several kinds of speech errors, which can occur at any stage of speech production. Human limitations in processing capacity (e.g., short-term memory) might suggest speech errors would be frequent (Christiansen & Chater, 2016). However, the average person makes a speech error only once every 500 sentences.

In spite of their rarity, speech errors are important because they provide insights into the mechanisms underlying speech production. This would not be so if speech errors were random and thus unpredictable. In fact, speech errors are predominantly systematic. The speech errors even of brain-damaged patients are generally similar to the correct words (Dell et al., 2014). As Dell et al. concluded, "Slips [speech errors] are more right than wrong."

Historically, speech errors were typically written down by researchers immediately after being heard. This approach is limited because many speech errors go undetected (Ferber, 1995). More recently, research has focused on speech errors elicited deliberately under laboratory conditions.

Error types

There are several types of error additional to those mentioned earlier. One example is the spoonerism, which occurs when the initial letter(s) of two words are switched. It was named after the Reverend William Spooner, who is credited with several memorable examples (e.g., "You have hissed all my mystery lectures"). Alas, most of his gems resulted from painstaking effort.

The Freudian slip is a famous type of error allegedly revealing the speaker's true sexual desires. In a study by Motley (1980), male participants said out loud pairs of items such as goxi furl and bine foddy. For some participants, the experimenter was a female who was "attractive, personable, very provocatively attired, and seductive in behaviour" (Motley, 1980, p. 140). More spoonerisms (e.g., goxi furl turning into foxy girl) were produced when the participants' passions were inflamed by this female experimenter.

Semantic substitution errors occur when the correct word is replaced by one of similar meaning (e.g., "Where is my tennis bat?" instead of "Where is my tennis racquet?").


In 99% of cases, the substituted and correct words belong to the same part of speech (e.g., nouns), suggesting speakers plan the grammatical structure of their next utterance before finding the precise words to insert into it.

Morpheme-exchange errors involve inflections or suffixes (e.g., -ed) being attached to the wrong word (e.g., "He has already trunked two packs"). Such errors imply that the positioning of inflections is dealt with by a different process from the one responsible for positioning word stems (e.g., trunk; pack). The word stems are worked out before the inflections are added, because the spoken inflections or suffixes are generally altered to fit the new word stems. For example, the "s" sound in "the forks of a prong" is pronounced in a way appropriate within the word forks but not the original word prongs.

Finally, we consider subject-verb agreement errors, in which singular verbs are mistakenly used with plural subjects or vice versa (e.g., "The government have made a mess of things"). Why do we make such errors? McDonald (2008) argued that considerable processing resources are required to avoid subject-verb agreement errors. As predicted, speakers made more such errors when there was an externally imposed load on working memory.

Other factors associated with use of singular or plural verbs have been identified. When speakers had recently encountered phrases (e.g., a trio of violinists) paired with plural verbs, they were more likely to produce plural verbs with other, similar phrases (e.g., a class of children) (Haskell et al., 2010). Mirković and MacDonald (2013) found semantic factors are important. Participants received verbs plus phrases (in Serbian) such as the following:

(1) Many wolves . . . (to jump)
(2) Several wolves . . . (to jump)

In each case, they had to decide whether to say jump or jumps (both grammatically acceptable in Serbian). Mirković and MacDonald argued that many is more suggestive than several of a single mass or collection. As predicted, speakers used plural verbs less often following phrases containing many rather than several.

It has generally been assumed most speech errors involve sound substitutions. Goldrick et al. (2016) showed the limitations of this assumption. Speakers said tongue twisters repeatedly. The sounds associated with speech errors varied considerably – some resembled direct substitutions but most did not. This variability occurred because speech errors were due to a combination of two factors: (1) planning processes relating to the targets of articulation; and (2) articulatory processes specifying the motor movements required to execute this plan. This study is important because it indicates speech errors are multiply determined and so require more complex explanations than those proposed in the past.

In sum, speakers make several types of speech errors. We will shortly discuss other types of speech errors within the context of Dell's (1986) spreading-activation theory, which provides an explanation of most of them. Before turning to that theory, however, we discuss two prominent theories of how speakers monitor their own speech to prevent (or correct) speech errors.


These theories have general importance because they identify mechanisms that serve to minimise the number of errors made by speakers and thus enhance communication (Nozari & Novick, 2017).

Perceptual loop theory

Speakers often detect and rapidly correct their own speech errors. Levelt (1983) proposed a perceptual loop theory to explain such error detection. According to this theory, speakers detect their own speech errors by monitoring their utterances and discovering that what they say sometimes differs from what they intended. Of importance, this monitoring occurs at two levels: inner speech and overt speech. With overt speech, speakers make use of auditory feedback. In essence, speakers use the comprehension system to detect their own speech errors in ways resembling those used to detect errors in other people's speech. Monitoring of inner speech should typically occur faster than monitoring of overt speech.

Strong evidence that monitoring for speech errors occurs at the two stages of inner and overt speech was reported by Nooteboom and Quené (2017). Speakers were given a task designed to produce errors. For example, the word pair BIN DOG was presented visually and had to be spoken aloud. When they had previously been presented with word pairs such as DOVE BALL, DEER BACK and DIM BOMB, they sometimes incorrectly said DIN BOG. Nooteboom and Quené recorded two measures: (1) error-to-cut-off times (the time between an error and the speaker stopping speaking); and (2) error-to-repair times (the time between an error and the speaker correcting it).

What did Nooteboom and Quené (2017) find? First, the error-to-cut-off times showed two peaks (139 ms and 637 ms). The fast times reflect monitoring of inner speech whereas the slow times reflect monitoring of overt speech. Second, the error-to-repair times also showed two peaks (253 ms and 970 ms): repairing or correcting an error was more time-consuming when it was detected in overt speech rather than inner speech. This two-peak logic is sketched below.

It is assumed within perceptual loop theory that speakers often use auditory feedback to monitor their own speech for errors. This assumption has been supported by research showing speakers are worse at detecting errors when auditory masking prevents them from hearing themselves speak. However, Nooteboom and Quené (2017) discovered that loud masking noise had no effect on the detection of speech errors. Auditory feedback is probably more important in monitoring when speakers produce fairly long and complex utterances than when they simply produce two words, as in Nooteboom and Quené's study.
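The two-peak logic can be expressed as a small sketch. This is purely illustrative: the 400 ms boundary is an assumed cut-point placed between the reported peaks (139 ms vs 637 ms), not a value from Nooteboom and Quené (2017).

# Illustrative classification of error-to-cut-off times into the two
# monitoring channels proposed by perceptual loop theory.

def classify_detection(error_to_cutoff_ms: float, boundary_ms: float = 400.0) -> str:
    """Attribute fast cut-offs to inner-speech monitoring and slow cut-offs
    to monitoring of overt speech (boundary_ms is an assumed threshold)."""
    return "inner speech" if error_to_cutoff_ms < boundary_ms else "overt speech"

latencies = [120, 150, 180, 590, 640, 700]
print([classify_detection(t) for t in latencies])
# -> ['inner speech', 'inner speech', 'inner speech',
#     'overt speech', 'overt speech', 'overt speech']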

Conflict-based monitoring theory

Nozari et al. (2011) proposed a conflict-based monitoring theory. They argued error detection depends on information generated by the speech-production system rather than the comprehension system. Their theory assumes speakers engage in conflict monitoring during competition among various possible words at the time of response selection. Cognitive control mechanisms are used to resolve conflicts between response options.


The two theories make different predictions. First, the perceptual loop theory predicts speakers' success at detecting their own speech errors depends mostly on their comprehension ability. In contrast, Nozari et al.'s (2011) conflict-based account predicts speakers' ability to detect their speech errors depends on the quality of their speech-production system. Second, speakers should detect errors rapidly if error detection depends on detecting conflict prior to producing an error (the conflict-based account). According to the perceptual loop theory, in contrast, error detection can be fast or slow depending on whether it is based on monitoring inner or overt speech.

Figure 11.2 Correlations between aphasic patients' speech-production abilities and their ability to detect their own speech-production errors. Top row: ability to avoid semantic errors when speaking (s weight) was positively correlated with ability to detect their own semantic errors (left) but not with ability to detect their own phonological errors (right). Bottom row: ability to avoid phonological errors when speaking (p weight) was not positively correlated with ability to detect their own semantic errors (left) but was positively correlated with ability to detect their own phonological errors (right). From Nozari et al. (2011). Reprinted with permission from Elsevier.


Findings

Nozari et al. (2011) tested patients with aphasia (various language problems; see Glossary) to decide whether their ability to detect their own speech errors depends more on their comprehension or their speech-production ability. There was practically no correlation between comprehension ability and error-detection performance. However, speech-production ability predicted error detection (see Figure 11.2). Patients making many semantic speech errors were much worse than other patients at detecting their own semantic errors (r = –.59). Those making many phonological speech errors were poorer at detecting their own phonological errors (r = –.43).

Blackmer and Mitton (1991) assessed speed of error detection among callers to a radio chat show. Many errors were detected very rapidly (e.g., 19% of overt corrections of what a caller had just said occurred immediately). For example, one caller said "willfiddily" and without any pause added "fully". In the study by Nooteboom and Quené (2017) discussed earlier, 72% of the error-to-cut-off times were fast, which is consistent with conflict-based monitoring theory.

Gauvin et al. (2016) studied the brain regions associated with error detection in speech production and speech perception. Error detection in speech production generally involved brain regions mostly independent of speech-perception systems. In addition, detection of speech-production errors involved a cognitive control mechanism resolving conflict centred on the anterior cingulate cortex.

Overall evaluation

There is support for the assumption of perceptual loop theory that speakers monitor inner and overt speech. Its further assumption that monitoring of overt speech is based on auditory feedback has also received some support. However, auditory feedback is used less often than implied by the theory. Finally, the assumption that speech monitoring involves comprehension-like processes appears mostly incorrect.

Several findings support conflict-based monitoring theory. First, the success of brain-damaged patients in detecting their own speech errors depends largely on their speech-production ability. Second, the finding that speakers often detect speech errors very rapidly suggests speech monitoring occurs within the speech-production system. Third, the brain regions associated with the detection of speech errors are closer to those predicted by this theory than by perceptual loop theory (Gauvin et al., 2016). However, the theory is limited because it de-emphasises the role sometimes played by speech-perception mechanisms in detecting speech-production errors.

THEORIES OF SPEECH PRODUCTION

Earlier we mentioned four levels or stages of processing involved in speech production: semantic; syntactic; morphological; and phonological. There is controversy concerning the relationships between these levels or stages of processing in speech production.


KEY TERM

Spreading activation: Activation of a node (corresponding to a word or concept) in the brain causes some activation to spread to several related nodes or words.

Two highly influential theories of speech production are Dell's (1986) spreading-activation theory and Levelt et al.'s (1999) WEAVER++ model (discussed in detail shortly, pp. 530–534). Below we provide a brief preview of key differences between them.

According to Dell's (1986, 2013) spreading-activation theory, processing can occur in parallel (at the same time) at different levels (e.g., semantic; syntactic). More specifically, it is assumed theoretically that processing is interactive. That means there can be cascade processing (see Glossary), with the initiation of later processing stages occurring prior to the completion of processing at earlier stages. It also means processing can involve bottom-up feedback in addition to the top-down processing characteristic of speech production.

In contrast, Levelt et al. (1999) argued serial processing (one process at a time) plays a major role in speech production. They also assumed within their WEAVER++ model that speech production involves a feedforward system with processing occurring in a strictly forward direction (i.e., from meaning to sound). That assumption seems reasonable – it is plausible that speakers decide the meaning they want to communicate before deciding on the appropriate sounds to articulate.

In sum, the main assumptions of spreading-activation theory imply speech-production processes are (if not chaotic) then at least very flexible. In contrast, the main assumptions of WEAVER++ imply the processes involved in speech production are regimented and structured. However, the actual theoretical differences are less extreme. Dell (1986) accepted processing at any given point in time is generally more advanced at some levels (e.g., semantic) than others (e.g., phonological). Thus, the notions that initial processing is mainly at the semantic and syntactic levels whereas later processing is mostly at the morphological and phonological levels are common to both theories.

We will make two final preliminary points about the two theories. First, Goldrick (2006) proposed the compromise position that there is "limited interaction" in speech production. This makes sense – too much interaction would lead to numerous speech errors whereas too little interaction would inhibit speakers' ability to produce interesting new sentences and new ideas. Second, spreading-activation theory was initially based largely on evidence from speech errors. In contrast, WEAVER++ was based mostly on laboratory studies of the time taken to speak words accurately in different contexts. It is arguable (Goldrick, 2006) that interactive effects are more prevalent in speech-error data than response-time data. Speculatively, the speech-production system may be most efficient when there is minimal distraction and interference, and so it operates approximately in line with the assumptions of WEAVER++.

Finally, speech production does not depend only on language-specific processes. Some more general processes (e.g., attention; short-term memory) are also important. We discuss relevant theory and research based on this approach at the end of this section (see pp. 535–536).

Spreading-activation theory

Unsurprisingly, the notion of spreading activation is central to Dell's (1986) spreading-activation theory.


It is assumed the nodes within a network (nodes correspond to words or concepts) vary in activation or energy. When a node or word is activated, activation or energy spreads from it to other related nodes or words. For example, strong activation of the node corresponding to "tree" may cause some activation of the node corresponding to "plant". According to the theory, spreading activation occurs for sounds as well as words.

The theory assumes there are categorical rules at the semantic, syntactic, morphological and phonological levels of speech production. These rules impose constraints on acceptable categories of items and combinations of categories, and define the categories appropriate at each level. For example, the categorical rules at the syntactic level specify the syntactic categories of items within a sentence. There is also a lexicon (dictionary) in the form of a connectionist network. It contains nodes for concepts, words, morphemes and phonemes. When a node is activated, it sends activation to all the nodes connected to it (see Chapter 1).

Insertion rules select items for inclusion in the representation of the to-be-spoken sentence according to the following criterion: the most activated node belonging to the appropriate category is selected. For example, if the categorical rules at the syntactic level dictate a verb is required at a given point in the sentence, then the verb whose node is most activated will be selected. After selection, the node's activation level immediately reduces to zero to prevent it from being selected repeatedly. Dell et al. (2008) focused on why we replace a noun with a noun and a verb with a verb when making speech errors. They argued we possess a "syntactic traffic cop" that monitors what we intend to say and inhibits words outside the appropriate syntactic category.

According to the spreading-activation theory, speech errors occur because an incorrect item is sometimes more activated than the correct one. The existence of spreading activation means several nodes are activated at the same time, which increases the likelihood of speech errors. Dell et al. (2014) discuss the processes responsible for the occurrence of several major speech errors (e.g., anticipatory errors; exchange errors). The sketch below illustrates these selection dynamics.
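Here is a minimal sketch of these mechanisms, assuming a toy four-word lexicon and an arbitrary spreading rate of 0.5. These values are illustrative assumptions of ours; this is not Dell's (1986) implementation.

# Minimal sketch of spreading activation and the insertion rule
# (toy lexicon and parameters; illustrative only).

network = {                     # links between word/concept nodes
    "cat": ["dog", "mat"],
    "dog": ["cat", "bone"],
    "mat": ["cat"],
    "bone": ["dog"],
}
category = {"cat": "noun", "dog": "noun", "mat": "noun", "bone": "noun"}
activation = {node: 0.0 for node in network}

def activate(node: str, amount: float, spread: float = 0.5) -> None:
    """Activate a node and spread a fraction of that activation to its neighbours."""
    activation[node] += amount
    for neighbour in network[node]:
        activation[neighbour] += amount * spread

def select(required_category: str) -> str:
    """Insertion rule: pick the most activated node of the required category,
    then zero its activation so it cannot be selected repeatedly."""
    winner = max(
        (n for n in activation if category[n] == required_category),
        key=lambda n: activation[n],
    )
    activation[winner] = 0.0
    return winner

activate("cat", 1.0)   # the intended word
print(select("noun"))  # -> 'cat'; but 'dog' and 'mat' are now partly active,
                       # which is how the model makes speech errors possible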

KEY TERM

Mixed-error effect: A form of speech error in which the incorrect word spoken is related to the correct one in terms of both meaning and sound.

Findings

What errors are predicted by spreading-activation theory? First, and of special importance, there is the mixed-error effect, which occurs when an incorrect spoken word is semantically and phonemically related to the correct word. The existence of this effect suggests semantic and phonological factors can both influence word selection at the same time – this is consistent with the notion that the various levels of processing interact flexibly. Alternatively, a monitoring system may inhibit the production of words phonologically dissimilar to the intended word (Levelt et al., 1999).

Convincing evidence for the mixed-error effect was provided by Ferreira and Griffin (2003). Participants read incomplete sentences (e.g., "I thought that there would still be some cookies left, but there were . . .") followed by picture naming (e.g., of a priest). In this example, participants tended to produce the wrong word none due to the semantic similarity between priest and nun combining with the phonological identity of nun and none.


KEY TERMS

Lexical bias effect: The tendency for speech errors to form words rather than non-words.

Lexicon: An individual's internal dictionary containing information about word meanings.

Second, speech errors typically consist of actual words rather than non-words (the lexical bias effect). According to the theory, this effect occurs because it is easier for words than non-words to become activated, because words have representations in the lexicon (see Glossary). Alternatively, speakers may monitor their inner speech and edit out non-words (monitoring of inner speech was discussed earlier). There is support for both explanations (Dell et al., 2014). Nooteboom and Quené (2008) found evidence for self-monitoring – speakers often corrected themselves just before producing an incorrect word (e.g., saying D . . . BARN DOOR when seeing BARN-DOOR). Corley et al. (2011) asked people to say tongue twisters rapidly. In one condition, all the stimuli were real words, compared to only half in the second condition. There was the typical lexical bias effect in the all-word condition, but the effect disappeared in the second condition. How can we explain these findings? Since half the stimuli in the second condition were non-words, speakers did not edit out non-words before speaking.

Third, spreading-activation theory predicts speakers should make anticipatory errors in which a speech sound is produced too early (e.g., "a Tanadian from Toronto" instead of "a Canadian from Toronto"). This prediction has been confirmed (e.g., Nooteboom & Quené, 2013). Anticipatory errors occur because many sentence words become activated during speech planning and sometimes a later word is more activated than the one that should be spoken.

Fourth, many errors should be exchange errors in which two words within a sentence are swapped (e.g., "I must send a wife to my email"). That is, indeed, the case (Nooteboom & Quené, 2013). Remember the activation level of a selected word immediately reduces to zero. If "wife" is selected too early, it is unlikely to be selected in its correct place in the sentence. This allows a previously unselected but highly activated word such as "email" to be spoken in the wrong place.

Fifth, anticipation and exchange errors generally involve words moving only a relatively short distance within the sentence. Words relevant to the part of the sentence under current consideration are generally more activated than those relevant to more distant parts of the sentence. Thus, the findings accord with the predictions of spreading-activation theory.

Evaluation

Spreading-activation theory has several strengths:

(1) The mixed-error and lexical bias effects indicate processing during speech production can be highly interactive, as predicted theoretically.
(2) The theory can also explain several other speech errors (e.g., exchange errors; anticipatory errors).
(3) The theory's emphasis on spreading activation provides links between speech production and other cognitive activities (e.g., word recognition: McClelland & Rumelhart, 1981).
(4) Our ability to produce novel sentences may depend in part on the flexibility resulting from the widespread activation between processing levels assumed by the theory.


(5) Dell’s (1986) original theory was vulnerable to the charge that it predicted too many speech errors. However, Nozari et  al. (2011; discussed earlier, pp. 523–525) improved the theory by adding mechanisms for monitoring and editing out errors early in processing. What are the theory’s limitations? (1) It de-emphasises processes involved in the construction of a message (including its intended meaning). It also fails to consider audience design (how speakers respond to the needs of their audience; discussed later, pp. 544–547). (2) It does not predict the time taken to produce correct and incorrect spoken words. (3) The interactive processes emphasised by the theory are less apparent in error-free than speech-error data (Goldrick, 2006). For example, the processes involved in correct versus incorrect picture naming differ. Principe et  al. (2017) found brain activity within large regions of the parietal and temporal cortex differed between correct and incorrect naming within approximately 100 ms of picture presentation. (4) There is insufficient emphasis in the theory on factors influencing the extent of interactive processes in speech production. There is less interactive processing when overall processing demands are high than when they are low (Mädebach et al., 2011; discussed shortly, p. 533).

Anticipatory and perseveration errors

Dell et al. (1997) developed spreading-activation theory further. They argued most speech errors belong to two categories:

(1) Anticipatory: as discussed earlier, sounds or words are spoken ahead of their time (e.g., "cuff of coffee" instead of "cup of coffee"). These errors mainly reflect inexpert planning.
(2) Perseveratory: sounds or words are spoken later than they should have been (e.g., "beef needle" instead of "beef noodle"). These errors reflect planning failure or a failure to monitor what one is about to say.

Suppose we compare speakers engaging in much forward planning with those doing much less forward planning. According to Dell et al. (1997), those planning ahead should make more anticipatory errors. They focused on the anticipatory proportion (the proportion of total errors [anticipation + perseveration] that is anticipatory) and argued this should correlate positively with speakers' tendency to plan ahead.

Findings

Dell et al. (1997) gave speakers extensive practice at saying tongue twisters (e.g., five frantic fat frogs; thirty-three throbbing thumbs). They argued practice would lead to more forward planning and so increase the anticipatory proportion. As predicted, the anticipatory proportion increased from .37 early in practice to .59 at the end of practice.
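The anticipatory proportion itself is simple arithmetic. In the sketch below, the error counts are hypothetical, chosen only to reproduce the reported proportions:

```python
def anticipatory_proportion(n_anticipatory, n_perseveratory):
    """Proportion of anticipation + perseveration errors that is anticipatory."""
    return n_anticipatory / (n_anticipatory + n_perseveratory)

# Hypothetical counts per 100 movement errors, matching the reported values.
print(anticipatory_proportion(37, 63))  # 0.37 (early in practice)
print(anticipatory_proportion(59, 41))  # 0.59 (end of practice)
```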


KEY TERM

Lemmas: Abstract words possessing syntactic and semantic features but not phonological ones.

Dell et al. (1997) also argued that requiring participants to speak more rapidly would reduce forward planning and so decrease the anticipatory proportion. This prediction was supported by Vousden and Maylor (2006) in a study in which children and young adults said tongue twisters slowly or fast. Wutzler et al. (2013) assessed the anticipatory proportion in elderly individuals. Those with cognitive impairment had a significantly lower anticipatory proportion than those without, presumably because they were less able to plan their utterances.

Fossett et al. (2016) asked speakers to say tongue twisters at three different rates: typical, fast, or faster. The anticipatory proportion was smallest at the typical speaking rate, which is the opposite of the theoretical prediction. How can we explain this unexpected finding? Dell et al. (1997) argued the anticipatory proportion will be greatest at a slow speaking rate because it permits much forward planning. This prediction involves the assumption that words later in the sentence remain activated long enough to produce anticipatory errors. If word activation instead decays rapidly, the findings of Fossett et al. (2016) can be explained.

Levelt's theoretical approach and WEAVER++

Levelt et al. (1999) put forward a computational model called WEAVER++ (WEAVER stands for Word-form Encoding by Activation and VERification). The model focuses on the processes involved in producing individual spoken words and makes the following assumptions (a toy sketch of the feedforward architecture follows the list):

● There is a feedforward activation-spreading network, meaning that activation proceeds forwards through the network but not backwards. Of particular importance, processing proceeds from meaning to sound.
● There are three main levels within the network: (i) at the highest level are nodes representing lexical concepts; (ii) at the second level are nodes representing lemmas from the mental lexicon. Lemmas are word representations that "are specified syntactically and semantically but not phonologically" (Harley, 2013). Thus, if you know the meaning of a word you are about to say and that it is a noun, but you do not know its pronunciation, you have accessed its lemma; (iii) at the lowest level are nodes representing word forms in terms of morphemes (basic units of meaning) and their phonemic segments.
● Lexical (word) selection depends on a competitive process based on the number of lexical units activated.
● Speech production following lexical selection involves various processing stages following each other in serial fashion (one at a time).
● Speech errors are avoided by means of a checking mechanism based on the speaker monitoring what they say (discussed earlier, see pp. 523–525).
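As promised above, here is a toy sketch of the feedforward, meaning-to-sound idea. The representations, features and phoneme codes are invented, and the real model also includes verification, syllabification and self-monitoring; this illustrates only the discrete, forward-only control flow.

```python
# Toy sketch of WEAVER++-style feedforward word production.
# All representations below are invented placeholders.

LEMMA_FEATURES = {                       # lemma: semantic features (invented)
    "cat": {"animal", "feline"},
    "dog": {"animal", "canine"},
}
WORD_FORMS = {"cat": ["k", "ae", "t"], "dog": ["d", "o", "g"]}

def produce(concept_features):
    # Stage 1: competitive lexical selection over lemmas (meaning + syntax,
    # no phonology); activation here is just semantic feature overlap.
    activation = {lemma: len(feats & concept_features)
                  for lemma, feats in LEMMA_FEATURES.items()}
    lemma = max(activation, key=activation.get)
    # Stage 2, strictly after stage 1: retrieve the word form (morphemes /
    # phonemic segments). Nothing feeds back from sound to lemma selection.
    return lemma, WORD_FORMS[lemma]

print(produce({"animal", "feline"}))     # ('cat', ['k', 'ae', 't'])
```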

It is easy to get lost in the model's complexities (agreed?). In essence, however, the model shows how word production proceeds from meaning (lexical concepts and lemmas) to sound (e.g., phonological words).

There is a stage of lexical selection at which a lemma (representing word meaning + syntax) is selected. A given lemma is generally selected because it is more activated than other lemmas. Then there is morphological encoding, during which the basic word form of the selected lemma is activated. This is followed by phonological encoding, in which the word's syllables are computed. What happens overall is known as lexicalisation, "the process in speech production whereby we turn the thoughts underlying words into sounds" (Harley, 2013).

The most important development of Levelt et al.'s (1999) approach involved identifying the brain regions associated with the various processes within the model (Indefrey & Levelt, 2004). This development means neuroimaging techniques can be used to test the model's predictions.

In sum, WEAVER++ is a discrete, feedforward model. Processing is discrete (separate) because the speech-production system identifies the correct lemma or abstract word before starting to work out the sound of the selected word. It is feedforward because processing proceeds in a strictly forward (from meaning to sound) direction.

KEY TERMS

Lexicalisation: The process of translating a word's meaning into its sound representation during speech production.

Tip-of-the-tongue state: The frustrating experience of being unable to find the correct word to describe a given concept or idea.

Findings

According to the model, speakers process syntactic (e.g., a noun's gender) and phonological information sequentially. Bürki et al. (2016) tested this assumption using event-related potentials (ERPs; see Glossary). As predicted, the evidence from ERPs suggested syntactic and phonological processing occurred at different times. Van Turennout et al. (1998) used ERPs with Dutch participants and found syntactic information about a noun's gender was available 40 ms before its initial phoneme. This is consistent with Levelt's theoretical approach.

Indefrey (2011) carried out a meta-analysis of studies using ERPs and other techniques to assess brain activation during speech production. The right column of Figure 11.3 provides approximate timings for major speech-production processes. Conceptual preparation takes about 200 ms. After that, lemma retrieval takes 75 ms, and phonological code retrieval takes 20 ms per phoneme plus 50–55 ms per syllable. Finally, phonetic encoding with articulation typically starts about 600 ms after the initiation of processing. The left-hand side of Figure 11.3 is colour coded to indicate the various brain regions in the left hemisphere associated with each process.

There are limitations with Indefrey's (2011) meta-analysis. First, most research involved picture naming, meaning the focus of the meta-analysis is narrow. Second, the timings are with respect to stimulus presentation. Since speech production is more concerned with action than perception, it would be preferable to focus on how long a given process occurs prior to speech onset. Third, the neat-and-tidy impression created by Figure 11.3 is misleading (see below, p. 532).
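As a back-of-envelope illustration of Indefrey's (2011) stage estimates, the sketch below treats the stage durations as strictly additive (a simplifying assumption; the meta-analysis itself reports overlap) and takes 52.5 ms as the midpoint of the 50–55 ms per-syllable range:

```python
def estimated_onset_ms(n_phonemes, n_syllables, conceptual=200.0,
                       lemma=75.0, per_phoneme=20.0, per_syllable=52.5):
    """Crude additive estimate of when phonetic encoding could begin."""
    return (conceptual + lemma
            + n_phonemes * per_phoneme + n_syllables * per_syllable)

# A five-phoneme, two-syllable word:
print(estimated_onset_ms(5, 2))  # 480.0 ms, of the same order as the
                                 # ~600 ms typical articulation onset
```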


Figure 11.3 The right side of the figure indicates the sequence of processes (and their timings) for picture naming. Identical colours on the left side of the figure indicate the brain regions associated with each process (the numbers within regions are the median peak activation times in msec after picture onset). From Indefrey (2011).

Recent cognitive neuroscience research relevant to Dell's and Levelt's theoretical approaches is discussed in the next section.

We can see the distinction between a lemma and the word itself in the tip-of-the-tongue state. The tip-of-the-tongue state occurs when we have a concept or idea in mind (i.e., we have activated the correct lemma or abstract word) but cannot find the appropriate word (i.e., phonological processing is unsuccessful). Harley and Bown (1998) found the tip-of-the-tongue state was especially frequent for words sounding unlike nearly all other words (e.g., apron; vineyard): their unusual phonological forms make them hard to retrieve.

Levelt et al. (1999) claimed the lemma includes syntactic as well as semantic information. Accordingly, individuals in the tip-of-the-tongue state should have access to syntactic information. In many languages (e.g., Italian; German), grammatical gender (e.g., masculine; feminine) is part of the syntactic information relating to nouns. As predicted, Italian participants in the tip-of-the-tongue state for nouns guessed the grammatical gender correctly 85% of the time (Vigliocco et al., 1997).

Biedermann et al. (2008) reported less supportive findings. German speakers guessed the grammatical gender and initial phoneme of nouns when in a tip-of-the-tongue state. Theoretically, access to grammatical gender precedes access to phonological information, and so participants should have guessed the first phoneme more often when they had access to accurate gender information. However, that was not the case.

Resnik et al. (2014) used magneto-encephalography (MEG; see Glossary) while individuals were in the tip-of-the-tongue state when trying to identify pictures of celebrities.

They predicted brain areas associated with semantic and syntactic processing (i.e., lemma selection) should be activated shortly after picture presentation. In contrast, brain areas associated with motor programming and articulation should be activated shortly before the correct word was produced. The predicted brain areas were activated. However, semantic and motor areas were activated during both time periods, suggesting processing was more interactive than assumed within the model.

According to WEAVER++, abstract word or lemma selection is completed before phonological information about the word is accessed. In contrast, Dell's spreading-activation theory assumes phonological processing can start before lemma or word selection is completed. Most evidence is inconsistent with WEAVER++'s prediction. Meyer and Damian (2007) required participants to name target pictures while ignoring distractor pictures. According to WEAVER++, the phonological features of distractor names should not have been activated. However, target pictures were named more slowly when accompanied by phonologically related distractors (e.g., wall when ball was the target). In similar fashion, Roux and Bonin (2016) found naming times for a coloured target object (e.g., banana) slowed when a distractor was phonologically related to the target's colour (e.g., yellow).

Mädebach et al. (2011) found picture naming was slowed in the presence of a phonologically similar distractor word only when the overall processing demands were relatively low. What do these findings mean? Serial processing (as predicted by Levelt) occurred when processing demands were high. In contrast, processing was more interactive (as predicted by Dell) when processing demands were low.

Evaluation

WEAVER++ has various successes to its credit. First, the assumption that word production involves a series of stages moving from lexical selection to morphological encoding to phonological encoding is reasonably accurate (Indefrey, 2011; however, see next section, pp. 534–535). Second, Levelt's theoretical approach was important in shifting the balance of research away from speech errors (which are relatively rare) towards precise timing of accurate word-production processes. Third, WEAVER++ is a simple and elegant model making many testable predictions. It is arguably easier to test WEAVER++ than more interactive theories such as Dell's spreading-activation theory. Fourth, lexical or word selection often involves a competitive process, as assumed theoretically.

What are the limitations with WEAVER++?

(1) It focuses narrowly on processes involved in the production of single words and so several processes involved in planning and producing sentences are de-emphasised. (2) There are many more interactions between different processing levels than assumed within WEAVER++ (see next section, pp. 534–535). As Melinger et al. (2014, p. 676) argued, “It is a plausible hypothesis that the language production system is fundamentally parallel in its operation.”


(3) The evidence from speech errors (e.g., the mixed-error effect, the lexical bias effect, word-exchange errors, and sound-exchange errors) indicates more parallel processing than predicted by WEAVER++. (4) As Harley (2013) pointed out, “The need for lemmas is [not] strongly motivated by the data. Most of the evidence really only demands a distinction between the semantic and the phonological levels.”

Cognitive neuroscience

Earlier we discussed Indefrey's (2011) meta-analysis of ERP and other studies that provided strong support for WEAVER++. Munding et al. (2016) reported a similar meta-analysis based on simple speech-production tasks (e.g., picture naming; reading aloud). However, they included only studies using magneto-encephalography, whereas Indefrey (2011) focused mostly on ERP studies. This is important because MEG has superior spatial resolution to ERPs.

Munding et al.'s (2016) findings are shown in Figure 11.4. First, Levelt et al.'s (1999) assumption that cognitive functions near the top of the figure occur earlier than those further down was supported. Second, as Munding et al. concluded, "There is a great deal of overlap in diverse functions' active periods, and substantial numbers of reports of theoretically 'late' functions being implicated early" (p. 452). For example, several studies found evidence of articulatory processing approximately 300–500 ms before the appropriate response is produced.

Dubarry et al. (2017) assessed brain activity during a picture naming task. They replicated Munding et al.'s (2016) findings when using the typical procedure of averaging data across trials. However, they argued the averaging approach can provide "a blurred view of the underlying processes" (p. 415). When they focused at the level of single trials, there was much less evidence of parallel processing. Thus, findings based on averaged data may exaggerate the extent of parallel processing in speech production.

Figure 11.4 The timing of activation associated with different cognitive functions (colour indicates the number of studies reporting activity related to a given function). The functions shown (top to bottom) are: visual processing; conceptual preparation; lemma selection; phonological code retrieval; syllabification; articulation; and self-monitoring, plotted against time (ms) from 0 to 600. From Munding et al. (2016).
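Dubarry et al.'s (2017) methodological point can be demonstrated with a toy simulation (all timings below are invented): two processing stages that never overlap on any single trial can appear to overlap once trials with variable onsets are averaged.

```python
import numpy as np

# Toy demonstration that trial-averaging can blur strictly serial stages
# into apparent parallelism. All timings are assumed values.
rng = np.random.default_rng(0)
t = np.arange(0, 600)                        # time in ms
trials = []
for _ in range(50):
    onset = rng.integers(100, 300)           # stage-2 onset jitters by trial
    stage1 = (t >= onset - 100) & (t < onset)         # stage 1: 100 ms
    stage2 = (t >= onset) & (t < onset + 100)         # stage 2: strictly after
    trials.append(np.stack([stage1, stage2]).astype(float))

avg = np.mean(trials, axis=0)                # shape (2, 600)
overlap = np.sum((avg[0] > 0) & (avg[1] > 0))
print(f"Time points where both stages look active after averaging: {overlap}")
# On every single trial the stages never overlap; in the average they do.
```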

In sum, evidence obtained using the techniques of cognitive neuroscience provides a complex picture. What conclusions can we draw? First, as Riès (2016, p. 476) noted, "Areas associated with lemma selection are generally active before those involved in phonological code retrieval, which are themselves generally active before those associated with articulation." Such findings support Levelt et al.'s (1999) theoretical approach.

Second, brain areas associated with supposedly "late" processes (e.g., phonological processes) are often active much earlier than predicted by Levelt et al. Perhaps top-down processes anticipate or predict the processes necessary later in speech production.

Speech production: general cognitive processes

Fifty years ago, it was often assumed humans possessed "language genes" and that language involves processes very different from those used in other cognitive tasks. Those assumptions have been increasingly rejected. It is now recognised that general processes (e.g., short-term memory; attention; cognitive control) play an important role in language processing (including speech production). Here we will briefly consider some relevant research.

As mentioned earlier, Christiansen and Chater (2016) argued the very limited capacity of short-term memory means speakers (and listeners to speech) have to deal with a "now-or-never" bottleneck: "If linguistic information is not processed rapidly, that information is lost for good" (p. 3). How do speakers cope with this bottleneck? According to Christiansen and Chater (2016), children learn to form chunks (see Glossary) in which frequently encountered words are grouped together in long-term memory. As a result, an ever-increasing number of chunks is stored in a "chunkatory". This reduces processing demands by allowing speakers to focus on retrieving entire phrases (rather than single words) from long-term memory (a toy sketch of chunk retrieval appears at the end of this subsection).

MacDonald (2016) argued that there are important similarities between utterance planning (transient memory of what is to be spoken) and verbal memory (e.g., immediate serial recall of a word list). Here are some examples: (1) words close to each other are most likely to be exchanged; (2) similar words interfere with each other; and (3) there are more errors in long lists/sentences than short ones. These findings suggest the processes involved in speech planning closely resemble those involved in verbal short-term memory.

Proficient bilinguals rarely allow words from their unintended language to intrude into their speech. Top-down inhibitory processes play an important role in this achievement (Kroll & Navarro-Torres, 2018). For example, consider a Friulian–Italian bilingual patient (LI) with damage to the frontal cortex and anterior cingulate (brain areas associated with top-down control). He switched into Italian 40% of the time when he was meant to be speaking Friulian and into Friulian 43% of the time when Italian was required (Fabbro et al., 2000).

In the picture naming task, participants show interference when a word distractor is presented together with each picture. Inhibitory processes are required to minimise such interference. Patients with damage to the lateral prefrontal cortex (involved in inhibition) have larger interference effects than healthy controls (Piai et al., 2016).

Jongman et al. (2017) studied the relationship between sustained attention and picture naming. Individuals with superior levels of sustained attention had better picture naming performance than those with low levels.


KEY TERMS

Aphasia: Severe problems in the comprehension and/or production of language caused by brain damage.

Wernicke's aphasia: A form of aphasia involving fluent speech with many content words missing and impaired comprehension.

Broca's aphasia: A form of aphasia involving non-fluent speech and grammatical errors.

McClain and Goldrick (2018) discussed other research indicating the importance of attentional processes in speech production.

In sum, general processes (e.g., attention; short-term memory; inhibition), de-emphasised in spreading-activation theory and WEAVER++, strongly influence speech production. For example, speakers' use of inhibitory and other general processes may explain why the interactive processes emphasised within spreading-activation theory do not lead to numerous speech errors. In future, it will be important to develop theories indicating how general and language-specific processes combine in speech production.
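As an illustration of Christiansen and Chater's chunking idea mentioned above, the following toy "chunkatory" retrieves stored multi-word chunks by greedy longest match. The chunk inventory is an invented example; real chunk learning is gradual and statistical.

```python
# Toy "chunkatory": retrieve the longest stored multi-word chunk at each
# position, falling back to single words. The inventory is invented.
CHUNKATORY = {("you", "know"),
              ("as", "a", "matter", "of", "fact"),
              ("at", "the", "end", "of", "the", "day")}
MAX_LEN = max(len(c) for c in CHUNKATORY)

def chunk(words):
    out, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            cand = tuple(words[i:i + n])
            if n == 1 or cand in CHUNKATORY:  # single words always succeed
                out.append(cand)
                i += n
                break
    return out

print(chunk("you know at the end of the day it worked".split()))
# [('you', 'know'), ('at', 'the', 'end', 'of', 'the', 'day'),
#  ('it',), ('worked',)]
```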

COGNITIVE NEUROPSYCHOLOGY: SPEECH PRODUCTION

The cognitive neuropsychological approach to language started in the nineteenth century. Its focus was on brain-damaged patients with aphasia, a condition involving severe impairments in language comprehension and/or production. Early researchers distinguished between Broca's and Wernicke's aphasia.

Patients with Broca's aphasia have slow, non-fluent speech. They also have a poor ability to produce syntactically correct sentences, but their sentence comprehension is relatively intact. Broca's aphasia is generally assumed to involve BA44 and BA45 in the inferior frontal gyrus (see Figure 11.5). In contrast, patients with Wernicke's aphasia have fluent and apparently grammatical speech that often lacks meaning. They also have severe problems with speech comprehension. Wernicke's aphasia is generally assumed to involve the posterior part of BA22 in the superior temporal gyrus (see Figure 11.5).

In sum, the traditional approach assumed impaired language production was of central importance in Broca's aphasia whereas impaired language comprehension was central to Wernicke's aphasia. Dronkers et al. (2017) reviewed the relevant evidence and reported very limited support for the traditional approach.

Figure 11.5 Language-related regions and their connections in the left hemisphere. PMC, premotor cortex; STC, superior temporal cortex; p, posterior. Berwick et al. (2013). Reprinted with permission from Elsevier.

That approach was much oversimplified in several ways. First, there is no consensus concerning the scope of Broca's area or Wernicke's area! Tremblay and Dick (2016) found the most popular definition of Wernicke's area among experts was "the posterior part of the superior temporal gyrus and including part of the supramarginal gyrus" (p. 63), endorsed by 26% of respondents. The other respondents endorsed smaller or larger areas. The most popular definition of Broca's area (endorsed by 50%) was that it consisted of the pars triangularis and pars opercularis, with the remaining 50% identifying a smaller or larger area.

Second, most aphasic patients have extensive brain damage (Flinker & Knight, 2018). As a result, the role played by Wernicke's area or Broca's area (however defined) is hard to establish. However, Flinker et al. (2015) found, by using precisely targeted electrical stimulation of the cortex, that Broca's area supports articulatory planning but is not directly involved in production of spoken words.

Third, "The areas of the brain that support language are far more extensive than Broca or Wernicke could ever have imagined" (Dronkers et al., 2017, p. 750). As we saw earlier, speech production involves general processes (e.g., attention; cognitive control) as well as language-specific processes (McClain & Goldrick, 2018). Language comprehension also involves similar general processes within a "multiple demand" network including several prefrontal areas (Blank & Fedorenko, 2017).

Fourth, the finding that Broca's patients find it much harder than Wernicke's patients to speak grammatically is more common in English-speaking than German- or Italian-speaking patients. English is less inflected than German or Italian (with inflected languages, grammatical changes to nouns and verbs are indicated by changes to the words themselves). As a result, the grammatical limitations of English-speaking patients with Wernicke's aphasia are less obvious (Dick et al., 2001).

Fifth, aphasic patients may have general problems relating to attention and memory in addition to specific language impairments (McNeil et al., 2010). For example, healthy individuals naming pictures rapidly make errors resembling those of stroke aphasics with semantic impairments (Hodgson & Lambon Ralph, 2008). Thus, picture naming errors by aphasics may partly reflect reduced processing resources or semantic control.

In sum, the traditional approach is very limited. Accordingly, the emphasis has shifted towards systematic attempts to understand relatively specific cognitive impairments (see below).

KEY TERM

Anomia: A condition caused by brain damage in which there is an impaired ability to name objects.

Anomia

Nearly all aphasics (regardless of the type of aphasia) suffer from anomia (an impaired ability to name objects). Within Levelt et al.'s (1999) WEAVER++ model, there are two reasons aphasics might have problems with lexicalisation (translating a word's meaning into its sound):

(1) There could be problems at the semantic level (i.e., selecting the appropriate lemma or abstract word). In that case, naming errors would resemble the correct word in meaning.
(2) There could be problems at the phonological level, in which case patients would be unable to find the appropriate form of the word.


KEY TERM

Phonological output lexicon: Contains information about the spoken form of words (e.g., number of syllables) and is used in object naming and reading aloud.

Findings

Laganaro et al. (2009) divided aphasic patients into semantic and phonological groups based on their main cognitive impairment on various tasks. Both groups were then given a picture naming task and event-related potentials were recorded to assess the time course of processing. The semantic group had ERP abnormalities early (100–250 ms after picture onset). In contrast, the phonological group only had later abnormalities (300–450 ms after picture onset). Thus, anomia can result from an early semantic stage (finding the correct word) or a later phonological stage (generating the word's phonological form).

Patients with anomia having problems at the phonological (but not the semantic) level often resemble healthy individuals in a tip-of-the-tongue state (discussed earlier, see pp. 531–532). As a result, we might expect such patients to experience the tip-of-the-tongue state very frequently. Patients fitting this pattern (and having particular problems in producing low-frequency names) have been identified (Funnell et al., 1996).

Gvion and Friedmann (2016) clarified the mechanisms involved in patients with anomia who have phonological problems. They focused on the phonological output lexicon, which contains the representation of a word's spoken form (e.g., number of syllables; consonants; vowels). It is organised by word frequency, so high-frequency words are more accessible than low-frequency ones. Gvion and Friedmann found several patients with anomia had an impaired phonological output lexicon. However, these patients performed well on a word-comprehension task, suggesting their semantic processing was reasonably intact.

Nardo et al. (2018) studied 18 aphasic patients whose comprehension of spoken words was good. They used a treatment programme focusing on patients' phonological problems: on a picture naming task, phonemic cues (e.g., the initial phoneme of the word) were presented. This treatment produced short-term and long-term improvements in naming ability.

Discovering that anomia can result from difficulties at the semantic and/or phonological levels is consistent with serial models (e.g., Levelt et al.'s WEAVER++). However, it does not rule out interactive models (e.g., Dell's spreading-activation theory). Soni et al. (2009) compared these models using a picture naming task in three conditions differing in the cues accompanying each picture: (1) correct cues (e.g., lion + the cue l); (2) incorrect cues (e.g., lion + the cue t, which misleadingly suggests tiger); (3) no cue.

According to Levelt's model, speakers determine the lemma (abstract word) appropriate to the object before using phonological information generated by the cues. Thus, an incorrect phonological cue should not impair performance. Interactive models make the opposite prediction: word selection can be influenced by phonological activation, and this can enhance (or impair) performance depending on whether the cue is correct or incorrect. The findings supported interactive models over serial ones.

Soni et al. (2011) extended their research. Aphasics viewed pictures accompanied by a sound and named the picture. There were four conditions determined by the relationship between picture and sound. Suppose the picture showed a candle. The sound could be l (related category, suggesting lamp), w (associate word, suggesting wax), or neutral (g).

Naming performance was worse in these conditions than when the sound was k (suggesting the correct answer). These findings suggest (contrary to Levelt et al., 1999) that semantic processing is not necessarily complete before phonological processing starts.
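The contrasting predictions tested by Soni et al. (2009) can be captured in a toy model (the activation values and feedback weight below are invented): on a serial account the phoneme cue cannot influence lemma selection, whereas on an interactive account a misleading cue can tip selection towards the wrong word.

```python
# Toy contrast between serial and interactive accounts of phonemic cueing.
# Picture of a LION plus a phoneme cue; all numbers are invented.
SEMANTIC = {"lion": 1.0, "tiger": 0.8}   # both cats activated by the picture
ONSETS = {"lion": "l", "tiger": "t"}

def select_word(cue, interactive, feedback=0.5):
    act = dict(SEMANTIC)
    if interactive and cue:               # phonology feeds back to lemmas
        for word, onset in ONSETS.items():
            if onset == cue:
                act[word] += feedback
    return max(act, key=act.get)

print(select_word("t", interactive=False))  # 'lion': serial, cue is inert
print(select_word("t", interactive=True))   # 'tiger': misleading cue wins
print(select_word("l", interactive=True))   # 'lion': correct cue helps
```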

Evaluation

Much research on anomia is consistent with Levelt et al.'s (1999) notion that problems with word retrieval can occur at two different stages: (1) abstract word (lemma) selection; and (2) accessing the phonological form of the word. However, there is a potential problem with that explanation: there is suggestive evidence (Soni et al., 2009, 2011) for more interaction between semantic and phonological processing than assumed by Levelt et al. (1999).

KEY TERM

Agrammatism: Literally, "without grammar"; a condition in which speech production lacks grammatical structure and many function words and word endings are omitted; there are often also problems with language comprehension.

Agrammatism

It is often assumed (e.g., Dell, 1986) that speaking involves separate stages of working out the syntax or grammatical structure of utterances and then producing content words to fit that grammatical structure. Patients who apparently can find the appropriate words but cannot order them grammatically suffer from agrammatism (literally "without grammar"). Such patients produce short sentences containing content words (e.g., nouns; verbs) but lacking function words (e.g., the, in, and) and inflections (see Glossary). These omissions are important. For example, function words play a key role in producing a grammatical structure for sentences. As we will see, agrammatic patients' problems with syntactic processing often extend to language comprehension as well as speech.

Use of the term "agrammatism" implies it forms a syndrome, with all agrammatic patients having the same symptoms and with the same brain areas involved in most (or all) cases. It is the case that most agrammatic patients have damage to Broca's area (BA44/45; see Figure 11.5) (Cappa, 2012). However, agrammatism is not a syndrome: many patients possess only some symptoms typical of agrammatism (Harley, 2013).

Findings

Engel et al. (2018) studied language comprehension in patients with agrammatic aphasia presented with sentences such as:

(1) The grandma said that the baker cleaned herself with a clean washcloth.
(2) The grandma said that the baker cleaned her with a clean washcloth.

The above sentences differ only in that the pronoun in (1) is reflexive (i.e., herself) whereas the pronoun in (2) (i.e., her) is not. Healthy controls had accurate comprehension of both types of sentences on over 90% of trials. In contrast, agrammatic patients performed much worse on sentences such as (2) than on sentences such as (1) (63% vs 90%, respectively).

What do the above findings mean? Sentences such as (2) are slightly more complex grammatically.


Agrammatic patients were considerably more affected by this increased complexity than healthy controls, which may reflect reduced computational resources in agrammatic individuals.

Faroqui-Shah and Friedman (2015) considered verb tense impairment. It involves a failure to change the forms of verbs to reflect whether the reference is to the past, the present, or the future (e.g., omitting -ed when using a verb in the past tense). Agrammatic individuals had greater verb tense impairment when the task was more demanding (e.g., picture description) than when it was less demanding (e.g., grammaticality judgement). Faroqui-Shah and Friedman argued that these findings suggest agrammatic individuals have reduced computational resources.

Beeke et al. (2007) studied an agrammatic patient in the laboratory (using artificial tasks) and while conversing at home. His speech was more grammatical in the latter situation. Rhys et al. (2013) studied an agrammatic patient who used prosodic cues (e.g., stress; intonation) to communicate meanings and grammatical structures despite very limited speech.

Christiansen et al. (2010) also considered whether the deficits of agrammatic patients are specific to language or whether they are more general. They found agrammatic patients with damage to Broca's area (BA44/45) showed impaired sequence learning as well as impaired grammaticality, indicating their deficits extend beyond language. How can we explain these findings? It seems reasonable that a deficit in sequence learning could lead to the production of ungrammatical sentences given the great dependence of grammaticality on appropriate word order. Uddén et al. (2017) produced further evidence for the role of Broca's area in sequence learning: transcranial magnetic stimulation (TMS; see Glossary) applied to that area indicated it plays a causal role in sequence learning.

Griffiths et al. (2013) shed light on the brain pathways involved in syntactic processing. They studied patients with damage to a dorsal pathway connecting part of Broca's area (BA44) with part of Wernicke's area (middle temporal gyrus) and/or a ventral pathway connecting another part of Broca's area (BA45) with Wernicke's area. Participants listened to sentences (e.g., "The woman is hugged by the man") and then selected a drawing:

Figure 11.6 Semantic errors (left) and syntactic errors (right) made by: healthy controls; patients with no damage to the dorsal (D) or ventral (V) pathway; patients with damage to the ventral pathway only; patients with damage to the dorsal pathway only; and patients with damage to both pathways. From Griffiths et al. (2013). By permission of Oxford University Press.

(1) the correct one; (2) a syntactic distractor (e.g., a woman hugging a man); or (3) a semantic distractor (e.g., a man painting a woman).

What did Griffiths et al. (2013) find? Patients with damage to either or both pathways made many more syntactic errors than controls or patients with damage to neither pathway (see Figure 11.6). However, very few semantic errors were made by any of the patient groups, and the same was true on other semantic comprehension tasks.

Evaluation

Research on agrammatism can be related to Dell's (1986) identification of four levels (semantic; syntactic; morphological; and phonological) of speech production. More specifically, agrammatics primarily have problems at the syntactic level at which the grammatical structure of a sentence is formed (Dell, 1986). Of theoretical importance, there is accumulating evidence that agrammatic individuals have general deficits as well as language-specific ones. For example, they have impaired sequence learning and reduced processing or computational resources.

What are the limitations of research on agrammatism? First, the symptoms of agrammatic patients are too diverse for it to form a syndrome. Second, agrammatic patients may possess more grammatical competence than generally assumed. Third, more research is required to establish the extent to which agrammatism involves language-specific deficits versus more general ones.

KEY TERMS

Jargon aphasia: A brain-damaged condition in which speech is reasonably correct grammatically but there are severe problems in accessing the appropriate words.

Neologisms: Made-up words produced by patients suffering from jargon aphasia.

Jargon aphasia

Jargon aphasia "is an extreme form of fluent aphasia in which syntax is primarily intact, but speech is marked by gross word-finding difficulties" (Harley, 2013). This pattern is superficially the opposite of that of patients with agrammatism, who can find the correct content words but cannot produce grammatically correct sentences. Jargon aphasics often produce jargon (including neologisms, which are made-up words, and real words unrelated phonologically to the target word). Finally, jargon aphasics have deficient self-monitoring: they are generally unaware their speech contains numerous errors and become irritated when others fail to understand them.

Here is how a jargon aphasic described a picture (Sampson & Faroqui-Shah, 2011):

It's not a large house, it's small, unless an awful lot of it goes back behind the hose. They are whiking what they are doing in the front part which must be peeving . . . leeling . . . weeding . . . there is a nicoverit spotole for the changer.

Findings

How grammatical is the speech of jargon aphasics? If they engage in syntactic processing, their neologisms or made-up words might possess appropriate prefixes or suffixes to fit the structure of the sentences in which they appear.


Jargon aphasics generally modify their neologisms in this way (Butterworth, 1985).

What factors determine the specific form of jargon aphasics' neologisms? First, and of most importance, jargon aphasics exhibit a failure of phoneme selection even when the appropriate target word is initially selected. For example, Olson et al. (2015) studied three jargon aphasic patients who performed naming, repetition, and reading tasks. The phonemes from target words were often only weakly activated and so were outcompeted by non-target phonemes. These non-target phonemes were often phonologically related to target phonemes or were phonemes used recently. Pilkington et al. (2017) analysed the neologisms produced by 25 patients with jargon aphasia. Those produced by 23 of these patients were related phonologically to the target (intended) word. These findings also indicate a major role for deficient phonological encoding in the production of neologisms.

If impaired phoneme selection is responsible for patients' impaired speech production, therapy designed to enhance phonemic processing might prove beneficial. Bose (2013) found that therapy focused on generating and analysing phonological features of words reduced the number of neologisms produced by FF, a man with jargon aphasia.

Second, deficient self-monitoring plays a major role in jargon aphasia. Sampson and Faroqui-Shah (2008) obtained a negative correlation between self-monitoring and the production of jargon in jargon aphasics. Eaton et al. (2011) studied a male jargon aphasic (TK) over a 21-month period. His improved word naming performance over time correlated highly with increased self-monitoring, suggesting inadequate self-monitoring played a role in TK's initially poor performance.

Third, the production of jargon by jargon aphasics is sometimes influenced by impaired general cognitive abilities. For example, Robinson et al. (2015) studied a male jargon aphasic (JA) who produced almost meaningless sentences. He had impaired cognitive control and so had no "brake" inhibiting production of meaningless phrases.

Evaluation

Several factors responsible for jargon aphasics' production of neologisms and other jargon have been identified. Impaired phoneme selection is typically of most importance. Some evidence indicates that jargon aphasics often have impaired cognitive or inhibitory control (linked to deficient self-monitoring) as well as more language-specific deficits.

A limitation of much research is that insufficient attention has been paid to possible semantic deficits in addition to problems with phoneme selection (Harley, 2013). More research is also needed to assess more precisely the grammaticality (or otherwise) of jargon aphasics' spoken utterances.

Conclusions

Most theories of speech production are based on the assumption that "there are independent levels of representation/processing that encode word meaning (semantics), word form (phonology), and grammatical structure (syntax)" (McClain & Goldrick, 2018, p. 398).

The evidence based on brain-damaged patients provides support for this overarching assumption.

In this section, we have focused on the patterns of language impairment in patients categorised as suffering from anomia, agrammatism or jargon aphasia. Such categorisations mistakenly imply all patients assigned to a given category have very similar language impairments. Mirman et al. (2015) avoided the use of categories. They used several tasks to assess various aspects of word recognition and production in 99 aphasic patients and also assessed the brain areas damaged in those patients.

Mirman et al. (2015) used a statistical technique known as factor analysis to identify the major components underlying patients' patterns of impaired language performance (a schematic sketch of this approach appears at the end of this section). There was a major division between semantic and phonological processing, and each form of processing was subdivided into language perception and language production. In future, this approach could potentially allow us to move away from an overreliance on categories or syndromes towards a focus on similarities and differences among aphasics.

Another general conclusion is that the speech-production problems of many aphasics (e.g., agrammatic individuals; jargon aphasics) depend in part on general cognitive processes as well as on language-specific ones. Evidence that general cognitive processes are important in the speech production of healthy individuals was reported by Zhang et al. (2018) using a picture naming task. Their key finding was that successfully coping with increased task difficulty involved additional activation in language-specific brain areas and general cognitive control (e.g., inhibition) areas. The role of general processes in language processing is discussed further in the introductory section entitled Part III: Language.
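Schematically, the Mirman et al. (2015) strategy referred to above looks like the sketch below. The data are random placeholders rather than their dataset; only the analysis pattern (latent factors instead of syndrome labels) is the point.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Placeholder data: 99 patients x 8 hypothetical language-task scores.
rng = np.random.default_rng(42)
scores = rng.normal(size=(99, 8))

# Extract latent components (e.g., semantic vs phonological, each split
# into perception vs production) rather than assigning syndrome labels.
fa = FactorAnalysis(n_components=4).fit(scores)
print(fa.components_.shape)        # (4, 8): each factor's loadings on tasks
print(fa.transform(scores).shape)  # (99, 4): each patient's factor profile
```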

SPEECH AS COMMUNICATION

Most theories and research discussed so far focus on monologue. In the real world, however, our speech nearly always occurs as conversation in a social context. Dialogue involves various complexities: "Both interlocutors [individuals involved in a conversation] must simultaneously produce their own contributions and comprehend the other's contribution" (Pickering & Garrod, 2013, p. 330). Thus, as discussed earlier, speech production and speech comprehension are interwoven (Meyer et al., 2016).

Grice (e.g., 1975) considered the requirements of successful communication in his cooperative principle: "Make your conversational contribution such as is required . . . by the accepted purpose or direction of the talk exchange in which you are engaged" (Grice, 1989, p. 88). Grice's use of the term "cooperation" was narrower than its common usage (Davies, 2007). It should be seen in the context of his four maxims, which speakers should heed:

● Maxim of relevance: the speaker should say things relevant to the situation.
● Maxim of quantity: the speaker should be as informative as necessary.
● Maxim of quality: the speaker should be truthful.
● Maxim of manner: the speaker should make their contribution easy to understand.

Figure 11.7 A sample array with six different garments coloured blue or green. From Tarenskeen et al. (2015).

There are two issues with Grice's approach. First, it is unclear that we need four maxims: the other three maxims are implied by the maxim of relevance. Second, in the real world, many individuals (e.g., secondhand car salespersons; politicians) ignore one or more of the maxims out of self-interest. For example, an analysis of Donald Trump's statements during 2015 revealed that he lied (failed to adhere to the maxim of quality) 76% of the time (Holan, 2015).

Speakers (even when not guided by self-interest) often fail to adhere to Grice's four maxims. Suppose a speaker tries to provide enough information so a listener can identify the target item of clothing from an array (see Figure 11.7). Speakers included colour in their statements on 79% of trials even though it was unnecessary and so represented overspecification, not adhering to the maxim of quantity (Tarenskeen et al., 2015). Tarenskeen et al. (2015) carried out another experiment that included many trials where it was necessary for speakers to include colour, pattern or size in their statements. As a result, each attribute was included in their statements on 70% or more of trials even when it was not necessary. This widespread overspecification was probably due to the speakers' desire to be consistent in their statements.

In sum, speakers often fail to adhere to Grice's maxims. However, overspecification (unlike underspecification) typically has no negative effects on listeners' comprehension and so is essentially harmless.

KEY TERM

Audience design: This involves speakers tailoring what they say to the specific needs and knowledge of their audience.

Audience design

There has been a dramatic increase in research focusing on what speakers say and how they say it when addressing one or more listeners. Much of this research focuses on audience design, which "refers to the situation in which speakers fashion their utterances so as to cater to the needs of their addressees" (Ferreira, 2019). For example, communication can be facilitated by establishing and extending common ground (see Chapter 9).

Common ground consists of "knowledge that is shared with a communication partner, and that the communication partners know each other know" (Brown-Schmidt & Duff, 2016, pp. 722–723). Common ground can include several kinds of information (e.g., objects or events visible to both partners; shared cultural values; shared experiences).

Ferreira (2019) identified two broad forms of audience design. First, there is a simple form based on general characteristics of the listener. For example, a speaker will typically plan shorter and simpler sentences when talking to a child rather than an adult: this is child-directed speech (see Glossary). This form of audience design typically makes modest demands on processing effort. Second, there is a more complex form based on idiosyncratic characteristics of the listener and/or the context. This can involve considerable processing effort (e.g., taking full account of common ground) and alterations to planned utterances (e.g., when you are talking to someone and they start looking at their mobile phone).

Theories

Horton and Keysar (1996) proposed their monitoring and adjustment model to account for speakers' successful (and unsuccessful) use of common ground. Speakers initially plan their utterances using information available to them without considering the listener's perspective or knowledge. These plans are then monitored and corrected to incorporate common ground. However, it is often computationally hard for speakers to focus on the listener's perspective while planning what to say. Accordingly, they often egocentrically focus on their own perspective and ignore the common ground.

Ferreira (2019) developed the above ideas in his forward modelling approach (see Figure 11.8; a schematic sketch follows at the end of this subsection). Speakers use their communicative intention (i.e., what they want to say) to generate utterances (left-hand side of the figure). They also often produce a forward model (including a model of audience comprehension) to predict the likely effect on the audience of generating those utterances. Of crucial importance, if the predicted communicative effect mismatches the speaker's intent, the message is changed to reduce the mismatch. This entire process is typically cognitively demanding, and so speakers sometimes lack sufficient resources to produce a forward model plus a model of audience comprehension.

Memory plays an important role in audience design. For example, Horton and Gerrig (2016) argued that memory limitations mean speakers sometimes assume less common ground than is actually the case. Suppose you have told very few friends about a recent event. As a result, you mistakenly assume the friend you are talking to does not share that common ground. The opposite can also happen: you have told nearly all your friends about an event and mistakenly assume the one with whom you are currently talking was among them.
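The control flow shared by the monitoring and adjustment model and the forward modelling approach can be sketched as follows. Every representation and function here is a hypothetical placeholder chosen only to make the generate-predict-revise loop concrete; it is not Ferreira's actual model.

```python
def generate(intention):
    # Egocentric first pass: plan from the speaker's own knowledge only.
    return {"words": intention["content"], "uses_pronoun": True}

def simulate_comprehension(utterance, listener):
    # Forward model: predict what this listener would recover.
    if utterance["uses_pronoun"] and not listener["heard_antecedent"]:
        return None  # pronoun with no resolvable referent for this listener
    return utterance["words"]

def plan_utterance(intention, listener, max_revisions=3):
    utterance = generate(intention)
    for _ in range(max_revisions):  # monitoring-and-adjustment loop
        predicted = simulate_comprehension(utterance, listener)
        if predicted == intention["content"]:
            return utterance  # predicted effect matches communicative intent
        # Mismatch: revise towards an explicit, listener-friendly form.
        utterance = {"words": intention["content"], "uses_pronoun": False}
    return utterance  # time pressure can cut the loop short

print(plan_utterance({"content": "the red desk"}, {"heard_antecedent": False}))
# -> {'words': 'the red desk', 'uses_pronoun': False}: the forward model
#    predicted a bare pronoun would fail for this listener
```

Under time pressure the loop may terminate before a mismatch is repaired, leaving an egocentric message; this fits the Horton and Keysar (1996) finding discussed below.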

Findings

Effective use of common ground occurs most often when the listener's knowledge and needs are readily accessible.

Figure 11.8 Architecture of the forward modelling approach to explaining audience design effects (blue rectangles = representations; orange ovals = processes). The pathway runs from the communicative intention through message encoding, grammatical encoding (words, structures) and phonetic encoding to the pre-articulatory utterance; a forward model and a model of audience comprehension generate a predicted communicative effect from communicatively relevant linguistic features, which an evaluator (supported by executive control) checks against the intention. From Ferreira (2019).

Achim et al. (2017) asked speakers to introduce and subsequently reintroduce movie characters likely (e.g., Harry Potter) or unlikely (e.g., Martin Riggs) to be known to the listener. Speakers used each character's name much more often when introducing and reintroducing known characters. They thus made effective use of common ground because it was easy.

Craycraft and Brown-Schmidt (2018) tested the hypothesis that speakers only assume they have formed common ground with listeners when those listeners appear to be attentive. As predicted, speakers who had communicated information to an inattentive listener (e.g., one glancing repeatedly around the room or pulling out their mobile phone) did not subsequently assume they shared common ground with that listener. Thus, speakers are often responsive to the listener's attentional state.

In contrast, Fukumura and van Gompel (2012) found speakers often ignored listeners' knowledge. Speakers typically use a noun phrase (e.g., "the red desk") the first time an object is mentioned but a pronoun (e.g., "it") in the next sentence. However, it is only appropriate to use a pronoun in the second sentence provided the listener has heard the previous sentence. In fact, speakers generally used a pronoun in the second sentence even when the listener had not heard the previous sentence, thus ignoring audience design.

According to the monitoring and adjustment model and Ferreira's (2019) forward modelling approach, it is often cognitively demanding to take account of the listener's perspective. Accordingly, we might expect speakers with superior cognitive abilities to make more use of common ground. As predicted, Long et al. (2018) found that speakers high in the executive functions of inhibition and switching (see Chapter 6) used common ground more often than low scorers.

Further evidence that cognitive demands are important in determining whether speakers take account of the common ground they share with their listener was reported by Horton and Keysar (1996). Speakers took less account of common ground when they spoke under time pressure than when they had as much time available as they wanted.

According to Horton and Gerrig (2016; mentioned earlier, p. 545), speakers' memory failures often cause errors in the use of common ground. Empirical support was obtained previously by Horton and Gerrig (2005), who analysed numerous spontaneous telephone conversations. Here is an example of a speaker assuming too little common ground:

A: This one guy with- who was like a a fresh br- breeze blown through the factory uh uh uh twenty-four twenty-five-year-old guy
B: Oh, yeah you mentioned him. (p. 19)

Here is an example of a speaker assuming too much common ground:

A: Yeah okay. I told you about the shampoo did I tell you?
B: What shampoo no. (p. 19)

This second example is especially interesting because it suggests the speaker was monitoring the accuracy of their memory and began to doubt it.

Rubin et al. (2011) obtained strong evidence of the importance of memory. Amnesic patients with severely impaired episodic memory (see Glossary) made far less use of information about common ground than healthy controls. However, amnesic patients made reasonably effective use of common ground when the relevant information was readily available. Further evidence that amnesic speakers can often communicate effectively despite their memory problems was reported by Yoon et al. (2017). Amnesic speakers provided shorter descriptions of objects to listeners to whom they had previously described those objects than to new listeners. Thus, they were sensitive to the presence or absence of common ground with their listener. This happened because the amnesic speakers could identify the listener as new or old.

Finally, speakers use various simple and relatively effortless strategies to facilitate communication. One example is syntactic priming (copying a previously heard syntactic structure; see p. 518). Jaeger and Snider (2013) found speakers were most likely to show syntactic priming when the last sentence they heard contained an unexpected syntactic structure. Speakers strive to "get on the same wavelength" as the person with whom they are speaking, and syntactic priming helps to achieve that goal.

Gesture

Most speakers use gestures. It is generally assumed they do this because they believe it increases their ability to communicate with their listener(s). This belief is correct (see Chapter 9).

Human communication probably depended on gestures in our ancestral past, and it was only much later that vocalisation emerged.


The fact that primate gestures resemble human language much more closely than do primate vocalisations supports that viewpoint (Cartmill et al., 2012).

It seems reasonable to assume speakers use gestures more often when they can see the person to whom they are speaking. Surprisingly, this finding has been obtained in only 50% of studies (Bavelas & Healing, 2013). Why do speakers use gestures that cannot be seen by their listeners? Gestures make it easier for speakers to communicate effectively. Frick-Horbury and Guttentag (1998) presented participants with the definitions of relatively uncommon words (e.g., tambourine) and asked them to say the word defined. When it was hard to use gestures, speakers produced 21% fewer words than when they were free to use gestures.

Gerwing and Allison (2011) asked speakers to describe an elaborate dress to a visible or non-visible listener. The number of gestures was comparable in both conditions. However, speakers' gestures were much more informative in the face-to-face situation: 74% of the information communicated was via gestures and only 26% via speech. In contrast, only 27% of the information communicated in the telephone situation was gestural.

Gestures are often modified to take account of the common ground between speakers and listeners. Hilliard and Cook (2016) used a task where speakers communicated information about how to solve a complex problem. When the common ground between speakers and listeners was limited, speakers used more informative gestures than when the common ground was more extensive. However, speakers' spoken language was not influenced by the extent of common ground. Gesture and speech provide a unified system for communication, and so either gesture or speech can be modified to take account of common ground.

How responsive are speakers to listeners' feedback? Holler and Wilkin (2011) compared speakers' gestures before and after listener feedback. There were two main findings:

(1) The number of gestures reduced when the listener indicated understanding of what had been said.
(2) Feedback encouraging clarification, elaboration or correction was followed by more precise, larger or more visually prominent gestures.

In sum, gestures are an important accompaniment to speech. Speakers use gestures because they make it easier to work out what they want to say. In addition, the fact that speakers are responsive to listeners' needs (including the feedback they provide) means gestures facilitate communication (see Chapter 9).

Prosodic cues

Some information communicated by speakers to listeners does not depend directly on the words themselves but rather on how those words are uttered. This is prosody, which describes "systematic modifications to the way that speakers utter words in order to specify or disambiguate the meaning of an utterance" (Cvejic et al., 2012, p. 442).


Prosodic cues include rhythm, stress and intonation. For example, in the ambiguous sentence, "The old men and women sat on the bench", the women may or may not be old. If the women are not old, the spoken duration of the word "men" should be relatively long and the stressed syllable in "women" will have a steep rise in pitch contour. Neither prosodic feature will be present if the sentence means the women are old. Evidence that listeners' comprehension of speech is enhanced by prosodic cues is discussed in Chapter 9.

Guellaï et al. (2014) found gestures accompanying speech carry prosodic information. When listeners heard ambiguous sentences with a mismatch between the prosodic cues in the speech and the gestures, they more often understood these sentences in line with the gestural information.

Discourse markers

Speakers can also enhance listener comprehension by using discourse markers: words or phrases assisting communication even though they are only indirectly relevant to the speaker's message. Among the ways they do this are expressing the speaker's attitude and facilitating turn-taking in conversations (Hata, 2016). Speakers use the discourse marker you know, for example, to check whether listeners understand them and to connect with them.

We are often unaware of the reasons why we use various discourse markers. For example, what determines whether you say oh or so when moving on to a new conversational topic? Bolden (2006) found oh was used 98.5% of the time when the new topic directly concerned the speaker. In contrast, so was used 96% of the time when the new topic was of most relevance to the listener.

In sum, discourse markers often make it easier for listeners to understand speakers' intended meanings. However, we can also consider discourse markers in the context of disfluencies (e.g., pauses; repetitions). Crible (2017) analysed 15 hours of speech containing a total of 161,700 words and found frequent clusters including disfluencies and discourse markers. She concluded that discourse markers are often simply disfluencies.

KEY TERMS

Prosodic cues
Features of spoken language such as stress, intonation, pauses and duration making it easier for listeners to work out grammatical structure and meaning; similar cues are often present in texts (e.g., commas; semi-colons).

Discourse markers
Spoken words and phrases that do not contribute directly to the content of what is being said but still serve various functions (e.g., clarifying the speaker's intentions).


WRITING: THE MAIN PROCESSES

Writing is an important topic in its own right (no pun intended!). However, it is not separate from other cognitive activities. As Kellogg and Whiteford (2012, p. 111) pointed out:

Composing extended texts is . . . a severe test of memory, language, and thinking ability. It depends on the rapid retrieval of domain-specific knowledge about the topic from long-term memory. It depends on a high degree of verbal ability . . . It depends on the ability to think clearly.

Unsurprisingly, writing ability is positively correlated with several aspects of cognitive ability (e.g., fluid reasoning ability) (Cormier et al., 2016).


IN THE REAL WORLD: EFFECTS OF ALZHEIMER'S DISEASE ON NOVEL WRITING

We saw earlier that mild cognitive impairment (often a precursor to Alzheimer's disease – see Glossary) is associated with various problems in speech production. In view of the cognitive complexities of effective writing, it seems probable mild cognitive impairment would also impair writing performance.

Research has been carried out on Iris Murdoch (1919–1999), the renowned Irish novelist who was diagnosed with Alzheimer's disease. Garrard et al. (2005) compared her first published work, a novel written during her prime, and her final novel. Iris Murdoch's vocabulary became less sophisticated (e.g., smaller vocabulary; more common words) across these three works but changes in syntax were less clear-cut. Subsequent research by Pakhomov et al. (2011) showed the syntactic complexity of Iris Murdoch's writing decreased over time. Thus, aspects of Iris Murdoch's writing were adversely affected several years before she was diagnosed with Alzheimer's disease, during a period in which she probably had mild cognitive impairment.

Iris Murdoch, the Irish novelist. Ulf Andersen/Getty Images.

Le et al. (2011) carried out a detailed longitudinal analysis of the writings of Iris Murdoch, Agatha Christie (suspected of having Alzheimer's disease towards the end of her life) and P.D. James (a novelist with no signs of cognitive impairment or Alzheimer's disease). They confirmed previous findings that there were signs of impairment in Iris Murdoch's writing a considerable time before she was diagnosed with Alzheimer's disease. Le et al. (2011) claimed Agatha Christie's last novels indicated she probably suffered the onset of Alzheimer's disease. The writing impairments of Agatha Christie and Iris Murdoch both involved vocabulary much more than syntax. More specifically, both showed a sharp decrease in vocabulary size, increased repetition of phrases, and more irrelevant filler words or phrases.

Van Velzen et al. (2014) reported a detailed comparison of the writings of Iris Murdoch and Agatha Christie focusing on their lexical diversity (richness of vocabulary). Iris Murdoch had a much earlier and more sudden reduction in lexical diversity than Agatha Christie. They concluded Agatha Christie did not have Alzheimer's disease but rather some other neurodegenerative condition. In contrast, P.D. James showed only marginal writing impairments in old age due to normal ageing.

In sum, there are detectable impairments in the writing of novelists probably suffering from mild cognitive impairment. These impairments can provide an early indication of Alzheimer's disease (or other neurodegenerative diseases), and probably reflect in part the cognitive complexity of writing. However, even the onset of Alzheimer's disease has only modest effects on syntax. Thus, cognitive impairment affects the content of what is written more than its grammatical structure.


Key processes

Writing extended texts involves several processes. In spite of minor disagreements about the number and nature of these processes, most theorists agree with Hayes and Flower (1986) that writing involves the following three processes:

(1) A planning process, which involves producing ideas and organising them to satisfy the writer's goals.
(2) A sentence-generation process, which involves turning the writing plan into the actual production of sentences.
(3) A revision process, which involves evaluating what has been written or word processed so far and changing it when necessary.

Chenoweth and Hayes (2003) developed the above approach. Their model identifies four processes (a minimal code sketch appears at the end of this section):

(1) Proposer: proposes ideas for expression and is engaged in higher-level planning processes.
(2) Translator: converts the message formed by the proposer into word strings (e.g., sentences).
(3) Transcriber: converts the word strings into written or word-processed text.
(4) Evaluator/reviser: monitors and evaluates what has been produced and engages in revision of deficiencies.

The main difference between the two approaches is that Chenoweth and Hayes (2003) added a transcriber. Why did they do that? Hayes and Flower assumed transcribing (writing out sentences already composed) requires minimal processing resources and so has no impact on other writing processes. However, that assumption is incorrect. Hayes and Chenoweth (2006) asked participants to transcribe or copy texts from one computer window to another: they transcribed more slowly when performing a very simple task (saying tap repeatedly) at the same time. Similarly, Tindle and Longstaff (2016) found the task of writing down heard words used working-memory resources.

The latest version of this writing model (Hayes, 2012; shown in Figure 11.9) incorporates the four writing processes identified by Chenoweth and Hayes (2003). However, it is more comprehensive because the process level has been expanded to include the task environment, and there are additional control and resource levels. Of importance, the current version includes motivation as a factor: writing effectively is so demanding (as the authors of this book know to their cost!) that high motivation is required to engage in prolonged evaluation and revision of what has been written.

There is a final point. The "natural" sequence of the four main writing processes is as follows: proposer; translator; transcriber; and evaluator. As we will see, however, writers surprisingly often deviate from this sequence, switching rapidly between processes.
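The flow through these four processes can be illustrated with a minimal sketch. All the functions below are invented placeholders and the adequacy criterion is a toy one; the sketch shows only how output passes from one process to the next and how evaluation can re-engage planning, not the dynamics of the actual model.

```python
# A toy sketch of Chenoweth and Hayes's (2003) four writing processes.
# All functions are invented placeholders used purely for illustration.

def proposer(goal: str) -> str:
    # Higher-level planning: propose an idea serving the writing goal.
    return f"main point about {goal}"

def translator(idea: str) -> str:
    # Convert the proposed idea into a word string (a sentence).
    return f"This essay makes a {idea}."

def transcriber(sentence: str) -> str:
    # Produce written text; transcription itself consumes resources
    # (Hayes & Chenoweth, 2006), which is why the model includes it.
    return sentence.strip()

def evaluator(text: str, min_words: int = 10) -> bool:
    # Monitor the product; a toy adequacy criterion stands in for the
    # evaluator/reviser's much richer monitoring.
    return len(text.split()) >= min_words

def write(goal: str, max_cycles: int = 3) -> str:
    text = ""
    for _ in range(max_cycles):
        text = transcriber(translator(proposer(goal)))
        if evaluator(text):
            break                            # draft accepted
        goal += " with supporting evidence"  # revision re-engages planning
    return text

print(write("initiation rites"))
```

Running the sketch produces one revision cycle: the first draft fails the toy adequacy check, planning is re-engaged with an expanded goal, and the second draft is accepted. Real writers, as noted above, switch between these processes far more rapidly and flexibly than this linear loop suggests.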

Figure 11.9 Hayes' (2012) writing model. It consists of three levels: (1) control level (including motivation and goal setting); (2) writing process level (including proposer, evaluator, translator and transcriber); and (3) resource level (including working memory, attention and long-term memory). From Hayes (2012). Reprinted by permission of SAGE Publications.

Findings

KEY TERM

Directed retrospection
A technique in which individuals (e.g., writers) categorise their immediately preceding thoughts.

Pauses account for over half of writing time. Medimorec and Risko (2017) analysed the pause data of students writing narrative essays (about a memorable day) and argumentative essays (about mobile-phone use in schools). Pauses occurred most often at paragraph boundaries, followed by sentence boundaries, suggesting they often reflect planning processes.

Limpo and Alves (2018) studied key aspects of writing dynamics using the triple-task technique. Participants engaged in writing an argumentative text (about controversial university initiation rites) were asked to respond as rapidly as possible to occasional auditory beeps. After responding, they indicated which writing process they had just been using: planning, translating or revising. This is known as directed retrospection.

What did Limpo and Alves (2018) find? First, time spent on planning reduced during the course of writing (see Figure 11.10). Second, time spent on revising increased over time. Third, all three writing processes (i.e., planning, translating and revising) occurred during all phases of writing. Fourth, reaction times to the occasional beeps were slowed most during revising and least during translating. These findings indicate that revising was the most cognitively demanding process, followed by planning and then translating.

Beauvais et al. (2011) found writers switched rapidly between different processes: 8 times a minute with narrative texts (telling a story) and 6 times a minute with argumentative texts (discussing ideas). Each episode of translating lasted on average 16 seconds (narrative text) or 17 seconds (argumentative text), planning 8 seconds (narrative) or 12 seconds (argumentative), and revision 4 seconds (both text types). Thus, as Levy and Ransdell (1995) also found, episodes of planning and revising were shorter than episodes of translating or text generation.

Figure 11.10 The frequency of three major writing processes (planning, translating and revising) across the three phases (thirds) of writing; the y-axis shows the number of occurrences and the x-axis the writing phase. From Limpo and Alves (2018). Reprinted with permission of Elsevier.

How do writers make decisions about switching processes? Hayes and Flower (1980) argued writers have a monitor (closely resembling the central executive component of the working memory model: see Chapter 6) controlling their processing activities. Two functions of the central executive are to switch attention between tasks and to inhibit unwanted responses. If the monitor requires working memory resources, it should be less likely to trigger a switch in the current task when current processing demands are high. Quinlan et  al. (2012) tested this assumption. Writers chose whether to complete a sentence before correcting an error or to interrupt sentence composing to focus on the error. Nearly all participants completed the sentence first (especially when total processing demands were high).

Evaluation

Processes such as planning, sentence generation and revision are all crucial in writing. However, they cannot be neatly separated because writers typically move rapidly between them. Writers probably possess a monitor initiating processing shifts when overall processing demands are relatively low.

What are the limitations of research in this area? First, the factors determining when writers shift processes are mostly unknown. Second, the social aspect of writing (i.e., taking account of the intended readership of written texts) is often de-emphasised (see below). Third, the ways writing processes interact are not specified with precision. For example, Hayes (2012; Figure 11.9) did not indicate how the four resources at the resource level relate to the writing processes.

Individual differences in writing: development of expertise

Why are some writers more skilful than others? As with any complex cognitive skill, extensive deliberate practice is essential (see Chapter 12). In the next section, we will see the working memory system (see Chapter 6) is very important.


Figure 11.11 Kellogg's three-stage theory of the development of writing skill. From Kellogg (2008). Reprinted with permission of the Journal of Writing Research www.jowr.org.

All its components have limited capacity. However, the demands of writing on these components decrease with practice, which provides experienced writers with spare processing capacity to enhance their writing quality.

Writing expertise often depends on reading ability. Berninger and Abbott (2010) assessed the four main language skills in children. Overall, writing performance was predicted best by reading comprehension, followed by speech production and then speech comprehension. Kent and Wanzek (2016) conducted a meta-analysis, also finding reading comprehension predicted writing quality better than did speech-production ability.

How can we explain the above findings? Reading allows writers to learn much about the structure and style of good writing; it also enhances their vocabulary and knowledge. However, most evidence is correlational and so does not prove writing ability is caused by reading comprehension. For example, writing expertise may enhance reading comprehension skills.

Bereiter and Scardamalia (1987) identified two major strategies writers use. First, there is the knowledge-telling strategy: writers simply write down all they know about a topic with minimal planning. Second, the more complex knowledge-transforming strategy involves working out how to achieve the writing goals and how to decide on the specific information to write down. Optimal use of this strategy involves moving backwards and forwards between these two aims.

Kellogg and Whiteford (2012) developed the above approach (see Figure 11.11). They argued that really expert writers move beyond the knowledge-transforming strategy by using a knowledge-crafting strategy.


With this strategy, "The writer shapes what to say and how to say it with the potential reader fully in mind. The writer tries to anticipate different ways that the reader might interpret the text and takes these into account in revising it" (Kellogg & Whiteford, 2012, p. 116).

One reason knowledge crafting is important is the knowledge effect – writers often assume other people share the knowledge they possess. Hayes and Bajzek (2008) found individuals familiar with technical terms greatly overestimated other people's knowledge of those terms (are the authors of this book guilty of this?).

KEY TERM

Knowledge effect
The tendency to assume others possess the same knowledge as us.

Findings

The knowledge-transforming strategy is effective for various reasons. Writers using it produce more high-level main points capturing important themes (Bereiter et al., 1988) and show more extensive interactions between planning, language generation and reviewing. The strategy is used more effectively if writers enhance their relevant knowledge prior to writing an essay (Chuy et al., 2012); the resultant essays were more coherent and easier to read.

Expert writers spend more time revising than non-expert ones. Levy and Ransdell (1995) found writers producing the best essays spent 40% more time reviewing and revising them than those producing the worst essays. In addition, expert writers detect many more problems in a text than non-experts (Hayes et al., 1985).

Evidence that knowledge-crafting skills are important was reported by Karlen (2017). Students were required to write an academic paper. Those possessing the most knowledge about how to craft texts made greatest use of knowledge-crafting strategies while writing, and this in turn enhanced the quality of their writing.

Knowledge-crafting skills can be trained. For example, responsiveness to the reader's needs can be improved by providing writers with feedback from readers about comprehension problems they had experienced (Sato & Matsushima, 2006). Wischgoll (2016) found students developing their knowledge-crafting skills via a meta-cognitive strategy (e.g., "Can I comprehend my text if I read it from the reader's perspective?", p. 7) showed greater enhancement of their writing skills than those using other strategies.

Finally, the requirements of expert writing depend on the type of text (e.g., an advanced textbook vs a children's story). Beauvais et al. (2011) found moderately expert writers engaged in more knowledge-telling when producing narrative rather than argumentative texts, whereas the opposite was the case for knowledge-transforming. In other words, writers tailored their writing behaviour to the requirements of the type of text they were producing.


Working memory

Most people find writing difficult and effortful because it involves several different cognitive processes (e.g., attention; thinking; memory). Several theorists have argued writers make extensive use of working memory to deal with these complexities.



This argument was supported by Tindle and Longstaff (2015), who found writing made more demands on working memory than reading or listening.

Working memory is used when a task requires temporary storage of some information while other information is processed (see Chapter 6). That is clearly the case with writing – writers have to remember what they have just written while planning what to write next. The key component of the working memory system is the central executive (see Glossary), an attention-like process involved in organising and coordinating cognitive activities. Other components of the working memory system are the visuo-spatial sketchpad (involved in visual and spatial processing) and the phonological loop (involved in verbal rehearsal). All these components have limited capacity, which can easily cause problems for the writing process because it is often very cognitively demanding.

All components of working memory are involved in writing. Kellogg (2001) linked these components to five processes involved in writing (see Table 11.1). These processes overlap with those identified by Chenoweth and Hayes (2003; described earlier, pp. 551–552): planning corresponds to the proposer, translating to the translator, programming is part of the transcriber, and reading and editing together relate to the evaluator/reviser (reading involves going back over what has been written so far). Kellogg et al. (2013) reconsidered the information contained in Table 11.1 and decided the phonological loop is actually involved in the editing process. Research by Hayes and Chenoweth (2006; discussed earlier, p. 551) showed error correction while copying text was slowed when the phonological loop was required for another task.

Findings

As you can see in Table 11.1, Kellogg (2001) assumed writing performance depends more on the central executive than on any other working memory component. We can assess its involvement by measuring reaction times to auditory beeps presented in isolation (control condition) or while people are engaged in writing. If writing uses much of the central executive's capacity, reaction times should be longer during writing. In the study discussed earlier (Limpo & Alves, 2018), planning, translating and revising all slowed reaction times (especially planning); a toy illustration of this dual-task logic follows Table 11.1.

TABLE 11.1  INVOLVEMENT OF WORKING MEMORY COMPONENTS IN VARIOUS WRITING PROCESSES

Process      | Visuo-spatial sketchpad | Central executive | Phonological loop
Planning     | yes                     | yes               | –
Translating  | –                       | yes               | yes
Programming  | –                       | yes               | –
Reading      | –                       | yes               | yes
Editing      | yes                     | –                 | –

Source: Based on Kellogg (2001).
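The arithmetic behind the dual-task measure mentioned above is simple, as the following sketch shows. All the reaction times are invented for illustration; interference is just the probe reaction time during a given writing process minus the baseline reaction time, and larger interference indicates heavier central-executive involvement.

```python
# Hypothetical dual-task data (ms); interference relative to baseline
# indexes how much of the central executive's capacity each writing
# process consumes. All numbers are invented for illustration only.

baseline_rt = 300.0  # probes answered in isolation (control condition)

probe_rts = {        # mean probe RTs while engaged in each process
    "planning":    520.0,
    "translating": 430.0,
    "revising":    480.0,
}

# Print processes from most to least demanding.
for process, rt in sorted(probe_rts.items(), key=lambda kv: -kv[1]):
    print(f"{process:>11}: +{rt - baseline_rt:.0f} ms interference")
```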




The role of the central executive can also be assessed using an individual differences approach. Vanderberg and Swanson (2007) adopted this approach: students wrote stories and their performance was divided into general skills (e.g., planning, translating, revision) and specific skills (e.g., grammar, punctuation). Individuals with the most effective central executive functioning exhibited the greatest general and specific skills. In contrast, individual differences in the functioning of the visuo-spatial sketchpad and phonological loop did not influence performance.

Guan et al. (2014) studied the effects of individual differences in working memory capacity (an approximate measure of central executive functioning; see Glossary). Essay-writing quality was predicted by working memory capacity. In similar fashion, Van der Steen et al. (2017) found students high in working memory capacity produced more complex essays than low-capacity individuals.

Another approach is to study brain-damaged individuals with impaired central executive functioning suffering from dysexecutive syndrome (see Glossary; and Chapter 6). Many patients with dysexecutive syndrome have difficulties planning and organising their ideas on writing tasks and in maintaining attention (Ardila & Surloff, 2006). Ardila and Surloff coined the term dysexecutive agraphia to refer to such patients. Sitek et al. (2014) studied four patients with dysexecutive agraphia who had dementia causing severe cognitive impairment. Progressive deterioration in their writing skills was closely linked to more general cognitive impairment.

What role does the phonological loop play in writing? We can address this question using an articulatory suppression task (e.g., saying the the the repeatedly) while individuals are engaged in a writing task. Articulatory suppression uses the resources of the phonological loop and so impairs performance on other concurrent tasks requiring the phonological loop. Articulatory suppression causes writers to produce shorter sequences of words (suggesting it suppresses their "inner voice"; Chenoweth & Hayes, 2003); it also slows transcribing or copying texts (Hayes & Chenoweth, 2006). Finally, Colombo et al. (2009) found articulatory suppression impaired writers' ability to produce the component parts of multi-syllable words in the correct serial order.

The above findings strongly suggest the phonological loop is often used during writing. However, we must not exaggerate its importance: some patients with a severely damaged phonological loop nevertheless have essentially intact written language (Gathercole & Baddeley, 1993).

What role does the visuo-spatial sketchpad play in writing? Relevant research was reviewed by Olive and Passerault (2012). Bourke et al. (2014) found individual differences in visuo-spatial working memory predicted children's spelling and writing ability. Kellogg et al. (2007) asked students to write descriptions of concrete (e.g., house) and abstract (e.g., freedom) nouns while detecting visual stimuli. The writing task slowed detection times only when concrete words were being described, indicating the visuo-spatial sketchpad is more involved when writers think about concrete objects. Somewhat separate visual and spatial processes occur within the visuo-spatial sketchpad (see Chapter 6). Are both processes involved in writing? Olive et al. (2008) asked students to write a text while performing a visual or spatial task and discovered the answer is "yes".


KEY TERM

Dysexecutive agraphia
Severely impaired writing abilities in individuals with damage to the frontal lobes whose central executive functioning is generally impaired.


Evaluation

The main writing processes are very demanding and effortful and impose substantial demands on working memory (see Olive, 2012, for a review). Individuals with high working memory capacity have good general and specific writing skills. The central executive is heavily involved in most writing processes. There is also convincing evidence the visuo-spatial sketchpad and phonological loop are both involved in the writing process: the visuo-spatial sketchpad is involved in planning and editing, and the phonological loop seems to be of relevance to various writing processes.

What are the limitations of theory and research on working memory in writing? First, we do not know precisely why planning, sentence generation and revising are so demanding of processing resources. Second, there is little understanding of the role of working memory in influencing when and why writers shift from one writing process to another, although switching may be less likely when total working memory demands are high (Quinlan et al., 2012). Third, working memory (and other) processes are used flexibly and in parallel (i.e., at the same time) in writing (Olive, 2014), but the factors determining when and how these processes are used in parallel (and how they interact with each other) remain unclear. Fourth, most research has been based on Baddeley's working memory model, with other approaches (e.g., theories based on working memory capacity; see Chapter 6) receiving little attention. These other approaches could potentially enhance our understanding of the processes underlying writing performance.

Word processing

Goldberg et al. (2003) carried out meta-analyses to compare writing performance when students used word processors or wrote in longhand. They concluded: "Students who use computers when learning to write are not only more engaged in their writing but they produce work that is of greater length and higher quality" (Goldberg et al., 2003, p. 1). Van der Steen et al. (2017) found students wrote faster and produced essays of higher quality when using word processing rather than writing by hand. Why was this? Students spent more time pausing in the word-processing condition, and pausing was associated with more revision and thinking.

Are there any disadvantages associated with word processing? Kellogg and Mueller (1993) found word processing involves more effortful planning and revision (but not sentence generation) than writing in longhand. Those using word processors were much less likely to make notes (12% vs 69%, respectively), which may explain the findings.

SPELLING

Spelling is an important aspect of writing. Brain areas involved in spelling were identified by Planton et al. (2013; see Figure 11.12) in a meta-analytic review. Three main areas were consistently activated during handwriting tasks:


Figure 11.12 Brain areas activated during handwriting tasks, controlling for verbal or linguistic input (red) or motor output (green). The areas in yellow are controlled for both and so provide an indication of handwriting-specific brain regions. IPS, intraparietal sulcus; SPL, superior parietal lobe; SFS, superior frontal sulcus; post CB, posterior cerebellum. From Planton et al. (2013). Reprinted with permission from Elsevier.

(1) Intraparietal sulcus and superior parietal lobule in the left hemisphere: this area is involved in the selection and/or representation of letter shapes.
(2) Superior frontal sulcus in the left hemisphere: this area seems to be the interface between abstract letter combinations and the generation of motor commands.
(3) Posterior cerebellum in the right hemisphere: this area is probably most involved in motor activity.


Planton et  al. (2017) discovered most of these areas were also activated when participants drew shapes or spelled out object names orally. Thus, they are not specialised for writing.

Dual-route theory

Several theorists (e.g., Hepner et al., 2017) have proposed versions of the dual-route theory for understanding the processes involved in spelling (see Figure 11.13(b)):

● The most important assumption is that there are two main routes between hearing a word and spelling it: (1) the lexical route (left-hand side of the figure); and (2) the non-lexical route (right-hand side of the figure).
● The lexical route involves accessing word sounds in phonological long-term memory, followed by accessing word meanings in the lexical semantic system and word spellings in orthographic long-term memory. It is the main route we use with familiar words, regardless of whether the relationship between sounds (phonemes) and spellings (orthography) is regular (e.g., cat) or irregular (e.g., yacht).
● The non-lexical route does not involve gaining access to detailed information about the sound, meaning and spelling of heard words. Instead, it uses rules to convert sounds or phonemes into groups of letters or words. This route is used when spelling unfamiliar words or non-words. It produces correct spellings when the relationship between sounds and spellings is regular or common, but spelling errors when the relationship is irregular or uncommon.
● Both routes converge on orthographic working memory (also known as the graphemic buffer), which briefly holds information about the letters within a word and the ordering of those letters before they are written or typed.
● The processes and structures involved in spelling (Figure 11.13(b)) are very similar to those involved in reading (Figure 11.13(a)), but are used in the opposite direction: spelling goes from hearing a word to writing it, whereas reading goes from the written word to saying it. (A minimal code sketch of the two routes follows the key terms below.)

Figure 11.13 The cognitive architectures for (a) reading and (b) spelling. Panel (a) runs from letter identification processes through orthographic working memory, then via either the lexical route (orthographic long-term memory; lexical semantic system; phonological long-term memory) or the sublexical spelling-to-sound conversion system, to phonological working memory. Panel (b) runs from auditory/speech processing through phonological working memory and phonological long-term memory, then via either the lexical route (lexical semantic system; orthographic long-term memory) or the sublexical sound-to-spelling conversion system, to orthographic working memory, letter name selection and graphic motor plan selection. From Hepner et al. (2017).

KEY TERMS

Orthographic working memory (also known as the graphemic buffer)
A store in which information about the individual letters in a word (and their ordering) is held immediately prior to spelling the word.

Graphemic buffer (also known as orthographic working memory)
A store in which graphemic information about the individual letters in a word is held immediately prior to spelling the word.

Phonological dysgraphia
A condition caused by brain damage in which familiar words can be spelled reasonably well but unfamiliar words and non-words cannot.
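The division of labour between the two routes can be made concrete with a short sketch. This is a minimal illustration only: the toy lexicon and phoneme-to-grapheme rules below are invented, not taken from Hepner et al. (2017). Deleting the lexicon would force regularisation errors on irregular words (as in surface dysgraphia, discussed below), whereas deleting the rules would leave non-words unspellable (as in phonological dysgraphia).

```python
# A toy sketch of the dual-route account of spelling. The lexicon and
# the phoneme-to-grapheme rules are invented for illustration only.

ORTHOGRAPHIC_LEXICON = {        # lexical route: orthographic long-term memory
    "kat": "cat",               # regular word (the rules would also succeed)
    "jot": "yacht",             # irregular word (only this route succeeds)
}

PHONEME_TO_GRAPHEME = {         # non-lexical route: sound-to-spelling rules
    "k": "c", "a": "a", "t": "t", "j": "y", "o": "o",
}

def spell(phonemes: str) -> str:
    """Spell a heard word; both routes converge on the graphemic buffer."""
    if phonemes in ORTHOGRAPHIC_LEXICON:
        # Familiar word: retrieve the stored spelling whole (lexical route).
        graphemes = ORTHOGRAPHIC_LEXICON[phonemes]
    else:
        # Unfamiliar word or non-word: apply conversion rules one phoneme
        # at a time (non-lexical route); irregular words get regularised.
        graphemes = "".join(PHONEME_TO_GRAPHEME.get(p, "?") for p in phonemes)
    # Orthographic working memory (the graphemic buffer) holds letter
    # identities and their order immediately prior to written output.
    buffer = list(graphemes)
    return "".join(buffer)

print(spell("kat"))  # cat   (lexical route)
print(spell("jot"))  # yacht (lexical route copes with the irregular word)
print(spell("kot"))  # cot   (non-word handled by the non-lexical route)
```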

Findings

What would happen if brain-damaged patients could not use the non-lexical route but the lexical route was essentially intact? They would spell familiar words accurately (whether regular or irregular) because the spellings would be available in orthographic long-term memory. However, they would have great problems with unfamiliar words and non-words, for which no relevant information is stored in long-term memory. Such patients have phonological dysgraphia. Shelton and Weinrich (1997) studied a male patient (EA) with phonological dysgraphia: he spelled 50% of regular words and 45% of irregular words correctly to dictation but 0% of non-words. Sotiropoulos and Hanley (2017) found phonological dysgraphics showed normal performance when spelling regular and irregular words.

It seems likely phonological dysgraphics have severe problems with phonological processing (processing involving word sounds).


Accordingly, they should perform poorly on any task requiring phonological processing. As predicted, Cholewa et al. (2010) found children with phonological dysgraphia performed poorly on various tests of phonological processing (e.g., deciding whether two spoken non-words sounded the same) as well as on spelling non-words.

What would happen if patients had damage to the lexical route and so relied mostly on the non-lexical route (converting sounds into groups of letters)? Such patients would be more accurate at spelling regular or consistent words and non-words (because the spelling can be worked out from the sound) than irregular or inconsistent words. Such patients suffer from surface dysgraphia. Macoir and Bernier (2002) studied a patient, MK, who spelled 92% of regular words correctly but only 52% of irregular words. Cholewa et al. (2010), in a study discussed earlier (pp. 560–561), found children with surface dysgraphia spelled 56% of irregular words incorrectly but only 19% of non-words. According to the dual-route theory, surface dysgraphics should not have severe problems with phonological processing. Cholewa et al. obtained partial support for this prediction – surface dysgraphics were impaired on some phonological tasks, but on fewer tasks than phonological dysgraphics.

Some research findings indicate the dual-route theory is oversimplified. First, Treiman and Kessler (2016) asked participants to spell monosyllabic non-words. Whether a final /f/ sound in these non-words was spelt f or ff did not depend solely on sound-to-spelling rules as predicted by the theory. Spellings were also influenced by context – the spelling ff was more likely to be used when the preceding vowel in the non-word was spelt with one letter rather than two.

Second, Rapp et al. (2002) found greater interaction between the lexical and non-lexical routes during word spelling than assumed by the theory. They studied LAT, a patient with Alzheimer's disease, who spelled bouquet as BOUKET and knowledge as KNOLIGE. These spellings suggest some use of the non-lexical route. In addition, however, he could only have known that bouquet ends in t and that knowledge starts with k by using information in orthographic long-term memory.

Third, according to the theory, the spelling of non-words should involve only the non-lexical route. Suppose you heard the non-word /vi:m/ and wrote it down. Would you write VEAM or VEME? Most spellers write VEAM if dream has just been presented auditorily but VEME if preceded by theme (Martin & Barry, 2012). This shows an influence of the lexical route on non-word spellings.

KEY TERM

Surface dysgraphia
A condition caused by brain damage in which there is impaired spelling of irregular words but reasonably accurate spelling of regular words and non-words.

Evaluation

Evidence from phonological dysgraphics and surface dysgraphics provides reasonable support for the claim that spelling can involve a lexical or a non-lexical route. There is also evidence for interactions between the two routes.

What are the limitations of the two-route theory?

(1) The theory assumes phonological dysgraphics have a specific problem turning sounds into groups of letters. In fact, they often have a more general problem with phonological processing.


KEY TERMS

Orthographic lexicon
Part of long-term memory in which learned word spellings are stored.

Dyslexia
Impaired ability to read not attributable to low intelligence.

Dysgraphia
Impaired ability to write (including spelling).

(2) There are more interactions between the two spelling routes than assumed by the theory. This has been found with respect to the spelling of both words and non-words.
(3) The theory is oversimplified because we use more sources of information in spelling than it identifies. As Treiman (2017, p. 84) pointed out, "English spelling includes more regularities than is often thought." For example, double consonants are common before a final -ock (e.g., haddock; paddock) but uncommon before a final -ic (e.g., tannic vs panic, magic, tragic). Treiman discussed research showing we use such information to increase the accuracy of our word spellings.

One or two orthographic lexicons?

Look back at Figure 11.13 and you will see many similarities between reading and spelling. Of most relevance here, knowledge of word spellings (orthography) is important in both reading and writing. The simplest (and most plausible) assumption is that a single orthographic lexicon within orthographic long-term memory is used for both reading and spelling. Alternatively, an input orthographic lexicon is used in reading and a separate output orthographic lexicon is used in spelling.

Findings

Relevant evidence has come from studies of brain-damaged patients. Patients with a reading impairment (dyslexia) generally also have impaired writing and spelling (dysgraphia). In many cases, such patients have problems with the same specific words in reading and writing. These findings suggest there is a single orthographic lexicon.

However, some brain-damaged patients have greater problems with reading than spelling, or vice versa. Such evidence suggests there are two orthographic lexicons. However, those with greater reading problems generally have damage to brain areas associated with visual perception (e.g., BA17/18), whereas those with greater spelling problems have damage to premotor areas (e.g., BA6) (Rapp & Lipka, 2011). These findings reflect the greater role of perception in reading and of motor processes in spelling, and so they do not indicate whether there are two orthographic lexicons.

Hepner et al. (2017) studied PJT (an 8-year-old boy with dysgraphia) and obtained the following findings: "We found no evidence of any reading impairment . . . indicating a striking dissociation between impaired spelling and superior reading." Hanley and Sotiropoulos (2018) obtained similar findings with NR, who had no problems in reading words with atypical sound–letter associations but performed very poorly when required to spell such words.

The above findings may suggest there are two orthographic lexicons, with the one associated with spelling being severely impaired in PJT and NR. However, this interpretation is implausible. It is more likely PJT has a single orthographic lexicon but finds it hard to access the information in it via phonological long-term memory (as is required during spelling).


Figure 11.14 Brain areas in the left hemisphere associated with reading, letter perception and writing. The labelled regions include the precentral gyrus, superior temporal/supramarginal gyri, middle frontal gyrus, inferior frontal gyrus and fusiform gyrus, marked as belonging to the reading, letter perception and writing systems. From James (2017).


James (2017) reviewed findings consistent with the dual-route theory and with the notion that there is a single orthographic lexicon. In essence, she found the brain areas associated with writing correspond closely to those involved in reading and in letter perception (see Figure 11.14).

Purcell et al. (2017) assessed activation in areas associated with orthographic processing (the ventral occipito-temporal cortex and inferior frontal gyrus) while participants read visually presented words or spelled spoken words. These areas were involved in both reading and spelling. When the same word was read on one trial and then spelled on the next trial (or vice versa), there was reduced activation in brain areas involved in orthographic processing. These findings provide strong evidence that a single orthographic lexicon is used on reading and spelling tasks.

Evaluation

The issue of one vs two orthographic lexicons has not been fully resolved. However, most evidence from brain-damaged patients and from neuroimaging studies of healthy individuals favours the notion of a single orthographic lexicon.

What are the limitations of research in this area? First, much evidence is inconclusive. For example, individuals such as PJT and NR have severely impaired spelling ability. This may occur because they have a deficient orthographic lexicon for spelling or because they have problems accessing the orthographic lexicon when presented with word sounds. Second, the fact that children are often taught reading and spelling as separate skills suggests some caution before rejecting the notion of two orthographic lexicons.


CHAPTER SUMMARY

• Introduction. Speaking and writing rely on the same knowledge base and involve similar planning skills. However, spoken language is simpler and less formal than written language because speakers have less time for planning than writers, and because speech fulfils a social function.



• Basic aspects of speech production. Speech production involves several brain areas (and cognitive processes) overlapping with those involved in speech perception. The finding that individuals with mild cognitive impairment have poor speech quality indicates that speech production is demanding. It is demanding in part because limited short-term memory means speakers have to make rapid decisions while planning and producing utterances. Speech production involves four stages: semantic; syntactic; morphological; and phonological.



• Speech planning. Speech planning can extend over a phrase or over a clause. The extent of advance speech planning often differs at the semantic, syntactic and phonological levels. Forward planning is generally more extensive when speakers have no time pressure, when they speak slowly and when they are under low cognitive load. Overall, speakers can choose flexibly whether to focus on effective and error-free communication or on minimising cognitive demands.



• Speech errors. The study of speech errors can provide insights into the processes (e.g., planning) underlying speech production. Speech errors include spoonerisms, Freudian slips, semantic substitutions, exchange errors and subject-verb agreement errors. Perceptual loop theory argues that speakers use the comprehension system to monitor their inner and overt speech for errors. In contrast, conflict-based monitoring theory argues that error detection depends primarily on the speech-production system combined with cognitive control processes. Speakers monitor their inner and overt speech. Most such monitoring probably involves the speech-production system and cognitive control processes rather than the comprehension system.



• Theories of speech production. According to Dell's spreading-activation theory, the processing associated with speech production is parallel and interactive. The theory accounts for most speech errors but may exaggerate processing interactivity. WEAVER++ is a discrete, feedforward model based on the assumption of serial processing. Patterns of brain activation provide some support for this model, as does some research on the tip-of-the-tongue state. However, speech production often involves more interactive and parallel processing than assumed within WEAVER++. In addition, the model exaggerates the role of comprehension processes in the detection of one's own speech errors. Various theorists (e.g., Christiansen & Chater, 2016) argue (with supporting evidence) that general cognitive processes (e.g., short-term memory; cognitive control) play a major role in speech production. These general processes are de-emphasised within spreading-activation theory and WEAVER++.



• Cognitive neuropsychology: speech production. There is a traditional distinction between Broca's aphasia (slow, ungrammatical and non-fluent speech) and Wernicke's aphasia (fluent speech often lacking meaning) involving damage to different brain areas. The anatomical definitions of Broca's and Wernicke's areas are unclear, and numerous other brain areas are also crucial in language processing. Anomia (impaired naming ability) can involve semantic impairments or phonological impairments, but interactions are sometimes found between semantic and phonological processing. Patients with agrammatism produce sentences lacking grammatical structure and with few function words. They have impaired processing resources, and general problems with sequence learning. The speech of jargon aphasics is reasonably grammatical. However, they produce many neologisms mostly due to deficient phonological processing. Jargon aphasics' production of jargon occurs in part because they have deficient self-monitoring of their own speech. We can avoid an overreliance on categories such as anomia, agrammatism and jargon aphasia by focusing on empirical similarities and differences in patterns of language impairments among aphasic patients.



• Speech as communication. The key purpose of speech is communication, and speakers are often sensitive to the needs of their listener. They also often make use of the common ground shared with the listener, but its use is subject to speakers' memory limitations as well as their processing limitations. Speakers use gestures in flexible ways that are generally responsive to the listener. However, speakers make gestures even when the listener cannot see those gestures, because the use of gestures facilitates speakers when planning what to say. Other ways speakers facilitate communication are by using prosodic cues (e.g., rhythm; stress) and discourse markers (words or phrases indirectly assisting the listener's comprehension).



• Writing: the main processes. Writing involves proposing or planning, translating, transcribing, and evaluating and revising the text that has been produced. Shifts from one writing process to another depend on a monitor or control system. Good writers use a knowledge-transforming rather than knowledge-telling strategy and devote more time to revision. Expert writers attain a knowledge-crafting stage emphasising the reader's needs. The working memory system (especially the central executive) is heavily involved in the writing process. Word processing often enhances writing quality, probably by encouraging revision and thinking processes.



• Spelling. According to the two-route theory, there are separate lexical and non-lexical routes in spelling, with the former used to spell familiar words and the latter unfamiliar words and non-words. Phonological dysgraphics have damage to the non-lexical route, whereas surface dysgraphics have damage to the lexical route. The theory is oversimplified: phonological dysgraphics often have general problems with phonological processing and the two routes often interact. Reading and spelling probably both involve a single orthographic lexicon. However, the evidence is complex and hard to interpret.

FURTHER READING

Chater, N., McCauley, S.M. & Christiansen, M.H. (2016). Language as skill: Intertwining comprehension and production. Journal of Memory and Language, 89, 244–254. Nick Chater and colleagues stress the strong links between speech production and comprehension and emphasise the role played by general processes (e.g., short-term memory; basic learning) in their development.

Dronkers, N.F., Ivanova, M.V. & Baldo, J.V. (2017). What do language disorders reveal about brain-language relationships? From classic models to network approaches. Journal of the International Neuropsychological Society, 23, 741–754. This article by Nina Dronkers and colleagues contains a historical account showing how cognitive neuropsychology has enhanced our understanding of the processes underlying language comprehension and production.

Ferreira, V.S. (2019). A mechanistic framework for explaining audience design in language production. Annual Review of Psychology, 70, 29–51. Victor Ferreira discusses the processes used by speakers when attempting to be responsive to listeners' needs.

Hepner, C., McCloskey, M. & Rapp, B. (2017). Do reading and spelling share orthographic representations? Evidence from developmental dysgraphia. Cognitive Neuropsychology, 34, 119–143. Christopher Hepner and colleagues discuss a recent version of the influential two-route theory of spelling.

Kellogg, R.T., Turner, C.E., Whiteford, A.P. & Mertens, A. (2016). The role of working memory in planning and generating written sentences. Journal of Writing Research, 7, 397–416. Ronald Kellogg and his colleagues provide a comprehensive account of the various ways working memory contributes to the writing process.

McClain, R. & Goldrick, M. (2018). The neurocognitive mechanisms of speech production. In S.L. Thompson-Schill (ed.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience, Vol. 3: Language and Thought (4th edn; pp. 319–356). New York: Wiley. The major processes (including attention) involved in speech production are discussed in this chapter.


Nozari, N. & Novick, J. (2017). Monitoring and control in language production. Current Directions in Psychological Science, 26, 403–410. This article provides an overview of the mechanisms used by speakers to avoid or correct production errors.




PART IV

Thinking and reasoning

Our ability to reflect in complex ways on our lives (e.g., to plan and solve our daily problems) is the bedrock of thinking behaviour. The ways we think (and reason and make decisions) are very varied. They range from solving newspaper crossword puzzles, to troubleshooting (or not!) if our car breaks down, to developing a new theory of the universe. Below we consider two examples of the activities to which we apply the term "thinking".

First, a fragment of Molly Bloom's sleeping thoughts in James Joyce's Ulysses (1922/1960, pp. 871–872):

God help the world if all women in the world were her sort down on bathingsuits and lownecks of course nobody wanted her to wear I suppose she was pious because no man would look at her twice I hope I'll never be like her a wonder she didn't want us to cover our faces but she was a well educated woman certainly and her gabby talk about Mr Riordan here and Mr Riordan there I suppose he was glad to get shut of her.

Second, here is the first author struggling to use PowerPoint:

Why has the Artwork put the title in the wrong part of the slide? Suppose I try to put a frame around it so I can drag it up to where I want it? Ah ha, now if I just summon up the arrows I can move the top bit up, and then I do the same with the bottom bit. If I move the bottom bit up more than the top bit, then the title will fit in okay.


These two examples illustrate several general aspects of thinking. First, they both involve individuals being conscious of their thoughts. Thinking typically involves conscious awareness, although there is an ongoing controversy concerning the extent to which higher-level cognitive processes such as thinking can be unconscious (see Chapter 16). Hassin (2013) claimed unconscious processes can perform all the functions of conscious processes.


However, the evidence suggests unconscious processes are much more limited than conscious ones (especially with respect to thinking and reasoning) when stringent criteria are used to identify processes as "unconscious" (Hesselmann & Moors, 2015). Note also that we tend to be aware of the products of thinking rather than the processes themselves (see Chapter 16). Furthermore, even when we can introspect on our thoughts, our recollections of them are often inaccurate. Joyce reconstructs well the nature of idle, associative thought in Molly Bloom's internal monologue. However, if we asked her to tell us her thoughts from the previous five minutes, she would probably recall very little.

Second, thinking varies in the extent to which it is directed and controlled. It can be relatively undirected, as in the case of Molly Bloom letting one thought slide into another as she is on the point of slipping into a dream. In the other example, the goal is much clearer and better defined.

Third, the amount and nature of the knowledge used in different thinking tasks vary enormously. The knowledge required in the PowerPoint example is relatively limited (even though it took the first author much time to acquire it!). In contrast, Molly Bloom is making use of her vast knowledge of people and of life.

The next three chapters (Chapters 12–14) are concerned with the higher-level cognitive processes involved in thinking and reasoning (see the Box, Forms of thinking, below). Of importance, we use the same cognitive system to deal with all these types of thinking and reasoning. As a result, many distinctions between different forms of thinking and reasoning are somewhat arbitrary and camouflage similarities in underlying cognitive processes.

From the above viewpoint, it is unsurprising that similar brain areas are typically involved in most problem-solving and reasoning tasks (see Chapter 14). It is also worth mentioning there has recently been a major shift in research from deductive reasoning to informal reasoning because the latter is of considerably more relevance in everyday life. Informal reasoning is closer than deductive reasoning to research on judgement and decision-making because it makes much more use of an individual's knowledge and experience.

We will briefly describe the structure of this section. Chapter 12 is concerned primarily with the processes involved in problem solving. We discuss various types of problems (e.g., those involving insight), with an emphasis on the reasons why most people find it very difficult to solve certain problems. There is also an emphasis on the factors involved in the development of expertise in various areas (e.g., chess playing; medical expertise).

Chapter 13 deals with judgement and decision-making with an emphasis on the errors and biases that are often involved. A central theme is that most people make extensive use of heuristics (rules of thumb) that are simple to use but prone to error. Complex decision-making is also considered, as well as the role of emotional factors in decision-making.


Chapter 14 deals with the major forms of reasoning (inductive, deductive and informal) and the errors to which they are prone. There is also discussion of the key (but very tricky!) question, "Are humans rational?". As you might expect, many psychologists answer that question "Yes and no", rather than a definite "Yes" or "No"!

FORMS OF THINKING

Problem solving: Cognitive activity that involves moving from the recognition that there is a problem through a series of steps to the solution. Most other forms of thinking involve some problem solving. Problem solving differs from decision-making in that individuals have to generate their own solutions.

Decision-making: Selecting one out of a number of presented options or possibilities, with the decision having personal consequences (e.g., winning or losing money).

Judgement: A component of decision-making that involves calculating the likelihood of various possible events; the emphasis is on accuracy.

Deductive reasoning: Deciding what conclusions necessarily follow provided various statements are assumed to be true. Most deductive-reasoning tasks are based on formal logic; however, most individuals use informal reasoning (see below) rather than logic with such tasks (see Evans et al., 2015).

Informal reasoning: Evaluating the strength of arguments by taking account of one’s relevant knowledge and experience.

Inductive reasoning: Deciding whether certain statements or hypotheses are true on the basis of the available information. It is used by scientists and detectives but is not guaranteed to produce valid conclusions.


Chapter 12

Problem solving and expertise

INTRODUCTION

Life presents us with many problems, although thankfully most are fairly trivial. Here are three examples. First, you have an urgent meeting in another city. However, the trains generally run late, your car is old and unreliable, and the buses are slow. Second, you are struggling to work out the correct sequence of operations on your computer to perform a given task. You try to remember what you needed to do with your previous computer. Third, you are an expert chess player competing against a strong opponent. The clock is ticking, and you must rapidly decide on your move in a complicated position.

The above examples relate to the three main topics of this chapter. The first is problem solving, which involves the following (Goel, 2010, p. 613): “(1) there are two states of affairs; (2) the agent [problem solver] is in one state and wants to be in the other state; (3) it is not apparent to the agent how the gap between the two states is to be bridged; and (4) bridging the gap is a consciously guided multi-step process.” One reason problem solving is so important is that it is “a crossroads, where many different processes come together in the service of the needs and goals of an individual” (Weisberg, 2018, p. 607).

The second topic is analogical problem solving. In our everyday lives, we constantly use past experience and knowledge to assist us in our current task. Often we detect (and make effective use of) analogies or similarities between a current problem and ones solved in the past.

The third topic is expertise. Individuals possessing expertise have considerable specialist knowledge in one area or domain. There is much overlap between expertise and problem solving in that experts are very efficient at solving numerous problems in their area of expertise. However, there are also important differences. Knowledge is typically more important in research on expertise than in research on problem solving. In addition, there is more focus on individual differences in expertise research. Indeed, a central issue in expertise research is to identify the main differences (e.g., in knowledge; in strategic processing) between experts and novices.


KEY TERMS

Well-defined problems: Problems in which the initial state, the goal and the methods available for solving them are clearly laid out.

Ill-defined problems: Problems that are imprecisely specified; for example, the initial state, the goal state and the methods available to solve the problem may be unclear.

Knowledge-rich problems: Problems that can only be solved by those having considerable relevant background knowledge.

Knowledge-lean problems: Problems that can be solved by individuals in the absence of specific relevant prior knowledge.


PROBLEM SOLVING: INTRODUCTION

There are three major aspects to problem solving:

(1) It is purposeful (i.e., goal-directed).
(2) It involves controlled processes and is not totally reliant on “automatic” processes.
(3) A problem exists when someone lacks the relevant knowledge to produce an immediate solution. Thus, for example, a task involving mathematical calculation may be a problem for most individuals but not for a professional mathematician.

The above three aspects are typically found during problem solving. However, as we will see, problem solving sometimes depends on non-conscious processes as well as (or instead of) the conscious, deliberate processes implied by aspects (1) and (2).

There are major differences among problems. Well-defined problems are ones where all problem aspects are clearly specified, including the initial state or situation, the range of possible moves or strategies, and the goal or solution. The goal is well specified because it is clear when it has been reached (e.g., the centre of a maze). Chess is a well-defined problem: there is a standard initial state, the rules specify all legitimate moves and the goal is to achieve checkmate. However, chess is in some ways ill-defined: the nature of the problem faced by a chess player varies constantly during a game.

Ill-defined problems are underspecified. Suppose you set yourself the goal of becoming happier. There are endless strategies you could adopt, and it is very hard to anticipate which would be most effective. Since happiness varies over time and is hard to define, how are you going to decide whether you have solved the problem of becoming happier? Most everyday problems are ill-defined. However, psychologists have focused mostly on well-defined problems. Why is this? With well-defined problems, the researcher knows the correct answer and often also knows the optimal strategy for reaching it. As a result, they can easily identify the errors and deficiencies in problem solvers’ strategies.

Goel and Grafman (2000) studied PF, a man with brain damage to the right prefrontal cortex. He had a high IQ (128) and performed successfully on well-defined laboratory tasks. However, he performed very poorly with everyday ill-defined problems because he produced inadequate preliminary plans. In similar fashion, Goel et al. (2013) found patients with damage to the right prefrontal cortex made premature commitments when planning a trip to Italy. In contrast, planning is more straightforward with most well-defined problems.

We can also distinguish between knowledge-rich and knowledge-lean problems. Knowledge-rich problems (e.g., chess problems) can only be solved by those having much relevant specific knowledge. In contrast, knowledge-lean problems do not require such knowledge because the information needed to solve the problem is contained in the initial problem statement. Historically, most research involved knowledge-lean problems because they minimise individual differences in relevant knowledge.


IN THE REAL WORLD: MONTY HALL PROBLEM

We can illustrate key issues in problem solving by considering the notorious Monty Hall problem, which formed an important part of Monty Hall’s show on American television:

Suppose you’re on a game show and you’re given the choice of three doors. Behind one door is a car; behind the others, goats. You pick a door, say, Number 1, and the host, who knows what’s behind the doors, opens another door, say Number 3, which has a goat. He then says to you, “Do you want to switch to door Number 2?” Is it to your advantage to switch your choice?

If you stayed with your first choice, you are in good company, since approximately 85% of people make that decision. Unfortunately, it is wrong! There is actually a two-thirds chance of being correct if you switch. Most people (including you?) furiously dispute this answer.

Monty Hall, the game-show host. ZUMA Press, Inc./Alamy.

Let’s work it through. There are only three possible scenarios with the Monty Hall problem (Krauss & Wang, 2003; see Figure 12.1). With scenarios 1 and 2, your first choice is incorrect, and so Monty Hall opens the only remaining door with a goat behind it. As a result, switching is certain to succeed. With scenario 3, your first choice is correct, and you would win by refusing to switch. Overall, switching succeeds two-thirds of the time.

Figure 12.1 Explanation of the solution to the Monty Hall problem: in two out of three possible car/ goat arrangements, the contestant would win by switching; therefore she should switch. From Krauss and Wang (2003). © 2003 American Psychological Association.
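If the three-scenario argument still seems suspect, the answer can be checked empirically. Below is a minimal Monte Carlo sketch (our illustration, not taken from the studies cited) that plays many random games and compares the stay and switch policies:

```python
import random

def play(switch: bool) -> bool:
    """Play one Monty Hall game; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    first_pick = random.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = random.choice([d for d in doors if d != first_pick and d != car])
    if switch:
        # Switch to the single remaining closed door.
        final_pick = next(d for d in doors if d not in (first_pick, opened))
    else:
        final_pick = first_pick
    return final_pick == car

trials = 100_000
for switch, label in [(False, "stay"), (True, "switch")]:
    wins = sum(play(switch) for _ in range(trials))
    print(f"{label:6s}: {wins / trials:.3f}")   # ~0.333 for stay, ~0.667 for switch
```

Note how the host’s knowledge is built into the simulation: he never opens the door hiding the car. If his choice were instead random over all remaining doors, the advantage of switching would disappear; that misunderstanding is exactly the third factor discussed below.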


Human performance on the Monty Hall problem is very poor. Indeed, Herbranson and Schroeder (2010) found it was much worse than that of pigeons! After extensive practice, humans switched (the optimal response) on only 66% of trials, whereas pigeons switched on 96% of trials. The pigeons performed well because they simply maximised the reward they received, whereas humans used more complex strategies.

Why do humans perform so poorly on this problem? First, people typically use a heuristic (rule of thumb) known as the equiprobability bias (assuming all available options are equally likely even when they are not; Tubau et al., 2015). In addition, people experience more regret when losing by switching than when losing by staying (Tubau et al., 2015). These two factors lead most people, mistakenly, to stay.

Second, the problem places substantial demands on the central executive (an attention-like system; see Glossary). Performance on the Monty Hall problem was much worse when participants simultaneously performed a demanding task involving the central executive (8% vs 22% correct; De Neys & Verschueren, 2006).

Third, many people mistakenly believe the host’s actions are random. Burns and Wieth (2004) made the causal structure of the problem clearer. There are three boxers, one of whom is so good he is certain to win any bout. You select one boxer and then the other two fight each other. The winner of this bout then fights the boxer you selected initially. You win if you chose the winner of this second bout. With this version of the problem, 51% correctly decided to switch versus only 15% with the standard three-door problem. This occurred because it is easy to see that the boxer who won the first bout did so because of skill rather than any random factor.

Fourth, it is very hard to understand the problem. Saenen et al. (2015) found 16% of university students produced the optimal answer (i.e., switching) but only half of them understood the underlying probabilities. For example, the probabilities of winning-when-staying and winning-when-switching must sum to 1, but several participants produced probabilities that did not! Thus, it is possible to “solve” the Monty Hall problem without full understanding. When participants were provided with relevant information about the underlying probabilities, over 80% of them decided to switch, compared to only 40% when that information was not provided (James et al., 2018).

Researchers have studied problem solving using literally thousands of different problems. This raises the issue of whether there is some commonality in the processes used to solve these diverse problems. Bartley et al. (2018) addressed this issue in a meta-analysis (see Glossary) of neuroimaging studies involving mathematical, verbal and visuo-spatial problems. They identified what they called “a core problem solving network” (p. 318) common to all three types of problem (see Figure 12.2(d)). More specifically, this was a fronto-parietal network (e.g., the dorsolateral prefrontal cortex and the cingulate gyrus) involved in processes such as attention, monitoring and working memory. In addition, there were brain areas specific to mathematical, verbal and visuo-spatial problems (see Figure 12.2).

KEY TERM

Heuristic: Rule of thumb that is cognitively undemanding and often produces approximately accurate answers; see algorithm.

GESTALT APPROACH AND BEYOND: INSIGHT AND ROLE OF EXPERIENCE

Early research on problem solving was dominated by the gestaltists, German psychologists flourishing between the 1920s and 1940s. They distinguished between reproductive and productive thinking. Reproductive thinking involves the systematic re-use of previous experiences


KEY TERM

Insight: The experience of suddenly realising how to solve a problem; sometimes referred to as the “Aha! experience”.

Figure 12.2 Brain areas (a) involved in mathematical problem solving; (b) verbal problem solving; (c) visuo-spatial problem solving; and (d) areas common to all three problem types (conjunction). From Bartley et al. (2018). Reprinted with permission of Elsevier.

(e.g., in mathematical problems) and is mostly required on well-defined problems. Productive thinking involves novel problem restructuring and is mostly required on ill-defined problems. In what follows, our main focus will be on theorising and research influenced by the Gestalt approach, with only occasional mentions of the gestaltists’ original research.

Insight

The gestaltists argued problems requiring productive thinking are often solved using insight. Insight involves a sudden problem restructuring, often accompanied by an “Aha! experience”. More technically, insight is “any sudden comprehension, realisation, or problem solution that involves a reorganisation of the elements of a person’s mental representation of a stimulus, situation, or event to yield a non-obvious or non-dominant interpretation” (Kounios & Beeman, 2014, p. 74).

The mutilated draughtboard (or chequerboard) problem (see Figure 12.3) is an insight problem. The board is initially covered by 32 dominoes occupying two squares each. Then two squares are removed from diagonally


Figure 12.3 The mutilated draughtboard problem.


KEY TERM

Remote Associates Test: This involves finding a word that is related to three given words (e.g., opera, hand and dish are all related to soap).


opposite corners. Can the remaining 62 squares be filled by 31 dominoes? What is your answer?

Nearly everyone starts by mentally covering squares with dominoes. Alas, this strategy is ineffective because there are 758,148 possible permutations! You may well rapidly solve the problem using insight if we tell you something you already know: each domino covers one white and one black square. If that does not work, note that the two removed squares must have the same colour. Thus, the 31 dominoes cannot cover the mutilated board.

There is theoretical controversy concerning insight. Some (including the gestaltists) claim it is very different from other cognitive processes (the special-process viewpoint). However, others claim very similar processes are used in insight and non-insight problems (the business-as-usual viewpoint) (Zander et al., 2016). Below we discuss this controversy.
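The domino-colour argument can also be verified mechanically. A minimal sketch (our illustration): colour the squares, remove two diagonally opposite corners, and compare the numbers of black and white squares that 31 dominoes would have to cover:

```python
# Colour square (row, col) black when (row + col) is even, white otherwise.
removed = {(0, 0), (7, 7)}   # diagonally opposite corners: both the same colour

black = sum(1 for r in range(8) for c in range(8)
            if (r, c) not in removed and (r + c) % 2 == 0)
white = 62 - black

# Each of the 31 dominoes covers exactly one black and one white square,
# so a complete tiling requires black == white.
print(black, white)     # 30 and 32
print(black == white)   # False: the mutilated board cannot be covered
```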

Findings

Case study: Brain areas involved in insight

Researchers often use participants’ reports of the Aha! experience to indicate insight. Ideally, the Aha! experience should be reported predominantly on “insight problems” rather than “non-insight problems” and should be associated with correct solutions. The evidence partially supports these predictions. Webb et al. (2016a) found Aha! experiences were reported more often with insight than non-insight problems. However, insight problems were sometimes solved without any Aha! experience, and the solution of non-insight problems was sometimes accompanied by Aha! experiences.

The gestaltists apparently assumed insight always produces correct solutions. Danek and Wiley (2017) reported contrary evidence using insight problems. Many incorrect solutions (especially those produced rapidly) were associated with Aha! experiences.

Much research has considered whether insight is associated with a specific pattern of brain activity (Kounios & Beeman, 2014). The findings are variable. Bowden et al. (2005) used the Remote Associates Test: three words were presented (e.g., fence; card; master) and participants thought of a word (e.g., post) going with each one to form compound words. The anterior superior temporal gyrus was activated only when solutions involved insight. This is a brain area associated with processing distant semantic relations between words as well as with reinterpretation and semantic integration. Other areas associated with insight are the anterior cingulate cortex (involved in the detection of cognitive conflict and the breaking of a mindset) and the prefrontal cortex (involved in higher cognitive processes) (Kounios & Beeman, 2014).

Metcalfe and Wiebe (1987) assessed participants’ feelings of “warmth” (closeness to solution) during insight and non-insight problems. Warmth increased progressively during non-insight problems (as expected because they involve several processes). With insight problems, warmth ratings remained low until suddenly increasing dramatically just before problem solution (consistent with the Aha! experience). Kizilirmak et al. (2018) reported similar findings using the Remote Associates Test (discussed above). Feelings of warmth increased much more abruptly for problems


whose solution was accompanied by an Aha! experience than for those solved without such an experience.

Subjectively, insight occurs suddenly and unexpectedly. However, this is not necessarily true of the underlying processes. Ellis et al. (2011) recorded eye movements while participants solved four-letter anagrams (five letters were presented but one was a distractor). On insight trials, participants reported suddenly finding the solution to the problem. However, participants had decreasing fixations on the distractor letter ahead of the solution, indicating they were gradually, but unconsciously, accumulating relevant knowledge.

In sum, insight is a process differing from other, more controlled processes. However, there are issues with respect to the measurement of insight. For example, Laukkonen and Tangen (2018) found problem solvers often report the Aha! experience in the absence of a sudden increase in warmth ratings and vice versa. The Aha! experience is a preferable measure of insight because it is more consistently associated with various objective measures (e.g., problem-solving strategies; performance accuracy on insight problems) (Laukkonen & Tangen, 2018). Of relevance, the Aha! experience is associated with increased autonomic arousal (Shen et al., 2018), indicating an emotional reaction to insightful problem solving.

KEY TERM

Impasse: The experience of being blocked and not knowing how to proceed when engaged in problem solving.

Representational change theory

Ohlsson (1992, 2011) developed the gestaltist approach in his representational change theory. According to this theory, the initial stage of problem solving involves forming a mental representation of the problem. After that, we access various mental operators that might be applied to this representation, only one of which is selected and used at any given time. More specifically, the current mental representation causes activation to spread to mental operators related to it in meaning via an unconscious process, and the mental operator most strongly activated is retrieved.

We often encounter an impasse (feeling blocked and unsure how to proceed) when solving a problem because our mental representation of it is incorrect. Theoretically, we must change (or restructure) the problem representation for insight to occur. This can happen in three ways:

(1) Constraint relaxation: inhibitions on what is regarded as permissible are removed.
(2) Re-encoding: some aspect of the problem representation is reinterpreted.
(3) Elaboration: new problem information is added to the representation.

Öllinger et al. (2014) developed representational change theory further (see Figure 12.4). What is new is the assumption that a search process may be necessary even after an impasse has been overcome by insight. For example, consider the nine-dot problem, which requires drawing four straight lines that go through all nine dots (see Figure 12.5). Most people initially assume the lines must remain within the confines of the square formed by the dots. Even when this constraint is relaxed by explicitly instructing participants that they can draw lines outside the square, performance is still poor.


Figure 12.4 Flow chart of insight problem solving. Initially, a problem representation is established using prior knowledge and perceptual processes. The problem representation is searched by heuristics (rules of thumb). If this proves unsuccessful, an impasse is encountered. This leads to a change in the problem representation and this new representation is also searched by heuristics. This process is continued until a solution is found or the problem is abandoned.

Thus, the processes involved can be more complex than envisaged within representational change theory.

Findings

Earlier we discussed the mutilated draughtboard problem, on which nearly everyone starts with an incorrect problem representation. Solving it requires representing each domino as an object covering one white and one black square (re-encoding) and representing the board as having lost two black (or white) squares (elaboration).

Figure 12.5 (a) The nine-dot problem and (b) its solution.

Knoblich et al. (1999) showed the importance of constraint relaxation using matchstick problems involving Roman numerals (see Figure 12.6). The solution to each problem required moving a single stick to produce a true statement in place of the initial false one. Some problems (Type A) only required changing two values in the equation (e.g., VI = VII + I [6 = 7 + 1] becomes VII = VI + I [7 = 6 + 1]). In contrast, Type B problems involved a less obvious change in the representation of the equation (e.g., IV = III – I [4 = 3 – 1] becomes IV – III = I [4 – 3 = 1]).

According to Knoblich et al. (1999), we have learned that many operations change the values (numbers) in an equation (as in Type A problems). In contrast, relatively few operations change the operators (i.e., +, – and =), as required in Type B problems. As predicted, participants found it much harder to relax the normal constraints of arithmetic (and so show insight) with Type B problems. Knoblich et al. (2001) reported further evidence that participants’ initial representation is based on the assumption that values must be changed: participants initially spent much more time fixating the values than the operators with both types of problem.


Figure 12.6 Two of the matchstick problems used by Knoblich et al. (1999) and the cumulative solution rates produced for these types of problems in their study. © American Psychological Association.

Reverberi et al. (2005) argued that the processing constraints on insight problems involve the lateral prefrontal cortex. Patients with damage to that area should not impose artificial constraints when solving insight problems and so might perform better than healthy controls. As predicted, brain-damaged patients solved 82% of the hardest matchstick arithmetic problems compared to only 43% of controls.

According to representational change theory, solution hints should be most useful when individuals have just reached an impasse or block. At that point, they have formed an incorrect problem representation but have not become excessively fixated on it. Moss et al. (2011) obtained findings consistent with this prediction.

Fleck and Weisberg (2013) asked participants to think aloud while solving insight problems. There were large individual differences in their strategies. Evidence of impasse and restructuring (of crucial importance according to representational change theory) was obtained on only 25% of problem attempts. Other successful strategies included direct applications of knowledge with no representational change and use of simple heuristics or rules of thumb (e.g., hill climbing; see Glossary). Overall, there was much less evidence of impasse and restructuring when solutions were produced rapidly rather than slowly.

Fedor et al. (2015) reported various findings inconsistent with representational change theory in a study of an insight problem. First, fewer than 50% of problem solvers followed the theoretically predicted sequence of constrained search, impasse, insight, extended search and solution. Most used more complex processing sequences, with search and impasse occurring several times. Second, Fedor et al. compared reported experiences of impasse with behaviourally defined measures (i.e., repetitious behaviour; inactivity). Problem solvers were no more likely to report experiencing an impasse during a behaviourally defined impasse than at other stages of processing.

In sum, representational change theory provides a more explicit and testable account than the original Gestalt theory. However, it is increasingly clear that problem solving on insight problems is significantly more flexible and variable than assumed by that theory.


IN THE REAL WORLD: MAGIC TRICKS

Many magic tricks persuade spectators to focus on a strong (but incorrect) problem representation (Danek et al., 2014). For example, a magician pours water from a glass into an empty mug. He then turns the mug upside down and a large ice cube drops out (see YouTube: http://www.youtube.com/watch?v=3B6ZxNROuNw). This trick works because most people assume the mug is empty. In fact, the mug already contains a white napkin (glued to the bottom) and the ice cube. The water is fully absorbed by the napkin and so only the ice cube falls out. Participants given a verbal cue to relax the incorrect assumption that the mug was empty showed improved performance.

Spectators often cannot change their initial incorrect problem representation into the correct one. Our perceptual system rapidly and unconsciously extrapolates from the visible parts of objects to complete them (the Gestalt law of closure shown in Figure 3.4) (Ekroll et al., 2017). For example, consider the Chinese linking rings trick, in which solid metal rings appear to link and unlink by passing through each other. One ring has a small gap in it, but spectators assume all the rings are complete.

The multiplying billiard balls trick also depends on visual completion (see Figure 12.7). The conjuror starts with a single ball and then progressively adds balls until four are visible: the initial ball is a hollow shell with a complete ball hidden inside it. When that second ball is revealed, the conjuror inserts another complete ball into the hollow shell, and so on. When observers viewed a hollow shell balanced on the tip of their finger, they perceived a complete ball despite strong evidence it was hollow (Ekroll et al., 2016). They even perceived their own finger as shorter than usual! These findings are directly relevant to representational change theory: observers often cannot correct their incorrect problem representation because their assumption that the initial ball is complete is based on powerful perceptual processes.

Figure 12.7 The multiplying billiard balls trick. (a) The end of the trick, when the initial one ball has become four balls; (b) the secret of this trick is that the initial ball is an empty semi-spherical shell that can contain another ball. From Ekroll et al. (2017).

Evaluation

Representational change theory extended the Gestalt approach by specifying the mechanisms underlying restructuring and insight. More generally, it involves a fruitful combination of Gestalt ideas with cognitive psychology. Öllinger et al.’s (2014) extension of this theory has improved it by emphasising that efficient search processes are often needed after as well as before an impasse leading to insight.

What are the theory’s limitations? First, the theory provides an idealised account of the processes involved in insight problems. There are substantial individual differences in problem processing, and processing sequences are often more complex and flexible than assumed theoretically. Second, we often cannot predict when (or why) problem solvers change a problem’s representation. Third, there is often surprisingly little evidence of restructuring or impasse when individuals solve insight problems. Fourth, the strategies used to solve insight problems include some (e.g., direct application of knowledge; heuristics) not included within the theory. Fifth, the original




theory mistakenly implied that constraint relaxation is typically sufficient to solve insight problems (Öllinger et al., 2014).

Facilitating insight: hints and incubation

We can facilitate insight by providing subtle hints. Consider Maier’s (1931) pendulum problem. Participants enter a room containing various objects (e.g., poles, pliers, extension cords) plus two hanging strings (see Figure 12.8). The task involves tying the strings together, but they are too far apart for participants to reach one string while holding the other. The solution involves tying the pliers to one string and swinging it like a pendulum. Thomas and Lleras (2009) used the pendulum problem with occasional exercise breaks in which participants swung or stretched their arms. Those moving their arms in a solution-relevant way (i.e., swinging) were more likely to solve the problem even though unaware of the relationship between their arm movements and the task.

Wallas (1926) claimed problem solving can benefit from incubation, which “arises when the solution . . . comes to mind after a temporary shift of attention to another domain” (Sio & Ormerod, 2015, p. 113). Research typically involves comparing an experimental group having an incubation period away from an unsolved problem with a control group working continuously. Sio and Ormerod (2009) reported three findings in a meta-analysis:

(1) Incubation effects (generally fairly small) were reported in 73% of the studies.
(2) Incubation effects were stronger with creative problems having multiple solutions than with linguistic and verbal problems having a single


KEY TERM

Incubation: A stage of problem solving in which the problem is put to one side for some time; it is claimed to facilitate problem solving.

Figure 12.8 The two-string problem in which it is not possible to reach one string while holding the other.


KEY TERM

Mental set: The tendency to use a familiar problem-solving strategy that has proved successful in the past even when it is no longer appropriate; also known as Einstellung.


solution. Incubation often widens the search for knowledge, which may be more useful with multiple-solution problems.
(3) The effects were larger when there was a fairly long preparation time prior to incubation. This may have occurred because an impasse or block in thinking is more likely to develop when preparation time is long.

Why is incubation beneficial? Simon (1966) argued that control information relating to the strategies used by problem solvers is forgotten during incubation. This forgetting makes it easier for problem solvers to adopt a new approach after the incubation period. Penaloza and Calvillo (2012) found solving insight problems was only facilitated by a 2-minute break when this allowed misleading information to be forgotten.

Gilhooly (2018) focused on “unconscious work”: “Incubation effects involve active although unconscious processing of the problem materials.” His approach is supported by research on insight problems using two conditions: (1) the task instructions immediately precede each problem; or (2) a totally irrelevant task is performed between the instructions and the problem. Performance is typically better in condition (2) than condition (1). This can be explained by unconscious work but not by the forgetting of previously used strategies or information.

Past experience: mental set

Past experience generally increases our ability to solve problems. However, the gestaltists argued persuasively that we sometimes fail to solve problems because we are misled by our past experience. For example, mental set (Einstellung in German) involves continuing to use a previously successful problem-solving strategy even when it is inappropriate or suboptimal. However, mental set is often useful – it allows successive problems of the same type to be solved rapidly, with few processing demands.

Luchins (1942) investigated mental set using problems that involved three water jars of varying capacity. Here is a sample problem: Jar A can hold 28 quarts of water, Jar B 76 quarts and Jar C 3 quarts. You must end up with exactly 25 quarts in one of the jars. The solution is easy: Jar A is filled, and then Jar C is filled from it, leaving 25 quarts in Jar A. Of participants previously given similar problems, 95% solved it. Other participants had previously been trained on problems all having the same complex three-jar solution (fill Jar B and use the contents to fill Jar C twice and Jar A once). Of these participants, only 36% solved the easy final problem!

Vallée-Tourangeau et al. (2011) found the damaging effects of mental set on Luchins’ water-jar problems were reduced when actual water jars were used rather than presenting the problems on paper (as in the original research). According to Vallée-Tourangeau et al., the actual water jars provided a “rich and dynamic . . . perceptual input” (p. 1894).

Thomas and Didierjean (2016) showed some powerful effects of mental set. Participants saw a central card surrounded by six cards (all face down). A magician asked them to select one of the six cards, which was then revealed to match the central card. Participants were asked to identify the trick’s secret (all the cards are the same). When the magician did not



suggest a solution, 83% of participants solved the trick. However, when he claimed he could influence participants’ choice by a specific hand move, only 13% of participants solved the trick! Thus, mental set is so strong that a single exposure to an implausible solution can inhibit finding the actual (and more obvious) solution.

We might assume experts given a problem in their area of expertise would be relatively unaffected by mental set. Bilalić et al. (2008a) tested this assumption with chess experts. Most failed to identify the shortest way to win a chess game, using instead a longer solution based on a familiar strategy. However, the most skilful players were least likely to be impaired by mental set. Why are chess experts susceptible to the damaging effects of mental set? Bilalić et al. (2008b) studied chess experts who had found the familiar solution but were seeking a better one. They still fixated features of the chessboard position related to the familiar solution. Thus, their attention was still partly controlled by processes producing the initial familiar solution even though they were unaware this was the case.

In sum, mental set often impairs problem solving. Gobet (2016) argued that its negative effects are very widespread. For example, mental set can lead scientists to ignore findings inconsistent with their favourite theory (see Chapter 14). It is also relevant to myside bias (see Glossary), which involves people disregarding arguments disproving their beliefs (see Chapter 14).
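Returning to Luchins’ water jars, the simplicity of the critical problem can be confirmed mechanically. The sketch below (our illustration, not part of Luchins’ study) performs an exhaustive breadth-first search over jar states, using the three legal actions (fill a jar, empty a jar, pour one jar into another), and recovers the two-move solution that set-laden participants missed:

```python
from collections import deque

CAPS = (28, 76, 3)   # capacities of Jars A, B and C
TARGET = 25

def moves(state):
    """Yield (label, next_state) for every fill, empty and pour action."""
    for i in range(3):
        filled = list(state); filled[i] = CAPS[i]
        yield f"fill {'ABC'[i]}", tuple(filled)
        emptied = list(state); emptied[i] = 0
        yield f"empty {'ABC'[i]}", tuple(emptied)
        for j in range(3):
            if i != j:
                poured = list(state)
                qty = min(state[i], CAPS[j] - state[j])
                poured[i] -= qty; poured[j] += qty
                yield f"pour {'ABC'[i]} into {'ABC'[j]}", tuple(poured)

# Breadth-first search finds the shortest action sequence first.
start = (0, 0, 0)
queue = deque([(start, [])])
seen = {start}
while queue:
    state, path = queue.popleft()
    if TARGET in state:
        print(path)   # ['fill A', 'pour A into C']: two moves suffice
        break
    for label, nxt in moves(state):
        if nxt not in seen:
            seen.add(nxt)
            queue.append((nxt, path + [label]))
```

Participants trained on the three-jar routine were, in effect, re-running a longer stored procedure instead of searching this small state space afresh.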

KEY TERM

Functional fixedness: The inflexible focus on the usual function(s) of an object in problem solving.

Past experience: functional fixedness

We turn now to a specific form of mental set: functional fixedness. Functional fixedness occurs when we mistakenly assume any given object has only a limited number of familiar uses. Duncker (1945) carried out a classic study on functional fixedness. Participants were given a candle, a book of matches, tacks in a box and several other objects (see Figure 12.9). Their task was to attach the candle to a wall by the table, so that it did not drip onto the table below. Most participants tried to nail the candle directly to the wall or to glue it to the wall by melting it. Only a few produced the correct answer: use the inside of the tack box as a candle holder and then nail it to the wall with tacks.

According to Duncker (1945), his participants “fixated” on the tack box’s function as a container rather than as a platform. More correct solutions were produced when the box containing the tacks was empty at the start of the experiment because it then appeared less like a container.

Figure 12.9 Some of the materials provided for participants instructed to mount a candle on a vertical wall in the study by Duncker (1945).

More direct evidence that past experience can produce functional fixedness was reported by Ye et al. (2009). Participants decided whether objects could be used for a specific function (e.g., packable with – usable as packing material to pack an egg in a box). Immediately afterwards, they decided whether


the same objects could be used for a different function (e.g., play catch with, over a distance of 15 feet). Some objects (e.g., ski cap; pillow) could be used for both functions. Deciding an object possessed the first function reduced the probability of detecting it also possessed the second function: this is functional fixedness.

It is often assumed we are inflexible in our perceived uses of objects. Wagman et al. (2016) disputed this assumption. Rods with several added plastic pieces were regarded as more suitable for striking than for poking an object. However, such rods were not regarded as suitable for striking with precision although they were suitable for striking with power.

How can we overcome functional fixedness? Challoner (2009) studied 1,001 important inventions and solutions to insight problems. Two steps were typically involved:

(1) Focus on an infrequently noticed or new feature.
(2) Form a solution based on that obscure feature.

McCaffrey (2012) argued crucial obscure features are ignored because people focus on the typical functions of objects based on their shape, size, the material they are made of, and so on. This functional fixedness can be reduced by the generic-parts technique: (1) generate function-free descriptions of all object parts; and (2) decide whether each description implies a use. McCaffrey gave some participants training in the generic-parts technique. These participants solved 83% of insight problems (e.g., Duncker’s candle problem) compared to only 49% in the control group.

Cognitive control: its role in insight, functional fixedness and mental set

Cognitive control refers to “the ability to limit attention to goal-relevant information and inhibit, or suppress, irrelevant distraction” (Amer et al., 2016b, p. 905). It is greater in individuals high in working memory capacity (which is related to attentional control; see Glossary). We might expect a high level of cognitive control to be advantageous on tasks involving insight, functional fixedness or mental set. However, that is not always the case. Cognitive control is associated with a narrow focus of attention on goal-relevant information and specific task strategies, coupled with inhibition of the processing of other information sources (Amer et al., 2016b). Thus, high cognitive control can impair performance when a broad focus of attention would be beneficial.

Findings

Pope et al. (2015) compared the ability to break a mental set in human adults, children and baboons. The original task was as follows: (1) two red squares were presented, and participants then touched the locations previously occupied by those squares; (2) if this was done correctly, a blue triangle was presented and had to be touched for reward. After participants had established a mental set, the task changed slightly – the blue triangle was present throughout. All that participants needed to do was touch the


blue triangle for reward (thus breaking the mental set), although they could keep using the original strategy.

Pope et al. (2015) found 100% of baboons successfully broke the mental set, as did 45% of children but only 12% of adults! Thus, the ability to break the mental set was inversely related to intelligence (and cognitive control). Baboons probably broke the mental set because doing so involved much less processing capacity than the original strategy. Human adults did not break the set because they found it hard to believe the task could be as easy as simply touching the blue triangle.

DeCaro et al. (2016) compared the performance of individuals high and low in working memory capacity (which resembles attentional control) on Knoblich et al.’s (1999) matchstick arithmetic problems. Some required insight (e.g., the Type B problem shown in Figure 12.6) whereas others did not (e.g., the Type A problem shown in Figure 12.6). Participants high in working memory capacity performed better than those low in working memory capacity on problems not requiring insight (see Figure 12.10). However, the opposite was the case with insight problems.

How can we explain the above findings? Individuals high in working memory capacity tend to consider complex problem solutions even when simple ones are required (DeCaro et al., 2016, 2017). This disadvantages high-capacity individuals on many insight problems. However, high-capacity individuals are often better than low-capacity ones at forming an initial problem representation, and this facilitates their performance on many non-insight problems.

Jarosz et al. (2012) considered the effects of alcohol intoxication on an insight task (the Remote Associates Test; see Glossary). Intoxicated participants solved 58% of the problems compared to only 42% for sober participants. Alcohol intoxication broadened participants’ attentional focus beyond strong (but incorrect) associates of the three words presented on each trial.


Figure 12.10 Mean percentages of correct solutions as a function of problem type (incremental, not requiring insight, vs insight) and working memory capacity (low vs high). From DeCaro et al. (2016).


KEY TERM

Problem space: An abstract description of all the possible states that can occur within a given problem.


Chrysikou et al. (2013) assessed the role of cognitive control in functional fixedness when participants generated common or uncommon uses for objects. Applying transcranial magnetic stimulation (TMS; see Glossary) to the left prefrontal cortex to reduce cognitive control facilitated performance when uncommon uses for objects had to be produced.

Conclusions

High cognitive control can impair performance on certain tasks, especially those “that are aided by the use of previously irrelevant information, or on tasks that generally benefit from drawing on diverse bits of information from various sources” (Amer et al., 2016b, p. 906). However, high cognitive control is advantageous on tasks requiring working memory and/or selective attention, and when distracting stimuli must be ignored. It remains for the future to establish precisely which tasks benefit from (or are impaired by) high cognitive control and to obtain a detailed understanding of the underlying mechanisms.

PROBLEM-SOLVING STRATEGIES

Major landmarks in problem-solving research were an article by Newell et al. (1958) followed in 1972 by Newell and Simon’s book, Human Problem Solving. Their central insight was that the strategies we use when tackling complex problems reflect our limited ability to process and store information. More specifically, we have very limited short-term memory capacity, and so complex information processing is typically serial (one process at a time). These assumptions were included in their General Problem Solver (a computer program designed to solve well-defined problems).

Gobet and Lane (2015) evaluated this theoretical approach. On the positive side, the General Problem Solver was one of the first problem-solving programs and led to Newell and Simon being identified as “the founding fathers of artificial intelligence” (Gobet & Lane, 2015, p. 141). In addition, Newell and Simon (1972) identified several important problem-solving strategies (discussed below). Limitations include exaggerating the role of serial processing in problem solving and the reliance on rather abstract and artificial problems.

Newell and Simon (1972) used various well-defined, knowledge-lean problems (e.g., the Tower of Hanoi; see Figure 12.11). The initial problem state consists of up to five discs piled in decreasing size on the first of three pegs. When they are placed in the same order on the last peg, the problem has been solved. Only one disc can be moved at a time, and a larger disc cannot be placed on top of a smaller one.

Figure 12.11 The initial state of the five-disc version of the Tower of Hanoi problem.

Newell and Simon (1972) identified a problem space for each problem. A problem space consists of the initial problem state, the goal state, all possible mental operators (e.g., moves) that can be applied to any state to change it into a different state, and all the intermediate problem states.




How do we solve well-defined problems with our limited processing capacity? According to Newell and Simon (1972), we rely heavily on heuristics (see Glossary) – rules of thumb that are easy to use and often produce reasonably accurate answers. Heuristics can be contrasted with algorithms (computational methods guaranteed to produce a problem solution). Algorithms are often too complex to be used by most people. In this section, we consider some heuristics identified by Newell and Simon (1972). We also discuss other heuristics and strategies for problem solving.
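The contrast is easy to see in code. The Tower of Hanoi has a well-known recursive algorithm (standard textbook recursion, not Newell and Simon’s program) that is guaranteed to solve the problem in the minimum possible number of moves (2**n - 1); what it does not capture is the step-by-step heuristic search human solvers actually perform:

```python
def hanoi(n, source="first peg", spare="middle peg", target="last peg"):
    """Move n discs from source to target; always succeeds in 2**n - 1 moves."""
    if n == 0:
        return
    hanoi(n - 1, source, target, spare)   # clear the n-1 smaller discs onto the spare peg
    print(f"move disc {n}: {source} -> {target}")
    hanoi(n - 1, spare, source, target)   # restack them on top of the moved disc

hanoi(3)   # prints the optimal 7-move solution for three discs
```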

Hill climbing

Newell and Simon (1972) identified the hill-climbing heuristic. Hill climbing is a very simple strategy which involves changing the present problem state into one closer to the goal. It is mostly used when the problem solver has no clear understanding of the problem structure and so focuses on very short-term goals. Using hill climbing resembles a climber trying to reach the highest mountain peak in the area by always moving upwards. This may work. However, the climber will probably end up trapped on a hill several valleys away from the highest peak.
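The climber metaphor translates directly into code. A minimal sketch (our illustration, on a made-up one-dimensional landscape): the strategy always moves to the higher neighbouring state and stops at the first local peak, exactly as the metaphor suggests:

```python
landscape = [1, 2, 3, 2, 1, 5, 9, 5, 1]   # hypothetical "heights"; the true peak (9) is at index 6

def hill_climb(position):
    """Repeatedly move to the higher neighbour; stop when neither neighbour is higher."""
    while True:
        neighbours = [p for p in (position - 1, position + 1)
                      if 0 <= p < len(landscape)]
        best = max(neighbours, key=lambda p: landscape[p])
        if landscape[best] <= landscape[position]:
            return position              # local maximum reached
        position = best

print(hill_climb(0))   # 2: trapped on the small hill (height 3), far from the peak at index 6
```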

Means–ends analysis

According to Newell and Simon (1972), the most important heuristic method is means–ends analysis. It resembles hill climbing, but the problem solver has greater awareness of how to break the problem down into sub-problems. Here is the essence of means–ends analysis (a code sketch follows the list):



KEY TERMS

Algorithm: A computational procedure providing a specified set of steps to problem solution; see heuristic.

Hill climbing: A simple heuristic used by problem solvers in which they focus on making moves that will apparently put them closer to the goal.

Means–ends analysis: A heuristic method for solving problems based on creating a subgoal to reduce the difference between the current state and the goal state.

Meta-reasoning: Monitoring processes that influence the time, effort and strategies used during reasoning and problem solving.

● Note the difference between the current problem state and the goal state.
● Form a subgoal to reduce the difference between the current and goal states.
● Select a mental operator (e.g., a move or moves) that permits attainment of the subgoal.
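Here is a toy sketch of that loop (our illustration, with invented operators; it follows the spirit of Newell and Simon’s General Problem Solver rather than its actual code). Each operator has a precondition; when the operator that would reduce the current difference cannot yet be applied, satisfying its precondition becomes the new subgoal:

```python
# (name, precondition, effect) triples - all hypothetical.
operators = [
    ("fetch car keys",   None,            "have car keys"),
    ("drive to airport", "have car keys", "at airport"),
    ("fly to city",      "at airport",    "at meeting"),
]

def solve(state, goal, plan):
    """Means-ends analysis: reduce the state/goal difference, recursing on subgoals."""
    if goal in state:
        return plan
    # Select the operator whose effect removes the difference (achieves the goal).
    name, precondition, effect = next(op for op in operators if op[2] == goal)
    if precondition is not None and precondition not in state:
        solve(state, precondition, plan)   # subgoal: satisfy the precondition first
    state.add(effect)
    plan.append(name)
    return plan

print(solve({"at home"}, "at meeting", []))
# ['fetch car keys', 'drive to airport', 'fly to city']
```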

Means–ends analysis typically assists problem solution. However, Sweller and Levine (1982) found it can severely impair performance. Participants tried to solve an apparently simple maze, most of which was not visible. Some participants could see the goal state (the goal-information group) whereas others could not. Use of means–ends analysis requires knowledge of the goal location, so only the goal-information group could use that heuristic. However, the problem was designed so that means–ends analysis would not be useful: every correct move involved turning away from the goal. Only 10% of participants in the goal-information group solved the problem within 298 moves, whereas those in the other group solved it in an average of 38 moves.

Meta-reasoning

Ackerman and Thompson (2017) emphasised the importance of meta-reasoning (processes that monitor our progress during problem solving


and reasoning and influence the strategies we adopt). One example is progress monitoring: problem solvers assess their rate of progress towards the goal. If progress is too slow to solve the problem within the maximum number of moves allowed, they change their strategy. MacGregor et al. (2001) gave participants the nine-dot problem (see Figure 12.5) with one line of the solution provided to help them. Performance was worse when participants had the illusion of making progress (and so were slow to switch strategies).

Payne and Duggan (2011) also studied progress monitoring. Participants received an unsolvable water-jar problem with a small or large number of possible problem states. When the problem had a small number of problem states, participants abandoned the problem more rapidly because it was easier to perceive that progress towards a solution was impossible.

Ackerman and Thompson (2017) discussed other aspects of meta-cognition related to progress monitoring. These include judgements of solvability, and feelings of rightness or error when problem solvers produce an answer to a given problem, all of which influence the decision as to whether to remain engaged in problem solving.

Planning

It is generally assumed that individuals presented with a complex problem engage in preliminary planning, and that this planning involves the prefrontal cortex. Supportive evidence comes from patients with damage to prefrontal areas, who typically have impaired planning and problem solving (Szczepanski & Knight, 2014). Goel and Grafman (1995) found patients with prefrontal damage performed worse than healthy controls on the Tower of Hanoi task (see Figure 12.11). The patients were especially disadvantaged on a difficult move that involved moving away from the goal because they found it harder to plan ahead. Colvin et al. (2001) reported similar findings using water-jar problems. Patients with prefrontal damage and healthy controls both used the hill-climbing strategy. However, the patients performed worse because their deficient planning made it harder for them to make moves conflicting with that strategy.

As discussed earlier, prefrontal damage can produce planning problems in everyday life. For example, Goel et al. (2013) found patients with right prefrontal damage performed poorly on a real-world travel planning task because their planning was too piecemeal and insufficiently comprehensive.

Figure 12.12 Tower of London task (two-move and five-move problems). The balls in the bottom half must be rearranged to match the arrangement in the top half. From Dagher et al. (1999). By permission of Oxford University Press.

Dagher et al. (1999) used the Tower of London task, in which coloured discs must be moved one by one from an initial state to


match the goal state (see Figure 12.12). There was increased activation of the dorsolateral prefrontal cortex when participants solved complex versions of this task.

In sum, the prefrontal cortex is important in planning on many problem-solving tasks. However, many other brain areas are also involved (Szczepanski & Knight, 2014).

Sequential processing stages

Tasks such as the Tower of Hanoi and Tower of London require planning a sequence of moves. However, we can distinguish between plan production and plan execution. With complex tasks, only some moves are typically planned, so executing the initial plan is followed by generating a further plan and then executing it.

Crescentini et al. (2012) supported the distinction between plan production and plan execution using simple versions of the Tower of Hanoi task. The dorsolateral prefrontal cortex was more active during initial planning than during plan execution. In contrast, posterior temporal areas, inferior frontal regions and the dorsolateral premotor cortex were more activated during plan execution.

Nitschke et al. (2012) obtained support for the assumption that Tower of London problems require participants to engage in problem representation followed by planning. On problems placing high demands on forming a problem representation, participants alternated their gaze more often between the start and goal states. On problems imposing high demands on planning, in contrast, the last fixation of the start state was unusually prolonged.

How much planning?

Newell and Simon (1972) assumed problem solvers typically engage in limited planning because of the constraints of short-term memory capacity. Patsenko and Altmann (2010) obtained strong support for this assumption using Tower of Hanoi problems. They sometimes added, deleted or moved discs during participants’ eye movements, so that participants were not directly aware of the change. These changes only minimally disrupted performance, strongly suggesting the participants’ next move was triggered by the current state of the problem rather than by a preformed plan.

There are substantial individual differences in planning on problem-solving tasks. Koppenol-Gonzalez et al. (2010) found with the Tower of London task that some participants engaged in efficient planning (considerable preplanning of moves and high performance). In contrast, other participants showed very little evidence of effective planning (a short period of preplanning and numerous errors). Most individual differences in performance on this task can be explained by the single factor of planning ability (Debelak et al., 2016).

The amount of planning is very flexible. Delaney et al. (2004) found little evidence of planning on water-jar problems when participants chose their preferred strategy. However, instructions to generate the complete solution before making any moves led to detailed planning and faster problem solution. Morgan and Patrick (2013) argued that increasing the cost of accessing important task-relevant information (the goal state) on the Tower of Hanoi task would lead to more planning. Increasing that cost did indeed produce more planning and also led to problems being solved in fewer moves.


KEY TERMS

Cognitive miser: Someone who is economical with their time and effort when performing a thinking task.

Cognitive Reflection Test: A test assessing individuals’ tendencies to override intuitive (but incorrect) answers to problems.


faster problem solution. Morgan and Patrick (2013) argued that increasing the cost of accessing important task-relevant information (the goal state) on the Tower of Hanoi task would lead to more planning. It produced increased planning and also led to problems being solved in fewer moves. If planning involves deliberate processes, we would expect problem solvers to be consciously aware of it. Evidence suggesting important ­problem-solving processes occur below the level of conscious awareness was reported by Paynter et al. (2010) using event-related potentials (ERPs; see Glossary). They observed clear differences in the ERPs associated with correct and incorrect moves early in the problem when no behavioural evidence indicated participants were making progress.

Cognitive miserliness

Many theorists have proposed dual-process theories to account for performance on cognitive tasks such as judgement, decision-making and reasoning. Evans and Stanovich (2013; see Chapter 14) reviewed these theories and identified various commonalities among them. Of particular importance, the theories distinguish two processes: (1) Type 1 intuitive processes, which are fast and relatively effortless; and (2) Type 2 reflective processes, which are slow and controlled.

Most dual-process theorists argue that many individuals are cognitive misers: people typically economical with their time and effort on tasks requiring thinking. Cognitive misers often respond rapidly (but sometimes incorrectly) to problems using Type 1 processes without checking their answers using Type 2 processes. The Cognitive Reflection Test (Frederick, 2005), which assesses the extent to which people are cognitive misers, involves a conflict between Type 1 and Type 2 processes. Why don’t you take this very short test and then see how many of your answers are correct?

IN THE REAL WORLD: COGNITIVE REFLECTION TEST

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? ___ cents.
(2) If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets? ___ minutes.
(3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half the lake? ___ days.

The correct answers are 5 cents (problem 1), 5 minutes (problem 2) and 47 days (problem 3). Do not worry if you did not get them all right – only about 25% of highly intelligent individuals answer all the items correctly. Most incorrect answers (10 cents; 100 minutes; and 24 days) are intuitive responses produced rapidly by individuals using Type 1 processes.
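Writing out the constraints of problem 1 shows why the intuitive answer fails (a simple worked check, added here for illustration). With $b$ the price of the ball in dollars:

\[
b + (b + 1.00) = 1.10 \;\Rightarrow\; 2b = 0.10 \;\Rightarrow\; b = 0.05.
\]

The intuitive answer $b = 0.10$ makes the bat cost \$1.10 and the total \$1.20, violating the stated total of \$1.10.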


However, individuals producing incorrect answers often experience a feeling of error (Gangemi et al., 2015), suggesting some awareness of conflict. Low scorers on the Cognitive Reflection Test also perform relatively poorly on many other judgement and reasoning tasks (Toplak et al., 2014). This occurs in part because performance on the Cognitive Reflection Test correlates positively with intelligence. However, scores on the Cognitive Reflection Test predicted performance on several other tasks even after the effects of intelligence were removed statistically, suggesting cognitive miserliness operates across many tasks.

Travers et al. (2016) found participants ultimately giving the incorrect intuitive answer were not drawn to the correct answer at any point, suggesting they did not use Type 2 reflective processes. In contrast, participants ultimately producing the correct answer were initially drawn to the incorrect intuitive answer, suggesting they had to inhibit it.

There is overlap between the notion of the cognitive miser and Newell and Simon’s (1972) focus on problem solvers’ use of heuristics (discussed earlier, p. 589). In both cases, individuals resort to simple (and often inaccurate) strategies. However, Newell and Simon assumed our limited processing capacity forces us to use heuristics, whereas cognitive misers use heuristics because they are reluctant to engage in effortful processing, not because they cannot.

KEY TERMS

Analogy: A comparison between two objects (or between a current and previous problem) that emphasises similarities between them.

Fluid intelligence: Non-verbal reasoning ability applied to novel problems.

ANALOGICAL PROBLEM SOLVING AND REASONING

Here we discuss analogical problem solving. An analogy is “a comparison between two objects, or systems of objects, that highlights respects in which they are thought to be similar” (Stanford Encyclopedia of Philosophy, 2013). Analogies are very important – we often cope successfully with novel situations by relating them to situations encountered previously. Analogical problem-solving performance correlates highly with IQ, leading Lovett and Forbus (2017, p. 60) to argue “Analogy is perhaps the cornerstone of human intelligence”. More specifically, there are close links between analogical problem solving and fluid intelligence, which “refers to the ability to reason through and solve novel problems” (Shipstead et al., 2016, p. 771). The most widely used test of fluid intelligence is Raven’s Progressive Matrices (Raven et al., 1998); it involves geometrical analogies and requires analogical reasoning (see Figure 12.13).

Analogies have proved valuable in science. For example, the physicist Ernest Rutherford argued electrons revolve around the nucleus as the planets revolve around the sun. This analogy (like nearly all others) has limitations – planets in the solar system attract each other through gravitational force whereas electrons repel each other. Scientists working on the Mars Rover Mission used analogies when there was high uncertainty about scientific issues. Why? According to Chan et al. (2012, p. 1362), “Analogy supports problem solving under uncertainty by narrowing the space of possibilities to facilitate quick, approximate problem solving, reasoning, and decision making.”


Figure 12.13 A problem resembling those used in Raven’s Progressive Matrices. Participants must select the image (from the bottom eight) that best completes the top 3 × 3 matrix. From Lovett and Forbus (2017).

Analogical problem solving

If we are to use a previous problem to solve the present one, we must detect similarities between them. Chen (2002) identified three main types of similarity between problems:

(1) Superficial similarity: solution-irrelevant details (e.g., specific objects) are common to the two problems.
(2) Structural similarity: causal relations between the main components are shared by both problems.
(3) Procedural similarity: procedures (actions) for turning the solution principle into concrete operations are common to both problems.

Chen (2002) gave participants an analogy having structural and procedural similarity to the target problem, or one having only structural similarity. Performance was significantly better in the former condition because those participants were more likely to find the correct procedures or actions to solve the problem.

With most analogical problems, participants must first retrieve appropriate past experience or knowledge and then adapt it to make explicit its relevance to the current problem. Gick and Holyoak (1980) found retrieval failures often underlie people’s inability to solve analogical problems. They used a problem in which a patient with a malignant stomach tumour can only be saved by a special kind of ray (Duncker, 1945). However, a ray strong enough to destroy the tumour will also destroy the healthy tissue, whereas a ray that does not harm healthy tissue will be too weak to destroy the tumour. Only 10% of participants solved this problem when it was presented on its own.

If you find the above problem puzzling, here is an analogy. A general wants to capture a fortress. However, the roads to it are mined, making it too dangerous for the entire army to march along any one of them.


However, the mines were set so that small numbers of men could pass over them safely. The general therefore had his army converge on the fortress at the same time by walking along several different roads.

Among participants who had previously memorised the story about the general and the fortress, 80% solved the radiation problem when informed of its relevance. However, only 40% solved it when not so informed. Why did so many of Gick and Holyoak’s (1980) participants fail to make spontaneous use of the relevant, memorised story? A key factor is that there were no superficial similarities between the story and the problem. When the story was superficially similar to the problem (it involved a surgeon using rays on a cancer), 88% of participants spontaneously recalled it when given the radiation problem (Keane, 1987).

Kubricht et al. (2017) studied performance on Duncker’s (1945) radiation problem as a function of individual differences in fluid intelligence (see Glossary) assessed by Raven’s Progressive Matrices (discussed earlier, p. 593). Individuals high in fluid intelligence performed much better than low scorers when the radiation problem was preceded by a verbal analogy (approximately 85% vs 40%, respectively). Thus, high intelligence facilitates effective use of analogies (discussed further later, pp. 597–598).

Gick and Holyoak (1980) used the reception paradigm – participants received detailed information about a possible analogy before receiving a problem. However, individuals in everyday life generally produce their own analogies: the production paradigm. Blanchette and Dunbar (2000) confirmed that people given the reception paradigm often selected analogies based on superficial similarities. However, those given the production paradigm mostly produced analogies sharing structural features with the current problem.

Experts (e.g., scientists) often use analogies because they provide a major source of new concepts and ways of thinking about problems. What kinds of analogies do experts use? Dunbar and Blanchette (2001) studied laboratory discussions of leading molecular biologists and immunologists. When the scientists used analogies to fix experimental problems, the previous problem was often superficially similar to the current one. When they generated hypotheses, however, their analogies involved fewer superficial similarities and considerably more structural ones. Thus, the types of analogies scientists use depend on their current goal.

Do experts mostly use distant analogies (i.e., those linking different domains) or less distant within-domain ones? Dunbar (1995) found 98% of analogies were within-domain when experts discussed issues with fellow experts. In contrast, more distant analogies are used when scientific experts communicate with less expert colleagues (Kretz & Krawczyk, 2014). This difference occurs because within-domain analogies are generally more detailed and precise than distant analogies.

Enhancing analogical problem solving

How can we increase people’s use of analogies in analogical problem solving? There are two main approaches: (1) increasing the encoding of the underlying structure of the current problem; and (2) increasing the use of effective retrieval strategies.


Minervino et al. (2017) adopted the first approach. All participants were initially presented with the fortress story. Some then identified the similarities and differences between Duncker’s radiation problem and another problem with a similar structure before attempting Duncker’s problem (experimental group). Others attempted Duncker’s radiation problem on its own (control group). Experimental group participants were much more likely than control group participants to solve the radiation problem (34% vs 9%, respectively): they understood more clearly the abstract or schematic structure of the radiation problem.

Trench et al. (2016) found simply instructing individuals to use analogies when generating arguments to persuade a poor family to reduce its indebtedness increased their use of analogies fourfold. Other participants were instructed to use analogies drawn from areas not directly related to economic issues (e.g., health; human relations). This led to a substantial increase in arguments based on analogies (especially structural analogies). In sum, most individuals rarely produced analogies spontaneously but could easily do so when prompted.

Processes in analogical reasoning

So far we have focused on analogical problem solving. However, much research on analogies has involved analogical reasoning. For example, consider four-term analogy problems taking the form A:B::C:D (A is to B as C is to D; e.g., GLOVE:HAND::SOCK:FOOT). Participants decide whether the two word pairs express the same relationship (i.e., is it true that glove is to hand as sock is to foot?). Alternatively, only the first three terms (A, B and C) are provided, with participants supplying the fourth term (D) themselves.

Why are four-term analogy problems used so often in research? They differ from analogical problem solving in that they are tightly controlled – all the necessary information is presented explicitly and there is a single correct answer. These features facilitate the task of understanding the underlying processes.

Sequential processing stages

Analogical reasoning involves several sequential processing stages. For example, Grossnickle et al. (2016) identified four component processes (illustrated computationally below):

(1) Encoding: information concerning the problem stimuli is processed.
(2) Inferring: identifying a relation (i.e., similarity) between two items.
(3) Mapping: identifying the overall relational pattern or rule governing the problem.
(4) Applying: using the outcome of the mapping process to select the response completing the analogy.

Grossnickle et al. (2016) compared the performance of high and low performers on tasks involving relational reasoning (including analogical reasoning).
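These four component processes can be made concrete with a toy sketch. The Python fragment below is purely illustrative – the relation table and function names are invented for this example and are not taken from Grossnickle et al. (2016):

```python
# Toy four-term analogy solver: A:B::C:? via an invented relation table.
RELATIONS = {
    ("glove", "worn_on"): "hand",
    ("sock", "worn_on"): "foot",
}

def solve_analogy(a, b, c):
    """Return D such that A:B::C:D, or None if no stored relation fits."""
    # (1) Encoding: the stimuli a, b and c are taken as given here.
    # (2) Inferring: find a relation linking A to B.
    inferred = [rel for (item, rel), target in RELATIONS.items()
                if item == a and target == b]
    for rel in inferred:
        # (3) Mapping: assume the same relational rule governs C:D.
        d = RELATIONS.get((c, rel))
        if d is not None:
            # (4) Applying: use the mapped rule to produce the response.
            return d
    return None

print(solve_analogy("glove", "hand", "sock"))   # -> foot
```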

Figure 12.14 Conditional probabilities for each reasoning process: the probability of successful encoding P(E), successful inferring given successful encoding P(I|E), successful mapping given successful inferring P(M|I), and successful applying given successful mapping P(A|M), shown for all participants, low performers and high performers. From Grossnickle et al. (2016). Reprinted with permission of Elsevier.
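Because each stage is conditioned on success at the previous one, the stage probabilities combine multiplicatively. As a worked illustration (the numbers are invented for the example, not taken from the study):

\[
P(\text{correct}) = P(E)\,P(I \mid E)\,P(M \mid I)\,P(A \mid M),
\]

so that, for instance, $0.95 \times 0.85 \times 0.80 \times 0.90 \approx 0.58$: even high stage-wise success rates can yield a modest overall solution rate.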

As Figure 12.14 shows, the inference and mapping processes were the hardest, especially for low performers.

Vendetti et al. (2017) identified two major strategies used by people solving four-term analogy problems. According to project-first models, individuals first generate a rule relating the A and B terms, then map the A and C terms, and finally apply the rule to generate D. According to alignment-mapping models, individuals first align the A and C terms and then align the B item with the target (D) item. Vendetti et al. used eye tracking to identify which strategy was used: the project-first strategy appeared on 50% of trials and the alignment-mapping strategy on 34%. On average, reasoning performance was higher when the project-first strategy was used.

Working memory

Analogical reasoning is sufficiently complex that we would predict it requires the central executive component of the working memory system (see Glossary and Chapter 6). If so, performance should be impaired when a secondary task involving the central executive is performed at the same time. That is what has been found, both with four-term analogies (Morrison et al., 2001) and with Raven’s Matrices problems (Rao & Baddeley, 2013).

Individual differences

We can increase our understanding of analogical reasoning by studying individual differences in reasoning ability. Much research has considered the relationship between analogical reasoning and working memory capacity (the ability to process and store information at the same time; see Glossary).


Figure 12.15 Major processes involved in performance of numerous cognitive tasks. A top-down executive signal (Level 1: executive attention/goal state) organises maintenance and disengagement (Level 2: active processing/focal attention) around a goal. The to-be-performed task (Level 3: physical environment) provides an environmental medium around which cognitive processes are organised; some tasks place a heavier burden on maintenance, others on disengagement. From Shipstead et al. (2016).

Ackerman et al. (2005) found in a meta-analysis that the average correlation between measures of working memory capacity and performance on Raven’s Matrices (which requires analogical reasoning and involves fluid intelligence) was +.49.

Shipstead et al. (2016) identified key processes associated with working memory capacity and fluid intelligence (see Figure 12.15). Most cognitive tasks (Level 3) require top-down, goal-focused executive attention (Level 1). Such tasks differ in the extent to which they also require maintenance (keeping relevant information accessible) and disengagement (removing or inhibiting outdated information) (Level 2). In essence, fluid intelligence involves executive attention plus disengagement, whereas working memory capacity involves executive attention plus maintenance. Evidence that fluid intelligence and working memory capacity both involve executive attention was reported by Clark et al. (2017): frontoparietal brain areas associated with executive attention were activated when participants performed tasks involving working memory or fluid intelligence.

Harrison et al. (2015) obtained findings consistent with the above theoretical approach using Raven’s Matrices problems. Some problems involved a repeated-rule combination (i.e., the same rule combination as a previous problem) whereas others involved a novel-rule combination (i.e., a rule combination not previously used). Harrison et al. argued that individuals high in working memory capacity are better than those low in working memory capacity at maintaining information in memory. Accordingly, they should perform especially well on Raven’s Matrices problems with a repeated rule relative to those with a novel rule. That is exactly what was found. In contrast, individuals high in fluid intelligence (assessed by tests other than Raven’s Matrices) performed much better than those low in fluid intelligence regardless of whether problems involved a repeated or a novel rule.


According to Shipstead et al.’s (2016) theoretical model, the excellent analogical reasoning of individuals high in fluid intelligence depends in part on their ability to disengage. An important aspect of disengagement is the ability to think flexibly. With Raven’s Matrices problems, that often involves re-representing the problem structure in a more abstract form (Lovett & Forbus, 2017). For example, consider a problem with an upward-facing arrow followed by a rightward-facing arrow and then a downward-facing arrow. Participants who re-represent these figures as a single arrow rotating clockwise perform better than those who do not. Lovett and Forbus (2017, p. 83) attach great importance to re-representation: “Re-representation is critical because analogies are slaves to their symbolic representation. If two cases happen to be represented with different relational structure, they will fail to align, and the only way to complete the analogy will be to change the structure.”

In sum, successful performance on Raven’s Matrices problems requires a high level of goal-focused executive attention. In addition, it requires disengagement to inhibit task-irrelevant information; key aspects of the disengagement process are flexibility and re-representation.
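The arrow example can be made concrete. The sketch below is purely illustrative – the orientation table and function are invented for this example, not taken from Lovett and Forbus’s (2017) model:

```python
# Toy re-representation: recoding a sequence of arrow orientations as a
# single abstract rotation rule.
ORIENTATION_DEG = {"up": 0, "right": 90, "down": 180, "left": 270}

def rotation_rule(symbols):
    """Return the constant clockwise step (in degrees) relating successive
    orientations, or None if no single rotation rule fits."""
    degs = [ORIENTATION_DEG[s] for s in symbols]
    steps = [(b - a) % 360 for a, b in zip(degs, degs[1:])]
    return steps[0] if len(set(steps)) == 1 else None

print(rotation_rule(["up", "right", "down"]))   # 90: a clockwise quarter-turn
# Predicting the next panel: (180 + 90) % 360 = 270, i.e. a left-facing arrow.
```

Once the three concrete figures are re-represented as one rule (“rotate 90° clockwise”), completing the matrix reduces to applying that rule.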

Brain mechanisms

Krawczyk (2012) reviewed research on brain-damaged patients and neuroimaging studies to identify the brain areas involved in analogical reasoning (see Figure 12.16):

(1) Occipital and parietal areas are associated with visual and spatial processing, followed by extensive involvement of the prefrontal cortex.
(2) Left rostrolateral prefrontal cortex (centred on BA10) integrates information within analogical problems.
(3) The dorsolateral prefrontal cortex and inferior frontal gyrus are involved in inhibitory processes that prevent distraction and interference.
(4) The temporal lobes are involved because information about concept meanings (semantic memory) is stored there.

Suppose you are given the following problem: sandwich:lunchbox::hammer:____

Figure 12.16 Summary of key brain regions and their associated functions in relational reasoning, based on patient and neuroimaging studies. RLPFC, rostrolateral prefrontal cortex; DLPFC, dorsolateral prefrontal cortex; LIFG, left inferior frontal gyrus; Ctx, cortex. From Krawczyk (2012). Reprinted with permission from Elsevier.

KEY TERM

Expertise: The high level of knowledge and performance in a given domain that an expert has achieved through years of systematic practice.

Possible answers are toolbox (correct), nail (a semantic distractor), gavel (an auctioneer’s hammer, a perceptual distractor) and ribbon (an irrelevant distractor). Krawczyk et al. (2008) argued that inhibitory processes involving the prefrontal cortex (area 3 in the list above) are required on such problems to avoid incorrect answers involving relevant semantic or perceptual distractors. As predicted, patients with damage to the prefrontal cortex were more likely than those with damage to the temporal area to select semantic or perceptual distractors.

The left rostrolateral prefrontal cortex is of central importance in analogical reasoning. Hobeika et al. (2016) conducted a meta-analysis of neuroimaging studies (mostly using visuo-spatial analogies based on Raven’s Progressive Matrices or verbal four-term analogy problems – discussed earlier, pp. 596–597). The left rostrolateral prefrontal cortex was consistently activated with both visuo-spatial and verbal analogies, probably because of its involvement in mapping or relational integration. Urbanski et al. (2016) studied analogical reasoning in patients with frontal-lobe damage: damage to the left rostrolateral prefrontal cortex (including BA10 and BA47) was more consistently associated with impaired analogical reasoning than other frontal damage. These findings fit well with those of Hobeika et al. (2016).

Other findings provide more direct support for the notion that the left rostrolateral prefrontal cortex is involved in mapping or relational integration. Green (2016) discussed his own research using verbal four-term analogies (discussed earlier) varying in the difficulty of mapping or relational integration: activity within the left rostrolateral prefrontal cortex increased progressively as the demands on mapping increased.

In sum, research on brain mechanisms has identified the brain areas associated with the major processes involved in analogical reasoning. The consistent finding that the left rostrolateral prefrontal cortex is strongly involved when individuals solve several different analogical reasoning tasks implies (but does not prove) that these tasks involve similar cognitive processes.

EXPERTISE

So far we have mostly discussed studies where the time available for learning was short, the tasks were relatively limited, and prior specific knowledge was not required. In the real world, however, people often spend many years acquiring knowledge and skills in a given area (e.g., psychology; law; medicine; journalism). The end point of such long-term learning is expertise: “elite, peak, or exceptionally high levels of performance on a particular task or within a given domain . . . An expert’s field of expertise can be almost anything from craftsmanship, through sports and music, to science or mathematics” (Bourne et al., 2015, p. 211).

The development of expertise resembles problem solving in that experts are extremely efficient at solving numerous problems in their area or domain of expertise. However, most traditional research on problem solving involved “knowledge-lean” problems requiring no special knowledge or training. In contrast, studies on expertise typically use “knowledge-rich” problems requiring much knowledge beyond that contained in the problem; this knowledge has typically been acquired through prolonged practice and study.


In what follows, we first consider chess expertise. There are various advantages to studying chess. First, the ELO ranking system (named after the chess master Arpad Elo) assesses chess players’ level of expertise; its standard formulation is shown below. Second, expert chess players develop cognitive skills (e.g., pattern recognition; selective search) of general usefulness: Sala et al. (2017) discussed a meta-analytic review showing chess instruction improves achievement in mathematics and overall cognitive ability. Third, there are clear similarities between the remarkable memory for chess positions shown by chess experts and the vast knowledge experts in other domains have stored in long-term memory.

We then discuss medical expertise (especially medical diagnosis), followed by a comparison of these two forms of expertise. After that, we consider the role of brain plasticity in expertise. Finally, we evaluate the hypothesis that deliberate practice is the main requirement for the development of expertise and consider alternative theoretical approaches.
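For reference, the Elo system’s core formula is standard in the chess literature (it is not spelled out in this chapter). Player A’s expected score against player B, given their ratings $R_A$ and $R_B$, is

\[
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad R_A' = R_A + K\,(S_A - E_A),
\]

where $S_A$ is the actual score (1 for a win, 0.5 for a draw, 0 for a loss) and $K$ scales how quickly the rating is updated after each game.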

KEY TERM

Template: As applied to chess, an abstract schematic structure consisting of a mixture of fixed and variable information about chess pieces and positions.

CHESS-PLAYING EXPERTISE

As already indicated, there are various reasons why it is valuable to study chess-playing expertise. For example, we can measure chess players’ levels of skill precisely based on their results against other players. In addition, the existence of permanent records of players’ tournament results over their entire careers means detailed longitudinal data are available for analysis.

The most obvious reason why some individuals are much better than others at playing chess is that they have devoted far more time to practice – it takes about 10,000 hours on average to become a grandmaster. Of special importance, expert chess players have much more detailed information about chess positions stored in long-term memory than non-experts. In classic research, De Groot (1965) gave chess players brief presentations of board positions from actual games. After the board was removed, they reconstructed the positions. Chess masters recalled the positions much more accurately than less expert players (91% vs 43%, respectively). This does not reflect differences in general memory ability – there were no group differences when remembering random board positions.


Template theory

What is the nature of the vast amount of chess-related information experts have stored in long-term memory? Gobet (e.g., Gobet & Waters, 2003) provided an influential answer in his template theory. A template is an abstract, schematic structure more general than an actual board position. Each template consists of a core (fixed information) plus slots (containing variable information about pieces and locations). Each template typically stores information relating to about ten pieces, although it can be larger. The slots make templates adaptable and flexible in use. Templates are built up out of small memory structures known as chunks (see Glossary).
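The core-plus-slots idea can be sketched as a small data structure. The Python fragment below is an illustration only – the class, fields and example position are invented, and this is not Gobet’s computational model (CHREST):

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    """A chess template: a fixed core plus variable slots."""
    core: dict                                   # fixed piece -> square
    slots: dict = field(default_factory=dict)    # variable piece -> square (None = unfilled)

    def matches(self, position: dict) -> bool:
        # A position instantiates the template if the whole core is present.
        return all(position.get(p) == sq for p, sq in self.core.items())

    def fill_slots(self, position: dict) -> None:
        # Bind variable information from the current position into the slots.
        for piece in self.slots:
            self.slots[piece] = position.get(piece)

# Usage: a toy king-side castled structure as the fixed core.
kingside = Template(core={"White king": "g1", "White rook": "f1"},
                    slots={"White queen": None})
position = {"White king": "g1", "White rook": "f1", "White queen": "d1"}
if kingside.matches(position):
    kingside.fill_slots(position)
print(kingside.slots)   # {'White queen': 'd1'}
```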


Here are the main predictions of template theory:

(1) Chess positions are stored in three templates, some of which are large.
(2) Outstanding chess players owe their excellence more to their superior template-based knowledge of chess than to slow, strategy-based processes. This knowledge can be accessed rapidly and permits expert players to narrow down the possible moves they consider.
(3) Expert chess players store the precise board locations of pieces after studying a board position. Chess pieces close together are most likely to be found in the same template (Gobet & Simon, 2000).
(4) Expert players have better recall than non-experts of random chess positions, because they are better at recognising small chunks occurring by chance even in random positions. However, the memory superiority of experts should be greater with structured positions because experts can then use their greater template knowledge as well as their greater chunk knowledge.

Findings


Gobet and Clarkson (2004) reported support for the first prediction. Expert players recalled chessboard positions much better than novices. However, the number of templates (averaging about two) did not vary as a function of playing strength. The maximum template size was 13–15 pieces for masters compared with only about 6 for novices.

Evidence relating to the second prediction is less consistent. Charness et al. (2001) reported supportive evidence: expert players were significantly more likely than intermediate players to fixate tactically relevant pieces very rapidly (within about 1 second). Sheridan and Reingold (2017a) presented four chess positions at the same time and asked chess players to find the one allowing the knight to reach a target square in three moves. Expert players’ eye movements indicated they were much faster than novices to fixate the target board. Further support was reported by Burns (2004), who focused on blitz chess (the entire game must be completed in 5 minutes). He assumed performance in blitz chess must depend mainly on players’ template-based knowledge because there is insufficient time for slow searching through possible moves. As predicted, performance in blitz chess correlated highly (+.78 to +.90) with performance in standard chess.

Evidence less supportive of the second prediction was reported by van Harreveld et al. (2007) and Chang and Lane (2016). Skill differences between players were less predictive of game outcome as the time available decreased, suggesting slow processes are more important for strong than for weak players. Chang and Lane studied speed chess (longer than blitz chess but much shorter than standard chess). Players (especially stronger ones) often spent a considerable amount of time on a few moves, indicating they were using strategy-based processes.


Moxley et al. (2012) asked experts and tournament players to think aloud while selecting the best possible moves on several problems. For both groups, the final move was generally much stronger than the first move considered (see Figure 12.17). Thus, slow, strategy-based processes play a key role in chess.

We turn now to the third prediction. Chess players typically recall the precise squares occupied by given pieces within a template when asked to memorise board positions. However, actual chess playing focuses much more on evaluating board positions. McGregor and Howes (2002) asked expert players to evaluate various chess positions. These players subsequently had much better memory for attack/defence relations than for the precise board locations of the pieces. Linhares et al. (2012) found grandmasters outperformed masters more with respect to memory for abstract features (e.g., strategically significant attacks or defences) than superficial features (e.g., specific board positions).

The fourth prediction is that expert players will have better recall than non-experts of random chess positions. Supporting evidence was reported by Sala and Gobet (2017) in a meta-analysis. As predicted, the beneficial effects of expertise on recall of random chess positions were smaller than those obtained previously with structured chess positions.

Figure 12.17 Mean strength of the first-mentioned chess move and the move chosen as a function of problem difficulty by experts (top panel) and by tournament players (bottom panel). From Moxley et al. (2012). With permission from Elsevier.

Evaluation

Template theory has several successes to its credit. First, much of the information experts store from board positions consists of a few large templates. Second, outstanding chess players possess much more knowledge about chess positions than less expert players, which gives them a substantial advantage when playing chess. Third, the tendency of experts to win at blitz chess is due mainly to their superior template-based knowledge (Burns, 2004). Fourth, experts have better recall of random board positions than non-experts (Sala & Gobet, 2017).

What are the limitations of template theory? First, slow search processes are more important to expert players than the theory assumes (Moxley et al., 2012; van Harreveld et al., 2007). This is the case even with speed chess (Chang & Lane, 2016).

Second, the most expert players often use strategies allowing them to go beyond stored knowledge of chess positions.


Bilalić et al. (2008a) presented chess players with a problem solvable in five moves using a familiar strategy but in only three moves using a less familiar solution. International Masters were far more likely than Candidate Masters to find the shorter solution (50% vs 0%) because they were better at avoiding the familiar, template-based solution.

Third, the precise nature of the information stored in long-term memory remains controversial. Template theory assumes the precise locations of pieces are typically stored, but attack/defence relations are probably more important (Linhares et al., 2012; McGregor & Howes, 2002). The theory also assumes chess players store information in the form of chunks and templates, yet it is often hard to identify their respective roles.

Fourth, template theory de-emphasises the importance of cognitive ability. Grabner et al. (2007) found all the chess masters they studied had above-average intelligence, and intelligence correlated significantly with the players’ rated skill level. Burgoyne et al. (2016) found in a meta-analytic review that chess skill correlated positively with several aspects of cognitive ability (e.g., processing speed; fluid reasoning, which involves understanding novel relationships).

MEDICAL EXPERTISE

The processes involved in chess-playing expertise may (or may not) resemble those involved in other forms of expertise. Accordingly, we now consider medical expertise, specifically the ability of medical experts to make rapid and accurate diagnoses – an ability that can literally be a matter of life or death. We focus mostly on the search for abnormalities in medical images (e.g., X-rays; brain scans). Various methods have been used, including eye tracking and the think-aloud technique (Gegenfurtner et al., 2017). Eye tracking can provide useful information about visual attention and subconscious processes, and think-aloud data can shed light on individuals’ eye fixations and decision-making.

How do medical experts’ strategies differ from those of non-experts? One approach assumes there are three main reasons why abnormalities in medical images fail to be detected (see Figure 12.18):

(1) Detection errors occur when the crucial area within the image is not fixated.
(2) Recognition errors occur when the crucial area is fixated briefly but the doctor fails to appreciate its significance.
(3) Judgemental errors occur when the crucial area is fixated for some time (indicating it raised some concern) but its significance is not fully appreciated.

The most obvious prediction is that experts will make fewer errors of all three types than non-experts.

Figure 12.18 indicates three processes of relevance to the various error types. First, there is global or holistic perception of the image; if that is incomplete or ineffective, detection errors will occur. Second, there is focal or selective processing, involving deeper processing of only certain specific visual elements; failure of such processing leads to recognition errors.


Figure 12.18 A theoretical framework of the main cognitive processes and potential errors in medical decision-making: global perception of the scene (for identifying areas of value) can fail through insufficient searching skills (detection error, via missing cues); selectivity for deeper processing can fail through insufficient perceptual processing time (recognition error); pattern matching can fail through insufficient analysis skills (judgemental error); and decision-making can produce a decisional error. From Al-Moteri et al. (2017). Reprinted with permission of Elsevier.

Third, there is pattern matching, which involves finding a match between the visual medical image and patterns stored in long-term memory; failure of such processing leads to judgemental errors.

There is an important distinction between explicit and implicit reasoning (Engel, 2008). Explicit reasoning is slow, deliberate and associated with conscious awareness. In contrast, implicit reasoning is fast, “automatic” and not associated with conscious awareness; it involves the global processing identified by Al-Moteri et al. (2017; see Figure 12.18). Dual-process theories of judgement (see Chapter 13) and reasoning (see Chapter 14) are based on a similar distinction. The crucial assumption is that medical experts engage mainly in implicit reasoning whereas novices rely mostly on explicit (analytic) reasoning. This assumption makes sense given that experts have substantially more relevant visual and other knowledge stored in long-term memory: as a result, they can often rapidly engage in pattern matching (i.e., relating a given medical image to stored knowledge).

We will consider evidence relevant to the above explicit–implicit distinction with respect to visual specialities (e.g., pathology; radiology). Note that experts generally cross-check their diagnoses with slow, deliberate processes even if they start with fast, “automatic” ones (McLaughlin et al., 2008).
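The three error types are often operationalised from eye-tracking records of dwell time on the critical region. The sketch below is purely illustrative: the function and its names are invented, and the 1,000 ms cut-off merely echoes the “fixated for more than 1 second” criterion mentioned in studies discussed later, not a published parameter:

```python
def classify_miss(dwell_on_target_ms: float, judgement_cutoff_ms: float = 1000) -> str:
    """Classify why an unreported abnormality was missed, from total dwell time."""
    if dwell_on_target_ms == 0:
        return "detection error"    # the critical area was never fixated
    if dwell_on_target_ms < judgement_cutoff_ms:
        return "recognition error"  # fixated only briefly; significance missed
    return "judgemental error"      # dwelled on at length yet still not reported

print(classify_miss(0))      # detection error
print(classify_miss(400))    # recognition error
print(classify_miss(1500))   # judgemental error
```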

Findings

Sheridan and Reingold (2017b) reviewed research testing the hypothesis that experts engage in holistic or global processing. One prediction based on this hypothesis is that experts can extract useful information from rapidly presented images.


Figure 12.19 Eye fixations of a pathologist given the same biopsy whole-slide image starting in year 1 (a) and ending in year 4 (d). Larger circles indicate longer fixation times. From Krupinski et al. (2013). Reprinted with permission from Elsevier.


Kundel and Nodine (1975) found expert radiologists detected abnormalities very rapidly: chest radiographs presented for only 200 milliseconds were correctly interpreted 70% of the time. Another prediction is that medical experts should be better able than non-experts to make use of information presented in peripheral vision; several studies support this prediction.

Krupinski et al. (2013) carried out a longitudinal study of pathologists viewing breast biopsies at the start of their first, second, third and fourth years of residency. Over time, there was a substantial reduction in fixations per slide and less examination of non-diagnostic regions (see Figure 12.19). Thus, training produced enhanced attentional focus.

Kundel et al. (2007) showed doctors experienced in mammography difficult mammograms showing or not showing cancer. The mean time taken by experts to fixate a cancer was typically under 1 second, and time of first fixation on the cancer correlated –0.9 with performance (i.e., accurate detection of breast cancer). Thus, fast fixation was an excellent predictor of performance.

The rapid detection of abnormalities by experts suggests they engage in pattern matching (matching medical images to images stored in long-term memory). Of relevance, Jaarsma et al. (2014) considered how experts and non-experts explained their diagnoses after viewing medical images.


Experts were much more likely to use terms such as “typical”, “regular” and “increase of”, suggesting their diagnoses were based on comparisons between presented and stored images.

Nodine and Mello-Thoms (2010) argued that experts use a detect-then-search process: rapid detection of diagnostically relevant information is followed by a brief search to check there is no other relevant information. In contrast, novices use a search-then-detect process, involving extensive visual search (including much irrelevant information) followed by eventual detection of diagnostically relevant information. Consistent with this, Brunyé et al. (2014) found novices were more likely than experts to fixate salient visual areas (e.g., brightly coloured ones) lacking diagnostic relevance.

According to the theoretical framework shown in Figure 12.18, many detection failures occur even though the crucial area is fixated. Manning et al. (2006) studied nodule detection in chest radiology. Correct negative decisions (i.e., no nodule) were made faster than incorrect negative decisions; in the latter case, participants often fixated the nodule and were suspicious of it but failed to recognise it as a nodule. Rubin et al. (2015) studied the detection of very small lung nodules by experienced radiologists: the best performer detected 82% of fixated nodules whereas the worst detected only 47%. According to Al-Moteri et al. (2017), failures to detect abnormalities in medical images can also reflect judgemental errors. They discussed several studies in which medical experts sometimes fixated an abnormality for more than 1 second but failed to report it – findings suggestive of judgemental errors.

Are the effects of expertise on eye movements similar across different domains? Gegenfurtner et al. (2011) reported a meta-analytic review involving domains including medicine, sport and transportation. Several differences between experts and non-experts were common across domains: (1) shorter fixations; (2) faster first fixations on task-relevant information; (3) more fixations on task-relevant information; (4) fewer fixations on task-irrelevant areas; and (5) longer saccades (rapid eye movements).

We turn now to the roles of implicit and explicit (analytic) reasoning. Melo et al. (2012) argued medical experts use implicit or relatively “automatic” processes similar to those we all use when perceiving visual scenes. They found comparably fast times to diagnose abnormalities in chest X-ray images and to name animals (1.33 vs 1.23 seconds, respectively). Of most importance, diagnosing abnormalities and naming animals involved activation in very similar brain regions (see Figure 12.20). However, diagnosing abnormalities was associated with greater activation in the frontal sulcus and posterior cingulate cortex, suggesting diagnosis is more cognitively demanding than naming animals. Naming letters involved similar brain regions but with less activation. How can we explain these findings? Melo et al. (2012) suggested medical experts engage in rapid pattern recognition – each slide is compared against stored patterns from the past. In other words, they use a predominantly visual strategy.


Figure 12.20 Brain activation while diagnosing lesions in X-rays, naming animals and naming letters. The first column provides a right view, the middle column a left view and the last column a posterior view. From Melo et al. (2012).

Kulatunga-Moruzi et al. (2004) asked three groups varying in expertise to diagnose skin diseases from photographs. Some participants made their decisions from the photographs alone, whereas others were also given a comprehensive verbal description before each photograph. The least expert group performed best when given the verbal descriptions plus the photographs. In contrast, the more expert groups performed better without the verbal descriptions: they used a rapid visual strategy, and the verbal descriptions interfered with their ability to use that strategy effectively.

In spite of the above evidence, experts typically make some use of slow, explicit or analytic processes. Mamede et al. (2010a) compared the performance of medical experts and non-experts providing diagnoses immediately or after some analytic thinking. Analytic thinking enhanced the diagnostic performance of experts with complex cases but not simple ones. In contrast, non-experts derived no benefit from engaging in analytic thinking.

Evaluation

The diagnostic strategies used by medical experts and non-experts often differ considerably. Experts use fast holistic processes more than non-experts. In addition, experts are more proficient than non-experts at using slow, explicit or analytic processes.


More generally, incorrect diagnoses can result from detection, recognition or judgemental errors.

What are the limitations of theory and research? First, there are very few longitudinal studies (apart from Krupinski et al., 2013). Such studies are essential to understand the learning processes involved in the development of expertise. Second, we need more research on the ways fast and analytic processes are combined. For example, Kulatunga-Moruzi et al. (2011) found non-experts benefited from this combination when fast processes preceded analytic ones but not when analytic processes came first. Third, relatively little research has compared different training programmes designed to enhance diagnostic accuracy. Such research could clarify the advantages and disadvantages of different diagnostic strategies.

KEY TERM

Plasticity: Changes within the brain occurring as a result of brain damage or experience.

Chess expertise vs medical expertise

Chess and medical expertise have several similarities. First, intensive training is required to attain genuine expertise. Second, this training leads to the acquisition of huge amounts of relevant stored knowledge. Third, experts in both areas are superior to non-experts at using rapid (apparently “automatic”) processes. Fourth, experts in both areas use analytic or strategy-based processes effectively when necessary.

What are the differences between chess and medical expertise? First, while much of the knowledge possessed by chess experts consists of fairly abstract templates, medical experts are more likely to possess knowledge that is less abstract and more visual. Second, chess experts must relate a current chess position to their stored knowledge and then consider their potential next move and that of their opponent. In contrast, medical experts focus more narrowly on relating information about a specific case to their stored knowledge.

BRAIN PLASTICITY

The development of expertise involves acquiring huge amounts of knowledge and specialised cognitive processes. Does the development of expertise also cause modifications within the brain? The key concept here is plasticity: “changes in structure and function of the brain that affect behaviour and are related to experience or training” (Herholz & Zatorre, 2012, p. 486). It is often assumed that structural changes resulting from plasticity facilitate further learning and the development of expertise.

Before discussing research on expertise, we should mention compelling evidence for plasticity in individuals who become blind at an early age. They exhibit high levels of activity in occipital cortex (typically involved in visual processing) when performing many non-visual tasks (e.g., reading Braille; localising sound) (Heimler et al., 2014). This is known as cross-modal plasticity.

Taxi drivers

Important research was carried out on London taxi or cab drivers, who have to acquire “The Knowledge”.


This consists of detailed knowledge of the 25,000 streets within 6 miles of Charing Cross and the locations of thousands of hospitals, tube stations and so on. Unsurprisingly, it takes about three years to acquire all this information.

How do cab drivers develop this extraordinary knowledge and expertise? The hippocampus (an area within the medial temporal lobes) is of major importance, as might be expected given its central role in long-term memory (see Chapter 6). Hippocampal damage is also associated with impaired spatial navigation skills (McCormick et al., 2018); indeed, a taxi driver who had recently suffered extensive hippocampal damage had severely impaired navigation skills (Maguire et al., 2006).

Acquisition of “The Knowledge” probably has a direct effect on the hippocampus. Experienced London cab drivers have a greater volume of grey matter in the posterior hippocampus than novice drivers (Woollett et al., 2009). However, cab drivers have a smaller volume of grey matter than other people in the anterior hippocampus, an area used in processing novel stimuli, imagining events and recalling events (Zeidman & Maguire, 2016). The finding that cab drivers perform poorly on tasks requiring them to learn and remember new object-place associations may reflect their reduced grey matter in the anterior hippocampus (Woollett & Maguire, 2009).

Causality

The findings discussed so far are correlational and so cannot show that acquiring “The Knowledge” causes hippocampal changes. Somewhat more direct evidence was reported by Woollett and Maguire (2011): among adults who had spent several years acquiring “The Knowledge”, only those who succeeded in becoming London taxi drivers had a selective increase in grey matter in the posterior hippocampus.

Hyde et al. (2009) studied 6-year-old children who received 15 months of instrumental musical training. They showed significant changes in voxel size (a voxel is a small cube of brain tissue) in the primary motor area (see Figure 12.21) and the primary auditory area (see Figure 12.22). In addition, the children having the greatest brain changes showed the greatest improvement in musical skills.

More evidence that training can alter brain structure was reported by de Manzano and Ullén (2018). They studied monozygotic (identical) twins where one twin had received at least 1,000 hours more piano practice than the other, thus controlling for genetic factors. The brains of the more musically trained twins differed in various ways from those of the less trained ones (e.g., greater cortical thickness within the auditory-motor network).

Herholz et al. (2016) carried out a longitudinal study in which adults received six weeks of training in playing the piano. There was evidence for plasticity: training enhanced activity in brain areas (e.g., premotor and posterior parietal regions) involved in motor preparation and sensorimotor integration. Activity in other brain areas (e.g., parts of the primary auditory cortex; premotor cortex) predicted individuals’ learning rates during piano training, reflecting individual differences in predisposition (potential for learning musical skills). These findings emphasise the value of considering individual differences in pre-training brain activity.


Figure 12.21 The brain image shows areas in the primary motor cortex with differences in relative voxel size (a voxel is a small cube of brain tissue) between children receiving 15 months of instrumental music training and non-trained controls: (a) changes in relative voxel size over time in trained and non-trained groups (a value of 1.00 indicates no change); (b) correlation between amount of improvement in motor-test performance and change in relative voxel size for all participants. From Hyde et al. (2009). Reprinted with permission of The Society for Neuroscience. Permission conveyed through Copyright Clearance Center, Inc.

Evaluation

Numerous studies have shown predictable differences in brain structure and function between individuals with varying amounts of training in a given domain (Zatorre, 2013). Support for the hypothesis that developing expertise can cause changes in brain structure has come from longitudinal studies in which brain structure was assessed before, during and after training, and from de Manzano and Ullén’s (2018) study on monozygotic twins. Progress has also been made in identifying brain areas associated with predisposition (potential for learning) and with training-related plasticity during learning (e.g., Herholz et al., 2016).

What are the limitations of research on plasticity and expertise? First, it is hard to show definitively that practice has caused changes in brain structure of relevance to performance improvement. Second, most research has focused on musical training. This is reasonable given that musical training influences auditory perception and several aspects of higher-level cognition, but it means the relevant database is relatively narrow.


Figure 12.22 Brain image showing areas in the primary auditory area (right Heschl’s gyrus) with differences in relative voxel size between trained children and non-trained controls: (a) changes in relative voxel size over time; (b) correlation between improvement in a melody/rhythm test and change in relative voxel size. From Hyde et al. (2009). Republished with permission of The Society for Neuroscience. Permission conveyed through Copyright Clearance Center, Inc.

KEY TERM

Deliberate practice: An effective form of practice in which learners are given a task of moderate difficulty repeatedly, together with informative feedback so they can correct their errors.

Third, many complex effects of practice on neural plasticity have been reported in the literature. For example, Vaquero et al. (2016) found expert pianists had greater grey matter volume than controls in the putamen (involved in motor control and reinforcement learning). However, these experts had smaller grey matter volume than controls in other brain areas (e.g., superior temporal gyrus) involved in auditory processing and sensorimotor control. We lack a coherent theoretical understanding of such complexities.

DELIBERATE PRACTICE AND BEYOND

We have seen that deliberate practice over a period of many years is essential to become an expert in a given domain. That much is obvious. Less obvious are the answers to two related questions. First, what determines the effectiveness of prolonged practice? Second, what factors other than prolonged practice are required to become an expert?


Ericsson (e.g., 2017) argued that the answer to the first question is deliberate practice. More specifically, he claimed that ten years (10,000 hours) of deliberate practice is required to achieve expertise. Controversially, he claimed such practice is the most important factor required to develop expertise and that innate talent or ability is of little or no relevance. Deliberate practice has four aspects:

(1) The task is at an appropriate level of difficulty (not too easy or hard).
(2) The learner is given informative feedback about their performance.
(3) The learner has adequate chances to repeat the task.
(4) The learner has the opportunity to correct their errors.

What happens as a result of prolonged deliberate practice? According to Ericsson and Kintsch (1995), experts can reduce the negative effects of limited working memory capacity. They put forward the notion of long-term working memory. The crucial idea is that “fast . . . transfer to LTM [long-term memory] becomes possible with expertise via knowledge structures, which enables LTM to be used during WM [working memory] tasks, thus giving the appearance of expanding individuals’ WM capacity” (Guida et al., 2013, p. 1).

Suppose expert and novice chess players learn the positions of chess pieces on a board. Novices rely largely on working memory (a limited-capacity system that processes and stores information briefly; see Chapter 6). In contrast, experts use their huge relevant knowledge to store much of the information directly in long-term memory, thus enhancing their recall of the board position. In other words, experts use long-term working memory but novices do not.

IN REAL LIFE: MAGNUS CARLSEN, WORLD CHESS CHAMPION

The Norwegian Magnus Carlsen was born on 30 November 1990. He became a chess grandmaster at the amazingly young age of 13 and the world chess champion in November 2013 (aged 22). In 2014, he was rated the strongest player in chess history: the difference between him and the second-best player (Levon Aronian) was almost as great as that between the 2nd and 14th best players. One of his greatest strengths is his "nettlesomeness" – meaning he is a vexatious individual who is superb at making moves that pressurise opponents into making mistakes.

[Photo: Magnus Carlsen, who became world chess champion in 2013. dpa Picture-Alliance/Alamy Stock Photo.]

Magnus Carlsen disproves the main assumptions of Ericsson's deliberate practice theory. First, he became a grandmaster after only five years of deliberate practice, although Ericsson claimed ten years of deliberate practice are required to achieve outstanding levels of performance.


Second, according to Ericsson’s theory, we would expect Magnus Carlsen to have accumulated more years of deliberate practice than other top chess players. However, when he became world champion, he had devoted 6½ years fewer to deliberate practice than the average of the next 10 best players in the world (Gobet & Ereku, 2014). Across the top 11 players in the world, the association between rating and number of years of practice was modestly negative. According to Ericsson’s theory, it should have been strongly positive. In sum, Magnus Carlsen’s career shows that talent and deliberate practice are both essential for the development of outstanding expertise. Indeed, Carlsen’s extraordinary talent has led to him being called “the Mozart of chess”.

KEY TERM

Long-term working memory Used by experts to store relevant information rapidly in long-term memory and to access it through retrieval cues in working memory (see Glossary).

Findings: positive

Ericsson and Chase (1982) demonstrated the powerful effects of deliberate practice. A student, SF, received extensive practice (one hour a day for two years) on the digit-span task (random digits are recalled immediately in the correct order). He increased his digit span from 7 digits to 80 (10 times the average performance level). SF made use of his great knowledge of running times (e.g., "3594" was Bannister's world-record time for the mile, 3 minutes 59.4 seconds, so he stored those digits as a single unit or chunk). After that, he organised chunks into a hierarchical retrieval structure. Thus, SF made very effective use of long-term working memory. Another student (Dario Donatelli) increased his digit span from 8 to 104 digits following 800 hours of practice (Yoon et al., 2018).

Guida et al. (2012) reviewed neuroimaging studies comparing novices and experts performing working memory tasks. There were two main findings. First, experts and novices both showed activation in prefrontal and parietal areas associated with working memory processes. Second, only experts showed activation in medial temporal regions strongly associated with long-term memory. This finding suggests experts make more use than novices of long-term memory processes during task performance.

To what extent can individuals' level of expertise be accounted for by individual differences in deliberate practice? Campitelli and Gobet (2011) found across many studies on chess-playing expertise that the correlation between total practice hours and chess skill exceeded +.50. However, such correlational evidence cannot prove that the former directly caused the latter.
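As a reminder of how a correlation translates into "variance explained" (this is the standard statistical identity, r squared, not a claim specific to these studies):

    # Standard identity: proportion of variance explained = r ** 2.
    # r = +0.50 is the value reported by Campitelli and Gobet (2011);
    # the computation itself is ordinary statistics, not part of their study.
    r = 0.50
    print(f"r = {r:+.2f} -> variance explained = {r ** 2:.0%}")   # 25%

The same identity lies behind the percentages quoted in the next subsection (e.g., +.35 squared is roughly 12%).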

Research activity: Skill acquisition

Findings: negative

We turn now to findings indicating that expertise does not depend mainly on deliberate practice. Macnamara et al. (2014) found in a meta-analytic review that the average correlation between deliberate practice and performance was +.35, which suggests deliberate practice explains 12% of the variance in performance. However, the percentage of the variance explained differed considerably across domains: it was 26% for games, 21% for music, 18% for sports, but only 4% for education and