Stimulated Recall Methodology in Applied Linguistics and L2 Research [2nd ed.] 9781315813349


English Pages 161 Year 2016



Downloaded by [University of California, San Diego] at 04:13 16 March 2017


STIMULATED RECALL METHODOLOGY IN APPLIED LINGUISTICS AND L2 RESEARCH

Stimulated Recall Methodology in Applied Linguistics and L2 Research provides researchers and students in second language acquisition and applied linguistics with the only how-to guide on using stimulated recalls in their research practice. This new edition expands on the scope of the previous edition, walking readers step by step through a range of studies in applied linguistics in order to demonstrate the history of stimulated recalls and their efficacy as a data-collection tool. With its exclusive focus on stimulated recalls, coverage of the most up-to-date research studies, and pedagogically rich text design, Stimulated Recall Methodology in Applied Linguistics and L2 Research supplies researchers and students with the practical skills to elicit reliable and valid data in their own research. Susan M. Gass is University Distinguished Professor in the Department of Linguistics and Germanic, Slavic, Asian and African Languages at Michigan State University, USA. Alison Mackey is Professor in the Department of Linguistics at Georgetown University, USA.

SECOND LANGUAGE ACQUISITION RESEARCH SERIES


Susan M. Gass and Alison Mackey, Series Editors

Recent Monographs on Theoretical Issues:
Bardovi-Harlig/Hartford Interlanguage Pragmatics: Exploring Institutional Talk (2005)
Dörnyei The Psychology of the Language Learner: Individual Differences in Second Language Acquisition (2005)
Long Problems in SLA (2007)
VanPatten/Williams Theories in Second Language Acquisition (2007)
Ortega/Byrnes The Longitudinal Study of Advanced L2 Capacities (2008)
Liceras/Zobl/Goodluck The Role of Formal Features in Second Language Acquisition (2008)
Philp/Adams/Iwashita Peer Interaction and Second Language Learning (2013)
VanPatten/Williams Theories in Second Language Acquisition, Second Edition (2014)
Leow Explicit Learning in the L2 Classroom (2015)
Dörnyei/Ryan The Psychology of the Language Learner – Revisited (2015)
Kormos The Second Language Learning Processes of Students with Specific Learning Difficulties (2017)

Recent Monographs on Research Methodology:
Barkhuizen/Benson/Chik Narrative Inquiry in Language Teaching and Learning Research (2013)
Jegerski/VanPatten Research Methods in Second Language Psycholinguistics (2013)
Larson-Hall A Guide to Doing Statistics in Second Language Research Using SPSS and R, Second Edition (2015)
Plonsky Advancing Quantitative Methods in Second Language Research (2015)
De Costa Ethics in Applied Linguistics Research: Language Researcher Narratives (2015)
Mackey and Marsden Advancing Methodology and Practice: The IRIS Repository of Instruments for Research into Second Languages (2015)
Tomlinson SLA Research and Materials Development for Language Learning (2016)
Gass/Mackey Stimulated Recall Methodology in Applied Linguistics and L2 Research, Second Edition (2017)
Polio/Friedman Understanding, Evaluating, and Conducting Second Language Writing Research (2017)

STIMULATED RECALL METHODOLOGY IN APPLIED LINGUISTICS AND L2 RESEARCH
Second Edition
Susan M. Gass and Alison Mackey

ROUTLEDGE

Routledge Taylor & Francis Group

NEW YORK AND LONDON

Second edition published 2017 by Routledge 711 Third Avenue, New York, NY 10017


and by Routledge 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN Routledge is an imprint of the Taylor & Francis Group, an informa business © 2017 Taylor & Francis The right of Susan M. Gass and Alison Mackey to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. First edition published by Routledge in 2000. Library of Congress Cataloging in Publication Data Names: Gass, Susan M, author. | Mackey, Alison, author. Title: Stimulated recall methodology in applied linguistics and L2 research / Susan M. Gass and Alison Mackey. Description: Second Edition. | New York, NY : Routledge, [2016] | Series: Second Language Acquisition Research series | Includes bibliographical references and index. Identifiers: LCCN 2016008412 (print) | LCCN 2016012220 (ebook) | ISBN 9780415743884 (hardback : alk. paper) | ISBN 9780415743891 (pbk. : alk. paper) | ISBN 9781315813349 (Master) | ISBN 9781317801269 (Web PDF) | ISBN 9781317801252 (ePub) | ISBN 9781317801245 (Mobipocket/Kindle) Subjects: LCSH: Second language acquisition--Research. | Recollection (Psychology) Classification: LCC P118.2 .G376 2016 (print) | LCC P118.2 (ebook) | DDC 418.001/9--dc23 LC record available at http://lccn.loc.gov/2016008412 ISBN: 978-0-415-74388-4 (hbk) ISBN: 978-0-415-74389-1 (pbk) ISBN: 978-1-315-81334-9 (ebk) Typeset in Bembo by HWA Text and Data Management, London


For Deanna Mackey, to whom Alison owes everything, with love.


CONTENTS

List of Illustrations
Preface
1 Introduction to Introspective Methods
2 Introspection and L2 Research
3 Characterization of Stimulated Recall
4 Data Analysis
5 Using Stimulated Recall as an Additional Data Source
6 Limitations
References
Author Index
Subject Index

ILLUSTRATIONS

Figures
1.1 The states of heeded information in a cognitive process
1.2 Types of introspection
2.1 Diagrammatic model of theoretical and methodological framework
3.1 Types of prompts and thoughts accessed in stimulated recall
4.1a Sample training protocol
4.1b Sample coding sheet for stimulated recall comments
4.1c Sample coding sheet for interaction episodes
4.1d Sample coding sheet for stimulated recall comments
4.1e Rater and task schedule
4.1f Training schedule
4.1g Schedule for interrater reliability
4.2 Transcript 1 from Hawkins (1985)
4.3 Transcript 2 from Hawkins (1985)
4.4 Transcript 3 from Hawkins (1985)
4.5 Transcript 4 from Hawkins (1985)
4.6 Preliminary (flawed) version of coding sheet
4.7 Final version of coding sheet
6.1 Hierarchy of invalidities

Tables
1.1 Differences between three types of introspection
1.2 Adaptation of classification categories of introspection research
2.1 SLA studies using stimulated recall from 2000–2015
3.1 Three types of stimulated recall

PREFACE

The first edition of this book was published in 2000, at a time when most people in the field had not heard of the term stimulated recall methodology. In Chapter 2 we talk about how, in the sixteen years since then, the use of the method has exploded and the term has become so common that our first edition, which was initially referenced almost every time someone carried out a study using the methodology, is often not cited, showing that the methodology has now become mainstream. Given this widespread use, we recognized the need for a new edition to provide updates and reflect developments in the literature. At the same time, we have made some internal reorganizations, and our new edition reflects the current state of the art in the field, as well as representing the front and center position of stimulated recall as a common data elicitation tool.

Throughout this new edition, we have borne in mind our primary goal of providing a text that serves as a guide for novice and seasoned researchers alike, while also assuming that most readers have some background in the general topic of second language learning. We explain key concepts and how-tos, and we provide literature-based examples. As in our other methodology books and textbooks, we take a broad and inclusive view of ‘second language’ research. So our examples reflect concepts from a variety of perspectives in the fields of applied linguistics and second language research, including studies reflecting a range of learning contexts and studies reflecting differences in learning backgrounds. We include examples of a wide range of stimulated recall uses in supplementing experimental data, qualitative research, as well as teacher-initiated research in classrooms.

We are grateful to many individuals for their support in this project that ended up, like most projects of this sort, having a longer history than we had originally anticipated. For both editions, we first thank the students we have had in different classes over the years who have not hesitated to provide feedback on our various syllabi and our sequencing of materials, as well as the designs of our own research. We also thank the many research assistants at both Georgetown University and Michigan State University who have helped us with collecting studies, making databases, editing and many other useful forms of input. We were fortunate in the final stages to have the help of Lara Bryfonski and Brandon Tullock at Georgetown University. Our in-house press editors were, as always, supportive and patient. We are grateful for the support and encouragement we consistently receive from everyone at Routledge. As a final note, Susan Gass acknowledges Samuel, Jonah, and Gabriel Ard as the biggest stimuli in her life.

Susan M. Gass
Williamston, Michigan

Alison Mackey
Georgetown, Washington, DC


1 INTRODUCTION TO INTROSPECTIVE METHODS

Second language acquisition (SLA) research has as its ultimate goal the understanding of what learners know of a second language (L2), how they come to know it, and how they put that knowledge to use. Over the years, the field of SLA has increased in its ways of eliciting data (see Gass & Mackey, 2007 and Mackey & Marsden, 2016, for overviews and examples of elicitation techniques). Traditionally, language production (both spoken and written) has been a main staple for researchers. However, data stemming from language production is limited in what it can tell us about how languages are learned and used. Researchers have turned to different techniques to enhance an understanding of SLA data. One important source of data in the field comes from what learners themselves say about what they know or about how they process their L2, also known as introspective verbal reports. Verbal reports can be collected from learners either concurrently with language production (think-alouds; Bowles, 2010) or after a language event.

This book deals extensively with one specific introspective method, known as stimulated recall. This covers a subset of introspective methods representing a means of eliciting data about the thought processes that take place while a learner is doing a task or an activity. The assumption underlying introspection, in general, is that it is possible to tap into and document a learner’s internal processes in much the same way as one can observe external real-world events. This is predicated on an additional assumption, namely that humans have access at some level to their internal thought processes and can verbalize those processes.

In this chapter we provide background information, including a history of introspection and its place within the fields of philosophy, psychology, and linguistics. We detail the ways that L2 researchers have used stimulated recall (Chapter 2) and, importantly, we provide information on the dos and don’ts of conducting, coding, and analyzing stimulated recalls (Chapters 3 and 4). In Chapter 5, we discuss how stimulated recall can be used to mitigate limitations present in other studies, where triangulating data through stimulated recall procedures like the ones described in this book provides researchers with important information that is not discernible from the original data alone. In order to provide readers of this book with a balanced view of the information that stimulated recall can provide to researchers, we present the limitations of stimulated recall (Chapter 6).

Data collection in L2 research has evolved considerably since the publication of our first edition more than a decade ago. The use of new techniques (e.g., eye-tracking) has greatly enriched our understanding of L2 learning processes. These novel measures also bring up new reasons for including stimulated recall data in designs.

Background

As mentioned earlier, historically, the main source of data for understanding how L2s are learned has come from production data, and, even more specifically, utterances produced by learners. In fact, in the early years of the systematic study of SLA, Selinker (1972) stated that researchers should

    focus … analytical attention upon the only observable data to which we can relate theoretical predictions: the utterances which are produced when the learner attempts to say sentences of a TL. (Selinker, 1972, pp. 213–14, emphasis in original)

While this view is still maintained by some, it has never been entirely accepted. Corder (1973), for example, argued that forced elicitation data were necessary. In other words, spontaneously produced utterances provide only a part of the picture. If one wants to obtain information about the grammatical knowledge that learners have, one also must have a means to determine which sentences learners think are possible in an L2 (i.e., grammatical) and which are not possible in an L2 (i.e., ungrammatical). To accomplish this, data collection from a source other than language production is often necessary (see additional discussion in Gass, 1997; Gass & Polio, 2014).

In addition to determining actual knowledge of the L2, we also need to understand how that knowledge comes about. Most processes involved in learning are not directly observable. All that is observable is what a learner produces, in writing or in speech. However, there are methodological tools that one can use to understand those processes.

Various methods have been used in the field of L2 research to determine underlying linguistic knowledge, including asking learners to introspect about their knowledge. The focus of this book, stimulated recall, is one such method, which is generally classified under the broader term of introspection. Like many methodological tools, introspection has had a long history and has fallen into disfavor at times along its path. However, it is now being used once again with increased frequency and with increased confidence. In this chapter, we contextualize stimulated recall through an examination of the broader area of introspection, with a focus on verbal reporting. We begin by briefly considering some of the historical context of introspection. We then document the rich background work on stimulated recall as an introspective method.

Introspective Methods

There is a long history of use of reflections on mental processes, originating in the fields of philosophy and psychology. Lyons (1986) traced this history in Western thought to Augustine and possibly even to Aristotle. Such mentalistic reflections are often classified as introspection. As mentioned previously, such methods have been in and out of favor in scholarly circles, in part due to skepticism surrounding the ability one has to accurately access one’s thought processes. Introspection assumes that a person can observe what takes place in consciousness in much the same way as one can observe events in the external world. Lyons cited the definition of introspection in the Concise Oxford English Dictionary as “the examination or observation of one’s mental processes” (1986, p. 1). This broad definition subsumes a number of different approaches, and, as a definition, has proven to be too general for many scholars, especially those at pains to distance themselves from introspection.

Our present-day nuances of the term introspection are based on the epitome of introspection, what Lyons called the “golden age of introspection” (1986, p. 2), covering the timeframe from the seventeenth century to the early part of the twentieth century. A seminal thinker in the area of introspection is Descartes. A basic premise of his work (e.g., Discourse on Method 1637 [1960]) was the notion of mind as a separate entity: a person’s mind is fundamentally separate from a person’s body; a person’s mind is also fundamentally separate from the minds of others. With this as background, we turn now to a discussion of the potential usefulness of reflection.

The Usefulness of Reflections on Mental Processes

In this section we consider how the use of reflections on mental processes has been conceived. A fundamental question is whether certain types of reflection might be more useful than others. One component of usefulness of any investigative method is how likely that method is to produce “true” results. The issue of truth is a complex one, a detailed examination of which is beyond the scope of this book. Nonetheless, two tests can be attempted: falsifiability and replicability. If a statement cannot be falsified at all, then some would consider it outside the realm of science, that is, not within the true, at least in Canguilhem’s terminology (1989, 1994). Hence, we would want any self-description about mental processes to be falsifiable in practice. Such a result is not always easy to achieve. Consider Wittgenstein’s (1958) discussions of how certain we can be about another person’s expressions of pain. Wittgenstein likened meaning to a rope made up of many strands twisted together and argued that we can never really know the exact meaning that another person intends to convey. Philosophy apart, presumably, some self-descriptions of mental processes might be more falsifiable than others.

Replicability presents similar problems. If one person reports some mental process, how could these processes be replicated in another person or even at another time in the same person?

Classification

There are a number of different ways of classifying mental processes.

• Temporal: One relevant taxonomy is temporally based. Classical introspection is atemporal. In deciding upon his famous phrase, “Cogito ergo sum,” Descartes was not reflecting on his mental functioning at the moment of thinking, or of any particular time, but of his mental functioning in general. We might well question whether it is possible to think about mental processes in general without recalling some experience in particular. Thus, we may question whether atemporal introspection is really a form of retrospection. It is difficult enough to gain awareness of mental functioning; trying to recall past instances might prove particularly troublesome. Running commentary and stimulated recall, we argue, may be more reliable.

• Details of mental processes: Another distinction that can be made involves the particulars of the mental processes that we attempt to describe. Perhaps the most accepted descriptions are of judgments, such as acceptability judgments common in linguistic research. These have been aptly described by Habermas (1979) as a reconstructive approach. Native speakers of a language know that something is or is not acceptable in that language.

• Philosophical/psychological perspectives: Wilhelm Wundt (1896), one of the founders of experimental psychology, studied language in detail and practiced introspection, but he did not use introspective methods in studying the psychology of language. To the contrary, he felt that introspective methods would not prove successful in investigating language because, in his view, language was a social phenomenon. Clearly, Wundt’s view and practice of introspection as a method of self-observation, and the phenomena he described, differ from the verbalizations of thought processes, which are in popular use today and which form the main topic of this book.

An interesting model for mental phenomena is Dennett’s (1987) analogy of mental processing and magic tricks. According to Dennett, the awareness we have of mental processing is much like the awareness we have of a magic trick. We observe a magic trick and can see what must be explained, but simply observing the trick does not typically lead to an understanding for most people about how the magician pulled off the trick. The trick provides the explanandum, rather than the explanans. As individuals attempting to report our mental processes, we can only report what we are conscious of. We have no access to what is really occurring at any other level.1

One difficulty with the procedure of introspection is that humans are essentially sense-making beings and tend to create explanations, whether such explanations can be justified or not. This is the central thesis of Dennett’s (1987) book, The Intentional Stance. He argued that we tend to understand activities as if they are the product of some meaning-producing entity. Experiments on split-brain patients (i.e., individuals with a severed corpus callosum where the most efficient communication pathway between the two cerebral hemispheres is no longer functional) indicate that the left hemispheres of our brains are excellent at producing meaningful explanations, even if in error (Gazzaniga, 1998). For most individuals, only the left hemisphere is able to create a narration. In an experiment with split-brain individuals, patients were asked to point to the picture from a set that best related to another picture placed above the set. Both hands of all the patients were equally adept at the picture-matching test. However, only the left hemisphere (which controls the right hand) was able to construct a coherent narrative of why a particular picture was appropriate. If the individual were asked why the object pointed to by the left hand was chosen, the left hemisphere did not know and the right hemisphere could not say. Gazzaniga found that the left hemisphere constructed a plausible story relating what the left hand pointed to and what the left hemisphere saw.

This experiment is interesting in that it demonstrates that human beings tend to create explanations for phenomena, even when these explanations may not be warranted. This finding is important when considering introspective methods because clearly there is a danger that individuals may create plausible stories for other descriptions of mental activity, without that story being an accurate representation of reality.

Introspection and Behaviorism

Introspection, as a methodological tool, fell into disfavor with the rise of behaviorism. The main goal of behaviorism was to gain information about human behavior, not through looking inwards, but by observing, measuring, and interpreting human behavior. Watson (1913, as cited in Lyons, 1986) stated:

    Psychology as the behaviorist views it is a purely objective experimental branch of natural science. Its theoretical goal is the prediction and control of behavior. Introspection forms no essential part of its methods, nor is the scientific value of its data dependent upon the readiness with which they lend themselves to interpretation in terms of consciousness. (Lyons, 1986, p. 23)

These comments and similar ones were a reaction to some of the early work in the field of psychology (such as that by Titchener, 1908 and Wundt, 1896, as discussed by Blumenthal, 1970) that relied heavily on techniques of introspection as a way of gaining insight into the human mind. As Lieberman (1979) pointed out, however, the techniques of introspection used by the early experimentalists were quite different from introspection techniques of today. In earlier years of introspection as a methodology, participants went through significant training before carrying out the introspection process. Lieberman noted that in Wundt’s laboratory participants had to “practice at least 10,000 separate introspections” (1979, p. 320) before being considered qualified to participate in an actual introspection. This is clearly problematic in that the mere fact of practice may alter thought processes, especially practice in such incredible amounts. Patterns of behavior and preconceptions would be established, making it unclear what participants were accessing.

The notion of consciousness has always been a concept subject to debate in the field of psychology, and introspection was once seen as a way to access consciousness. Introspective analysis assumed that the functionings of the mind were in fact accessible to observation. However, Freud’s popular perspective clearly made it theoretically impractical to view the human mind in this way.
During the years when theories of behaviorism were in ascendance in the field of psychology (from about the turn of the twentieth century), consciousness was not a favored or even a valid area of research. The tools used to investigate it were also summarily dismissed. The debate became even sharper in 1920 when Watson (cited by Lyons, 1986) stated:

    It is a serious misunderstanding of the behavioristic position to say … —“And of course a behaviorist does not deny that mental states exist. He merely prefers to ignore them.” He “ignores” them in the same sense that chemistry ignores alchemy, astronomy horoscopy, and psychology telepathy and psychic manifestations. The behaviorist does not concern himself with them because as the stream of his science broadens and deepens such older concepts are sucked under, never to reappear. (Lyons, 1986, p. 24)

In other words, consciousness was not a serious enterprise and had as much validity as some of the “quack” sciences. Not only were there theoretical difficulties with the concept of introspection as anathema to behaviorism, but also much of the early work using introspection proceeded uncritically and uncontroversially (as noted by Ericsson & Simon, 1993). With the lack of scientific rigor, it was not surprising that the results emanating from introspection were inconsistent and not trusted. Hence, it was argued that the method itself was at fault and should be eliminated from the repertoire of methodologies used in psychology. This resulted in what is often termed ‘throwing out the baby with the bathwater.’2

B. F. Skinner (1953, 1957), the influential psychologist and behaviorist, emphasized the importance of observable behaviors in psychology. As a science, psychology was conducted (as were natural sciences) by controlling variables to determine their effects on, in the case of psychology, behaviors. According to Lyons (1986), behaviorist explanations of internal (mental) events reflected covert (inner), reduced (to behavior) and unemitted forms of external behavior. By 1953, however, Skinner acknowledged that inner events could be observed by the possessor of them.3 This, in a sense, went some way toward legitimizing introspection as a research tool.

Lyons (1986) pointed out that the failure of behaviorists to account for the “problem of privacy” (p. 44) led to “current centralist (brain-centered) psychologies and philosophies” (p. 44). He claimed that “What we gain access to … is a private and personal storehouse of myriad public performances, edited and ‘replayed’ according to largely stereotyped views about our cognitive life” (p. 148). In other words, the access to actual cognitive processes is not direct but is, as Lyons put it, replayed through memory.

Behaviorism and the Study of Language

Linguistic data in the earlier part of the twentieth century were collected by means of observation; linguistic generalizations were made by gathering speech samples from individuals and then analyzing those data in terms of the patterns that they represented. Bloomfield (1933), in a rewrite of his early 1914 work, spoke of mentalistic and mechanistic psychology.

    The mentalists would supplement the facts of language by a version in terms of mind—a version which will differ in the various schools of mentalistic psychology. The mechanists demand that the facts be presented without any assumption of such auxiliary factors. (Bloomfield, 1933, p. vii)

With the advent of cognitive psychologies and their focus on internal events, such as processing (cf. Bruner, Goodnow, & Austin, 1956; Miller, Galanter, & Pribram, 1960; Newell & Simon, 1956) and with Chomsky’s (1957, 1959) attack on Skinnerian behaviorism, a new climate arose that allowed for introspection.


And, with regard to teaching, the new research paradigms opened new doors. For example, working in the field of general education, Shulman noted that:

    To understand adequately the choices teachers make in classrooms, the grounds for their decisions and judgments about pupils, and the cognitive processes through which they select and sequence the actions they have learned to take while teaching, we must study their thought processes before, during, and after teaching. (Shulman, 1986, p. 23)

Shulman referred to his earlier work with Elstein (Shulman & Elstein, 1975) in pointing out three main types of cognitive process research when dealing with teaching: judgment and policy, problem solving, and decision making. From such work in education there was an evolving requirement for some sort of verbal reporting, at least when exploring issues such as problem solving and decision making. Uncovering cognitive processes was clearly a complex issue, and the door was opened to a consideration of introspection.

In the case of research in linguistics, the rejection of behaviorism concurrently promoted a type of introspection as the main linguistic methodology, namely, grammaticality judgments4,5 which have been used since the early days of transformational grammar. Linguists commonly use introspection as a source for their own theoretical work. As Bard, Robertson, and Sorace (1996, p. 32) stated, “For many linguists, intuitions about the grammaticality of sentences comprise the primary source of evidence for and against their hypotheses.” Not only do linguists use themselves as introspective resources, but when gathering information about languages unknown to them, they ask others for their intuitions about acceptability and unacceptability in their native language.

Verbal Reporting

Verbal reporting is a special type of introspection and assumes a model of information processing described by Ericsson and Simon:

  To obtain verbal reports, as new information (thoughts) enters attention, the subjects should verbalize the corresponding thought or thoughts … the new incoming information is maintained in attention until the corresponding verbalization of it is completed. (Ericsson and Simon, 1987, p. 32)

Ericsson and Simon illustrated their perspective as shown in Figure 1.1. The top panel of Figure 1.1 represents a normal sequence of states of heeded information (i.e., silent thoughts). The middle panel, representing talk-aloud data, illustrates the vocalization of silent speech. In the bottom panel, think-aloud, individuals have to convert silent speech into a form that can then be vocalized. Thus, going from the top panel to the bottom panel involves increasing levels of task complexity.

FIGURE 1.1 The states of heeded information in a cognitive process and their relation to verbalizations under three different conditions, from Ericsson and Simon (1987); reprinted with permission of Multilingual Matters
[Figure not reproduced. Top panel (Silent): states S(1), S(2), S(3). Middle panel (Talk Aloud): states S(1), S(2), S(3), each paired with Vocalization (1), (2), (3). Bottom panel (Think Aloud): states S(1), S(2), S(3), each paired with Verbal encoding (1), (2), (3) and Vocalization (1), (2), (3).]

Ericsson and Simon claimed that the information to be reported does not change depending on the medium of the task (i.e., whether the cognitive work is done silently or verbally). What is important for our present purpose is the fact that in a study comparing performance on three tasks, Ericsson and Simon (1984) found similarities in the responses. Not surprisingly, there was a difference in the time needed to complete each of these task types; think-alouds took longer to perform than the others.

As noted by Ericsson and Simon (1996), the terms “talk-aloud” and “think-aloud” are often used interchangeably. However, they also note that despite their own interchangeable use of these terms, it is sometimes necessary to distinguish between the two. In a talk-aloud protocol, thoughts have been “already encoded in verbal form”, whereas in a think-aloud protocol, a participant “recodes verbally and utters thoughts that may have been held in memory in some other form (e.g., visually)” (1996, p. 222). Throughout our discussion, we will not make this differentiation, as the distinction is not relevant for stimulated recalls.


Verbal report data have also been subcategorized. For example, Cohen (1998) outlined three primary categories he suggests are used in L2 research:

1. Self-report: With self-report data, one can gain information about general approaches to something. For example, “I am a systematic learner when it comes to learning a second language.” This sort of statement might be found on a typical L2 learning questionnaire. Such statements are removed from the event in question and are of less concern here than other types of verbal reporting.

2. Self-observation: Self-observation data can be introspective (within a short period of the event) or retrospective. A learner reports on what she or he did. An example provided by Cohen (1998, p. 34) is “What I just did was to skim through the incoming oral text as I listened, picking out key words and phrases.” Such self-observations refer to specific events and are not as generalized as self-report data.

3. Self-revelation: This is what is often described as think-aloud. A participant provides an ongoing report of his or her thought processes while performing some task.

The term process tracing is also used to refer to methodologies of verbal reporting. What is meant by process tracing is that one can trace the mediating processes of participants during the performance of a specified task. Shavelson, Webb, and Burstein (1986) outlined three types of process tracing. Their categories are similar to those mentioned above by Cohen (1998). The first is think-aloud, sometimes also referred to as talk-aloud, as noted earlier. This is also known as verbal reporting (cf. van Someren, Barnard, & Sandberg, 1994). The second involves thinking about a previously performed task (i.e., retrospective protocols), and the third involves a prompted interview, for example, watching a video of an event, listening to an audio recording of an event, or even seeing a piece of writing just completed. This latter is known as stimulated recall. Van Someren et al. (1994) were careful to point out that the think-aloud method “is a means to validate or construct theories of cognitive processes, in particular of problem-solving” (p. 9).

Despite different terminology, verbal reporting can be seen as gathering data by asking individuals to vocalize what is going through their minds as they are solving a problem or performing a task. Verbal reporting allows researchers to observe how individuals may be similar or different in their approach to problems. The think-aloud protocols illustrated in Example 1.1 (van Someren et al., 1994, pp. 5–6) reveal two very different thought processes during the solving of the same problem. As van Someren et al. (1994) pointed out, these two protocols, produced by two individuals who eventually arrived at the correct answer, reflect very different problem-solving approaches: one algebraic and one hit-or-miss combined with logic. It is only through a think-aloud procedure that these differences in process manifest themselves. Considering simply the outcome would only provide the information that these individuals arrived at the same (correct) answer.


EXAMPLE 1.1 Two different approaches to the same problem, as evidenced through a think-aloud protocol

Problem to be solved: A father, a mother, and their son are 80 years old together. The father is twice as old as the son. The mother has the same age as the father. How old is the son?

Student 1

1. a father, a mother and their son are together 80 years old
2. the father is twice as old as the son
3. the mother is as old as the father
4. how old is the son?
5. Well, that sounds complicated
6. let’s have a look
7. I just call them F, M and S
8. F plus M plus S is 80
9. F is 2 times S
10. and M equals F
11. what do we have now?
12. three equations and three unknowns
13. so S. . .
14. 2 times F plus S is 80
15. so 4 times S plus S is 80
16. so 5 times S is 80
17. S is 16
18. yes, that is possible
19. so father and mother are 80 minus 16
20. 64
21. er … 32

Student 2

1. father, mother and son are together 80 years old
2. how is that possible?
3. if such a father is 30 and mother too
4. then the son is 20
5. no, that is not possible
6. if you are 30, you cannot have a son of 20
7. so they should be older
8. about 35, more or less
9. let’s have a look
10. the father is twice as old as the son
11. so if he is 35 and the son 17
12. no, that is not possible
13. 36 and 18
14. then the mother is
15. 36 plus 18 is 54
16. 26 …
17. well, it might be possible
18. no, then she should have had a child when she was 9
19. oh, no
20. no the father should, the mother should be older
21. for example 30
22. but then I will not have 80
23. 80 minus 30, 50
24. then the father should be nearly 35 and the son nearly 18
25. something like that
26. let’s have a look, where am I?
27. the father is twice …
28. the mother is as old as the father
29. oh dear
30. my mother, well not my mother
31. but my mother was 30 and my father nearly 35
32. that is not possible
33. if I make them both 33
34. then I have together 66
35. then there is for the son . . . 24
36. no, that is impossible
37. I don’t understand it anymore
38. 66, … , 80
39. no, wait, the son is 14
40. almost, the parents are too old
41. 32, 32, 64, 16, yes
42. the son is 16 and the parents 32, together 80

Reprinted from van Someren et al., The Think Aloud Method, 1994, pp. 5–6, by permission of Academic Press. Copyright Elsevier.
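For readers who want the algebraic route in Example 1.1 in compact form, the following sketch restates Student 1's own steps (roughly lines 8 to 21 of that protocol) in symbols. It uses the student's labels F, M, and S for the three ages; the layout is ours and is offered only as an illustration, not as part of the original protocol.

```latex
\begin{align*}
F + M + S &= 80, \qquad F = 2S, \qquad M = F \\
2F + S &= 80 && \text{(substituting } M = F\text{)} \\
4S + S &= 80 && \text{(substituting } F = 2S\text{)} \\
5S &= 80 \;\Rightarrow\; S = 16 \\
F = M &= \tfrac{80 - 16}{2} = 32
\end{align*}
```

Student 2 arrives at the same values by trial and error, checking at the end of the protocol that 32 + 32 + 16 = 80.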

Differentiating Introspective Reports

In general, introspective reports can differ along a number of dimensions: time of reporting (concurrent or retrospective), form (oral or written), and the amount of support. These introspective types are schematized in Figure 1.2.

FIGURE 1.2 Types of introspection
[Figure not reproduced. Introspection branches into three dimensions: Time frame (Concurrent, Retrospective), Form (Oral, Written, Both), and Support (None, Full).]

In L2 research, two types of verbal reporting have dominated the field: think-alouds and stimulated recall. Despite the fundamental similarity, namely, that both attempt to gain information about thought processes that take place during an event, these two elicitation types differ along all three dimensions. First, with regard to the time frame, think-alouds are concurrent whereas stimulated recalls occur after the fact. Second, think-alouds are generally oral; stimulated recalls are also for the most part oral, but the recall could be written. A third difference has to do with the support provided. In think-alouds, the support is the event itself; in stimulated recalls, the support (i.e., stimulus) is some artifact of the event (e.g., a product, a video, or an audio-recording). This book deals with stimulated recall; for an in-depth coverage of think-alouds, see Bowles (2010).

Advantages and Disadvantages

As with any methodological tool, there are advantages and limitations to the use of verbal reporting. The major advantage of the use of verbal report is that one can often gain access to processes that are unavailable by other means (as was the case with the mathematical problem-solving transcripts discussed earlier). However, it is necessary to question the extent to which verbal report data are valid and reliable. For example, are the reports given consistent with the behavior of participants? Various researchers (e.g., Ericsson & Simon, 1980; Lieberman, 1979) have shown that verbal reports are reliable measures and that results obtained using verbal reports do correspond with actual behavior. In fact, Ericsson and Simon (1996, citing Anderson, 1987) affirm that “[c]oncurrent and retrospective verbal reports are now generally recognized as major sources of data on subjects’ cognitive processes in specific tasks” (p. xi). In their meta-analysis, Fox, Ericsson, and Best (2010) explore the various issues involved in interpreting concurrent reporting, making it clear that there are many factors that impact the validity of this type of data.

An important consideration has to do with the accuracy of the reporting. This is particularly the case in self-report and self-observational data. A second consideration has to do with the type of memory structure used in recalls. With self-report and self-observational data, when the time between the event reported and the reporting itself is short, there is a greater likelihood that the reporting will be accurate. This is known as veridicality and is a particular issue with retrospective reports because participants may not accurately recall their thought processes (see Note 6). We return to this issue in Chapter 3 when we address procedural details.

Yet another concern has to do with validity and reactivity of the procedure itself. Does the mere fact of verbalizing a process alter the cognitive process itself? This is more a concern with concurrent verbalizations than with stimulated recalls in that the verbalization cannot affect the cognitive process given its non-concurrent status. For a broader discussion of reactivity in introspective reports, the reader is referred to Bowles (2010). In the following section, we focus on a specific subset of introspective reports, namely, stimulated recall.

Stimulated Recall

Stimulated recall is an introspective method originally developed by Bloom (1953) in his investigation of the thought processes of students during lectures and discussion sessions. Additional refinements of the process come from Siegel, Siegel, Capretta, Jones, and Berkowitz (1963), who took the technique one step further and considered not live lectures, but videotaped lectures. Bloom’s original justification for the methodology was that

  the subject may be enabled to relive an original situation with vividness and accuracy if he is presented with a large number of the cues or stimuli which occurred during the original situation. (Bloom, 1953, p. 161)

As discussed earlier, stimulated recall methodology is a technique in which participants are asked to recall thoughts they had while performing a prior task or participating in a prior event. It is assumed that some tangible (perhaps visual or aural) reminder of the event will stimulate recall of the mental processes in operation during the event itself and will, in essence, aid the participant in mentally reengaging with the original event. In other words, the theoretical foundation for stimulated recall relies on an information-processing approach whereby the use of, and access to, memory structures is enhanced, if not guaranteed, by a prompt that aids in the recall of information. In sum, it is a technique that is intended to access cognitive processes during an event by asking participants to reflect on that event. This is theoretically justifiable given that:

  the sequence of thoughts occurring during performance of a task is stored in long-term memory. Immediately after the task is completed, there remain retrieval cues in short-term memory that allow effective retrieval of the sequence of thoughts. (Ericsson & Simon, 1996, p. xvi)


Box 1.1 provides a typical sequence of events that occurs during a stimulated recall procedure.

BOX 1.1 A sequence of events for stimulated recall.

The goal of this study was to understand how learners perceive feedback.

1. Two individuals participate in a spot-the-difference task. Each has a different picture and, without showing it to his/her partner, describes it to the partner in order to discover a pre-determined number of differences. The exchange is videotaped.

2. At some point following the episode, a researcher and one of the participants meet to look at the video. At points where there is some form of feedback, the researcher stops the video and says: “What were you thinking then?”

Stimulated recall had its origins in the field of education, but its usage has grown well beyond its original focus. To name a few areas, it has been used in medical research, including psychiatry and nursing (Barrows, 2000; Daly, 2001; Elstein, Shulman, & Sprafka, 1978; Hansebo & Kihlgren, 2001; Liimatainen, Poskiparta, Karhila, & Sjogren, 2001; Liimatainen, Poskiparta, Sjogren, Kettunen, & Karhila, 2001; Salvatori, Baptiste, & Ward, 2000; Spenser & Parikh, 2000), in technology education (Fox-Turnbull, 2009), physical education (Allison, 1987; Tjeerdsma, 1997), conflict resolution (Kressel, Henderson, Reich, & Cohen, 2012), teacher librarians’ mental models (Henderson & Tallman, 2006), decision making by sports coaches (Gilbert, Trudel, & Haughian, 1999), and management (Burgoyne & Hodgson, 1983 – they use the word stimulus, but not stimulated recall).

A crucial assumption behind stimulated recall (or any type of recall) is the basic one of recall accuracy. Bloom (1954) attempted to verify the reliability of recall by recording classroom events and asking participants to recall an overt event that occurred immediately following something in the recording. He found that if the recalls were prompted a short period of time after the event (generally within 48 hours), recall was 95 percent accurate. Accuracy declined as a function of the intervening time between the event and the recall. Bloom made the assumption that “the recall of one’s own private, conscious thoughts approximates the recall of the overt, observable events” (1954, p. 26). Thus, he argued, the recall method itself is valid for the procurement of information about one’s thoughts during an event. It has an advantage over a simple post hoc interview in that the latter relies heavily on memory without any prompts, and it has an advantage over think-aloud protocols in that for think-alouds the researcher needs to train participants, and even after training, not all participants are capable of carrying out a task and simultaneously talking about the task. This holds true to a greater extent with speaking activities, for which it is extremely difficult if not impossible to carry out a speaking task and talk about it simultaneously, without the process of think-aloud affecting the task talk.

Whereas Bloom’s concern was the classroom, the ideas behind stimulated recall have been extended to other areas. For example, Kagan, Krathwohl, and Miller (1963), in a technique that they call interpersonal process recall (part of stimulated recall methodology), investigated interpersonal behavior. In their interesting implementation of the technique, a counselor and client participate in a counseling interview in a closed circuit television studio. The camera is present, and no one but the counselor and client are in the room. At the completion of the interview, the client and counselor are moved to separate rooms, each one accompanied by an interviewer. The videotape of the original session is played back simultaneously in both rooms. The interviewers instruct the participants (counselor and client) to describe what they were feeling during the session, to interpret what they or the other had said, and to translate body movements. Any of the four (i.e., counselor, client, two interviewers) could stop the tape when they wanted in order to comment or to probe.

Stimulated recall has also been used as a tool for teacher training and to evaluate teaching effectiveness. Peterson and Clark (1978) videotaped classrooms and extracted from the videotapes four short (2–3 minute) segments representing the beginning and ending of the class and two random sequences in between. Specific questions were posed to teachers after they watched each segment.

Although it is probably best known for its uses in cognitively oriented research (see review by Lyle, 2003), some researchers use stimulated recall to uncover factors beyond those that are exclusively cognitively oriented. For example, questions about individuals’ perspectives on learning (Erickson & Mohatt, 1977) can be explored, as can their impressions of social interactions. Stimulated recall can also be used to explore children’s development, including their use of argument skills (Benoit, 1995) or their reasoning abilities (Hample, 1984). One can also use stimulated recall to probe solitary composing processes in L1 or L2 writing (Rose, 1984) or interactions of their social affective and linguistic issues in talk about writing (DiPardo, 1994). It can be used to explore readers’ lexical retrieval mechanisms or their opinions and impressions about what they have read. Stimulated recall is often used to address questions in research on teachers and their actions, including their decision-making and interactive thoughts (Calderhead, 1981a, 1981b). Finally, it should be noted that stimulated recall is often employed in conjunction with other methodologies, as a means of triangulation or further exploration.

There are important differences between stimulated recall and other types of introspection. Some are outlined in Table 1.1, based on the three types of process tracing discussed by Shavelson et al. (1986).

TABLE 1.1 Differences between three types of introspection

Prompted interview (stimulated recall)

What is it? This takes place after an event, with a prompt. Example: participant describes a picture and is given feedback on her errors. Following the original episode, she is shown a video of the exchange and is asked about her thoughts as the correction was taking place.

Advantages
• Provides a guide to memory verbalization (through stimulus).
• If done close to the event, there is good recall.
• Participants’ thought processes are readily generated.

Disadvantages
• May not be reliable if the questions are not posed correctly.
• Timing is important; if the recall is not done close to the event itself, reliability suffers.
• Requires preparation for participants and interviewers.
• Requires more time than think-alouds.

Self-observation

What is it? This takes place after an event, but without a prompt. An individual is asked what she was thinking. Example: participant is asked what she did as she was listening to an oral report.

Advantages
• Is not as cumbersome as SR in that a specific stimulus does not need to be prepared.
• Does not need elaborate training as is the case for think-alouds or SRs.
• There is greater opportunity for a researcher to explore thoughts than in SRs given the scripted nature of SR questions.

Disadvantages
• Without an aid to stimulate memory, there is less recall than with SR or self-observations.
• Without pre-determined questions on the part of the researcher, there is the possibility that the researcher (unintentionally) draws attention to something that the participant does not actually recall. Participants may bring in thought processes that were not present initially.

Think-aloud

What is it? Simultaneously participating in an event and talking about it. Example: participant plays a game (e.g., chess) and talks about the thought processes as she makes moves.

Advantages
• Can mimic a natural situation in which someone needs to do something and talk about it at the same time.
• Takes less time than SR.
• Has greater validity than self-observation given that the cognitive thoughts are taking place simultaneously with task.

Disadvantages
• Is done simultaneously with a task so that attention is not focused solely on the task or solely on the recall.
• Not natural.
• Actions (doing an activity) can be faster than thought processes.
• Requires preparation for participants.


Classification Scheme for Introspective Research

In this section we discuss some of the most prominent areas where introspective research in general has been used in second language studies (see Note 7). Færch and Kasper (1987) offered a useful classification scheme (see Table 1.2) for the collection (not analysis) of introspective data. As they acknowledged, their classificatory system is derived in part from work by Cohen (1987), Cohen and Hosenfeld (1981), Ericsson and Simon (1980, 1984), and Huber and Mandl (1982).

TABLE 1.2 Adaptation of classification categories of introspection research (Færch & Kasper, 1987)

Category | Explanation | What can stimulated recall do?
Object of introspection | Linguistic/Cognitive (Use, Knowledge); Affective; Social | All
Modality | Oral versus written data | Both
Relation to concrete action | Is the introspection related to a concrete event or is it generic? | Related to concrete events
Temporal relation to action | What is the temporal distance between action and verbalization? | Generally immediate
Participant training | To what extent is participant training necessary? | Generally, no specific training necessary
Elicitation procedure | Amount of structure | Structure is present
Elicitation procedure | Amount of recall support | Recall support always present
Elicitation procedure | Who initiates verbalizations | Can be either participant or researcher
Elicitation procedure | Interaction between participant and researcher | Generally yes
Elicitation procedure | Integration with action | Always based on a prior event
Combination of methods | Is there support data from other elicitation measures? | Generally not

Considering the categories in column one, the first categorization (i.e., object of introspection) relates to whether one is dealing with linguistic (cognitive), affective, or social factors. Within the category of language-based factors, is the goal to consider knowledge or use? Modality refers to whether the data to be introspected are written or oral. The third category (i.e., relationship to concrete action) refers to whether or not a specific event is being talked about. This is the case in all stimulated recall studies but is not the case with introspection in general. For example, studies using judgment data are more likely to ask generic questions, such as, “Can you say X in your language?” Distance from the event refers to the time when introspection takes place vis-à-vis the original event. Diary studies may be at some distance from the event; in experimental work, such as that found with stimulated recall, the verbalization generally takes place close to the time of the event. Participant training refers to the specialized training that participants need in order to complete the verbalization task. For example, on-line (in real time) verbal reporting generally requires training, whereas diary writing generally requires little training.

There are numerous ways of eliciting introspective data. For example, questionnaires may be quite structured, diary writing perhaps much less so. Stimulated recall always requires support (i.e., a record of the event is always present) whereas simple recall of an event does not. Another area to consider is the initiator of the recall. Is it the researcher or experimenter? The participant? All of them? What interaction is there between the researcher and the participant? Finally, is the introspective data supplemented by data of other sorts? For examples, see Allison (1987) and Tjeerdsma (1997) who supplemented their study with think-aloud data.

Conclusion

Stimulated recall is an important research tool, but one that, like any other research tool, must be used with full knowledge of its strengths and limitations. Stimulated recall methodologies have been criticized on a number of points, most notably on the memory structures being accessed, and on issues of reliability and validity (see Ericsson & Simon, 1993; Smagorinsky, 1994; for empirical testing/treatment, see Egi, 2007, 2008; Godfroid & Spino, 2015; Leow & Morgan-Short, 2004; Sanz, Lin, Lado, Wood Bowden, & Stafford, 2009; Smith, 2012). Although many of these criticisms have been discussed in the psychological and educational research literature, systematic explorations are seldom found within the second or foreign language literature. With the current increase in the use of stimulated recall methodology, L2 researchers need to be aware of the pitfalls and problems noted in the psychology/educational literature. As many have noted, there are important questions of validity with this methodology. Thus, studies that utilize stimulated recall methodology often require carefully structured research designs to avoid problems. These issues are discussed in detail in later chapters. In Chapter 2 we focus on stimulated recall in L2 research, emphasizing the broad range of uses with regard to scope and theoretical perspectives.


Notes 1 People can be trained to be more observant about their reflective processes. We discuss this in more detail in Chapter 3. 2 There is an interesting parallel from the SLA literature. In the early years of systematic SLA studies, much work was conducted within the framework of behaviorism (see description in Gass with Behney & Plonsky, 2013). In particular, early work on transfer traditionally focused on behavioral aspects of transfer. With the demise of behaviorism, it was important within an SLA context to show that SLA was not a behavioristic activity. This entailed throwing off the shackles of language transfer. That is, because transfer was strongly associated with behaviorist thought, to show that L2 learning was not a behaviorist activity, researchers argued (Bailey, Madden, & Krashen, 1974; Dulay & Burt, 1974a, 1974b, 1975) that transfer was not a major or even an important factor for L2 learning. The link between behaviorism and its tools of analysis on one hand and its theoretical extensions on the other often went unchallenged with the demise of the theory. As a consequence, one saw a de-emphasis of all transfer research. It was only with work by Gass (1979), Kellerman (1979), Sjöholm (1976), and others that researchers began to question an inextricable link between behaviorism and the use of the native language. Research during the past two decades on the role of the native language has taken a different view, consistent with a nonbehaviorist position, and has questioned the assumption that language transfer has to be part of behaviorism. 3 This view of Skinnerian psychology is somewhat simplistic. It is clear that Skinner had a much more complex theory in mind. He argued that inner events are limited in their accessibility and do not have any special structure. Hence, they are not the same as the observable events that he was interested in. 
In his 1953 book, Skinner clearly acknowledges the gray area between public (observable) and private (internal): The line between public and private is not fixed. The boundary shifts with every discovery of a technique for making private events public. Behavior which is of such small magnitude that it is not ordinarily observed may be amplified. Covert verbal behavior may be detected in slight movements of the speech apparatus. … The problem of privacy may, therefore, eventually be solved by technical advances. But we are still faced with events which occur at the private level and which are important to the organism without instrumental amplification. How the organism reacts to these events will remain an important question, even though the events may some day be made accessible to everyone. (Skinner, 1953, p. 282) 4 This is the common name given to judgments one makes about the grammaticality of an utterance. Technically speaking, however, when one provides information about sentences or utterances, one is making an acceptability judgment. Grammaticality refers to what is generated by the grammar; acceptability judgments are judgments of well-formedness. Grammaticality, reflecting competence, is not directly accessible; it is inferred through judgments of acceptability. 5 This book is not the place to engage in a detailed commentary on the role of grammaticality judgments in SLA research. Suffice it to say that it is not a straightforward measure and it is not universally regarded as a valid or reliable measure. Some have argued that researchers who use grammaticality judgments do so under the assumption that they are directly tapping competence (Carroll & Meisel, 1990; Ellis, 1990, 1991). This belief is not well-founded. 
Researchers who have used this measure have typically acknowledged that competence in the linguistic sense is not directly accessible and that it can only be inferred from performance, with acceptability judgments themselves being a performance measure (Cook, 1990; Gass, 1994; White, 1989, see also Mackey & Gass, 2015).


6 A relevant construct found particularly in the education literature is known as verbal overshadowing, first described by Schooler and Engstler-Schooler (1990). The phenomenon concerns the verbalization of nonverbal stimuli; research on it investigates how that verbalization affects later identification of the stimulus. A special issue of Applied Cognitive Psychology (2002, volume 16, issue 8) is devoted to this particular type of verbalization.

7 The term second language studies is being used deliberately here. As Færch and Kasper (1987) noted, “we have chosen to refer to the field of study as second language (SL) research, thus avoiding the bias towards developmental issues implicit in the more common term ‘second language acquisition research’” (p. 5). (See also the diagram in Gass, 1998, representing the diversity of approaches in the field, and Seliger, 1983.)


2 INTROSPECTION AND L2 RESEARCH

Introspective methods have become a common source of data elicitation in second and foreign language research (see Chapter 1). Such research has utilized verbal reporting, both on-line (e.g., talk-aloud or think-aloud) and retrospective. What is common about all these introspections is that the data come from learners’ comments about the way they organize and understand information. As mentioned earlier, stimulated recalls, a subset of introspective methods, are used to explore learners’ thought processes or strategies employed during a task by asking learners to reflect on their thoughts after they have carried out a task. They are differentiated from other introspective reports due to the requirement that they be carried out with some degree of support, namely, by providing learners with an audio- or video-recording of themselves carrying out a task (as in Mackey, Gass, & McDonough, 2000), giving them a picture they drew in response to L2 directives, giving them their observation field notes (Seung & Schallert, 2004) or even transcriptions of conversations (Schepens, Aelterman & Van Keer, 2007; Smith, 2012). While hearing or seeing these stimuli, learners are asked to recall their thought processes during the original event.

In addition to cognitive psychology and educational research where many of these methodologies originated, stimulated recalls have been used widely in L2 research, for example in interlanguage pragmatics (see Cohen & Hosenfeld, 1981; Færch & Kasper, 1987; Kasper & Blum-Kulka, 1993), L2 reading and writing (De Silva & Graham, 2015; Lei, 2008; Ma, 2010; Nurmukhamedov & Kim, 2010; Uysal, 2008; Zhao, 2010), L2 writing strategy (De Silva & Graham, 2015), and currently, oral interaction (Fujii & Mackey, 2009; Gass & Lewis, 2007; Hawkins, 1985; Jourdenais, 1996; Mackey, Gass, & McDonough, 2000; Mackey, 2002; Mackey, Kanganas & Oliver, 2007; Sato, 2007; Watanabe, 2008; Watanabe & Swain, 2007).

As we noted in our general discussion in Chapter 1, introspection has been used to gain information about what individuals are doing as they produce language. This is particularly important in the context of L2 research because it is often the case that the reasoning behind learners’ written or spoken behaviors is inferred by examining only production data. In L2 research, understanding the source of L2 production is problematic because often there are multiple explanations for production phenomena that can only be assessed by exploring the process phenomena. To take a concrete example, when Spanish speakers produce the English utterance, “no go,” with the intended meaning of something akin to “I am not going,” the question arises as to the source for the non-target utterance. Are learners producing this utterance because they are following a developmental path, or because they are constrained by the patterns of the native language, or perhaps both (see Schumann, 1978; Zobl, 1980)? Most learners would be unlikely to respond that they were following specific developmental sequences (see note 1). A predicted response to the question of what the learner was thinking about when he produced this utterance might be “I was thinking that I didn’t know how to say this in English, so I said it the way it is said in Spanish” (see note 2). A response such as this provides some indication that, at least in part, knowledge and use of the native language was one source for the production.

In Figure 2.1, we present a useful model, discussed by Henderson, Henderson, Grant, and Huang (2010), which is an adaptation of a similar model presented by Pausawasdi (2001) and Henderson and Tallman (2006). The model shows the relationship between information processing theory, mediating processes, and introspection, drawing on work by Craik and Lockhart (1972) and Rumelhart and McClelland (1986).
The first part of the model considers the memory part of using any sort of introspection methodology and places both long-term and short-term memory as the basis of what is necessary for stimulated recall to work. As Henderson et al. (2010) explain, the original stimuli “are sent to sensory memory where [they are] forgotten, discarded, or given attention and forwarded to the short-term working memory for processing” (p. 7). To give an example of an L2 research project, we might consider a chat setting involving feedback. A camera is focused on the participant and information on keystrokes is also obtained. The sensory information is processed by one’s eyes and fingers, but that sensory information is readily forgotten. One cannot easily recall those sensory stimuli unless specific attention is paid to that part of the stimulus. Henderson and Tallman (2006) cite research that suggests that there is a 15–60-second memory trace before sensory input is forgotten, although it can be increased to as much as 20 minutes with rehearsal (Goldstein, 1999). Huitt (2003) makes this clear: “It will initially last somewhere around 15 to 20 seconds unless it is repeated (called maintenance rehearsal) at which point it may be available for up to 20 minutes” (para. 16). The process continues with further categorization and storing of stimuli (sometimes through chunking and rehearsal, as when we attempt to recall a phone number provided to us verbally without the ability to write it down). Mediation allows individuals to identify their thoughts and/or strategies while they were performing a task. The underlying basis of using introspection (and stimulated recall, in particular) is that individuals can access those mediating processes (i.e., state what they were thinking). In the case of stimulated recall, this comes about through prompts.

FIGURE 2.1 Diagrammatic model of theoretical and methodological framework for stimulated recall (from Henderson et al., 2010; adapted from Henderson & Tallman, 2006; Pausawasdi, 2001). [Figure not reproduced. Labeled components: information processing theory (stimuli, sensory memory, attention, short-term or working memory, rehearsal, encoding, retrieval, long-term memory, forgotten or discarded, outcomes); mediating process paradigm (mediating processes accessed through an introspective process tracing tool); introspection methodology (digital video of the SL lesson, and a stimulated recall interview whose two prompts, the artifact prompt and interviewer questions, trigger participants’ recalled thoughts).]


What Topics Can Be Explored Using Stimulated Recall Methodology?

One of the main uses of introspective methodologies in general has been to seek to uncover cognitive processes that are not evident through simple observation. Although full descriptions of the debates surrounding mental processes and introspective verbal reports are beyond the scope of this book on methodology, we provide a brief overview of issues specifically related to stimulated recall, an elicitation technique that generally appeals to cognitive psychologists and researchers who are interested in information processing as well as those who are interested in how L2s are learned and taught. It has been adopted as a more practical way of training those in fields where professional success depends on good elicitation of information (e.g., medical practice, and in particular, for investigations of physicians’ clinical reasoning). In fact, Barrows (2000) notes that “stimulated recall has been used to analyze and evaluate the clinical reasoning of medical students, residents and physicians …” (p. v). He further notes that the technique helps those in training “relive the patient encounter while watching the tape.”

We focus in this chapter on issues of cognition (including teacher cognition), the main underpinnings of this research technique. In L2 research, the focus is on how language-specific knowledge is acquired, organized, and used. Stimulated recall provides a useful tool that helps uncover cognitive processes which might not be evident through simple observation. More specifically, stimulated recall can be useful for at least four reasons:

1 It can help to isolate particular “events” from the stream of consciousness. In so doing, it can help to identify the type of knowledge a learner uses when trying to solve particular communicative problems, when making linguistic choices or judgments, or when generally involved in comprehension and/or production.

2 Stimulated recall can help to determine if this knowledge is being organized in specific ways. Cognitive psychologists have proposed that we employ various types of “cognitive structures” or “mental representations” to help organize the vast amount of information encountered on a daily basis. Some of these structures may be fairly long-lasting, such as the way we organize our mental lexicon; others may be more dynamic and short-lived, such as the structures built during aural comprehension.

3 Stimulated recall can be used to help determine when and if particular cognitive processes, such as search, retrieval, or decision making, are being employed and what strategies learners might be using at a given point in time.

4 In teacher education programs, stimulated recall is useful in helping teachers understand why they employ certain pedagogical strategies over others in the classroom.


Knowledge Types

An important distinction that is often made when discussing human information processing is that between declarative knowledge and procedural knowledge. In the context of L2 learning, declarative knowledge is thought to be comprised of rule knowledge at all linguistic levels, organized in analyzed form. Declarative knowledge is thought to be directly accessible through introspection and thus particularly appropriate for study using stimulated recall. To use or activate declarative knowledge, however, and to extend it through language learning, a second type of knowledge, procedural knowledge, is claimed to exist. Procedural knowledge is comprised of the cognitive and interactional processes involved in the reception, production, and acquisition of language. Unlike declarative knowledge, procedural knowledge is considered automatic and inaccessible via introspection (Færch & Kasper, 1987). However, breakdowns in automatic processing, such as when learners do not understand something due to lack of declarative knowledge, may lead to mental states in which some forms of procedural knowledge do become available to introspective report.

Knowledge Structures

Stimulated recall can help to determine if declarative or procedural knowledge is being organized in specific ways. Two examples that are relevant for L2 research are plans and scripts (Schank & Abelson, 1977). Plans are thought to be mental structures that we build during conscious and deliberate planning or decision making. Such plans have the flavor of declarative knowledge and can be easily self-monitored and reported on using stimulated recall. Scripts, on the other hand, are thought to provide the fundamental guidelines we need for much of the routinized or automatic components of our behavior. As discussed above, scripts are clearly related to procedural knowledge, and thus may be harder to explore with introspective techniques. However, it may still be possible to gain some insight into the operation of scripts using stimulated recall methodology. Some of Nisbett and Wilson’s (1977a) explorations have shown that rather than relying on their actual memories of events for interpretations of their own behaviors, in some circumstances people will rely on their expectations, or scripts, to illustrate what happened. The less recent an event, the more likely expectations rather than memory will be used for interpretation. Ericsson and Simon (1996) provided the following example: “if a picture reminds one of an old friend, it may be tempting to use the stored information about that friend to infer what the person in the picture looked like” (p. 19, emphasis in the original).

To provide a language-related example, we might consider the language learner who receives feedback on the grammaticality of her or his utterance during oral interaction. Because this learner is used to receiving feedback related to meaning or message comprehension in oral discussions with native speakers (NS) and not to being grammatically corrected during spontaneous conversation, the learner’s script may prompt him or her to report that the feedback provided was related to comprehension, when in fact it was related to grammar.

Stimulated recall techniques may also assist the researcher to gain access to the cognitive processing of scripts (Calderhead, 1981a, 1981b). Reder (1982), in a study of whether participants used memory retrieval or plausibility judgments to decide whether ten sample sentences were from a story they had read, found that when memory traces are fresh, retrieval of the exact memory works faster and is easier for participants than considering the plausibility of the sentences. As memory fades, plausibility judgments are easier to make than direct recall of memory. Again, stimulated recall that is carried out immediately after the event and uses a strong stimulus allows for the greatest likelihood of reliable retrieval (cf. Ericsson and Simon, 1980).

Cognitive Processes and Learner Strategies

Cognitive processes refer to 1) search and storage mechanisms, 2) inferential mechanisms, or 3) retrieval processes. Such processes are generally thought to operate at an unconscious level. For example, when a person is trying to remember an acquaintance’s name, they usually engage in specific cognitive processes, such as trying to remember what the person looks like, the last time they saw him/her, the first letter of their name or the number of syllables in their name. In studies of lexical retrieval, researchers have consistently found that these are typical and common steps (Aitchison, 1994). However, these processes are usually unconscious unless one is asked to describe exactly the steps during the process of trying to recall the name. Cognitive processes are highly relevant in the field of L2 studies, where one needs to investigate the steps learners go through as they search and retrieve lexical items and morphosyntax. Currently, there is a great deal of theoretical debate about unconscious or implicit learning and the roles of perception, noticing, and attention in L2 learning. It is probable that stimulated recall procedures will provide useful data in the ongoing explorations of these topics.

Another cognitively oriented aspect of learning that stimulated recall has been used to explore is that of learners’ strategies. In a study that addressed questions about advanced learners, Lennon (1989) explored learners’ strategies through introspective methods. He found that some learners demonstrated an orientation towards uncertainty, although they were focused on communication rather than on “correctness.” They reported that their language was very much influenced by that of their interlocutor. As Lennon notes, learners’ beliefs, attitudes, and perceptions about language learning are often explored through stimulated recall. Although strategies are not the focus of this book, we refer the reader to Cohen (1998, 2007), who provides a comprehensive treatment of the topic, and to Plonsky’s (2011) meta-analysis on strategy instruction.


Teacher Cognition

Borg (2003) defines teacher cognition as “the unobservable cognitive dimension of teaching – what teachers know, believe, and think” (p. 81). He further lists the main questions addressed in this area of research as the following: 1) What do teachers have cognitions about? 2) How do these cognitions develop? 3) How do they interact with teacher learning? and 4) How do they interact with classroom practice? Viewing teaching and teacher education in this way allows researchers to understand how teachers become the teachers that they are and how they continue to grow. As Basturkmen, Loewen, and Ellis (2004) point out, this leads us to see “teaching as a thinking activity and teachers as people who construct their own personal and workable theories of teaching” (p. 244). Using stimulated recall can help in understanding what teachers are thinking and how that thinking changes according to specific classroom situations and/or over time.

Range and Use of Stimulated Recall in L2 Studies

Since the publication of the first edition of this book (2000), stimulated recall has become common, and, in fact, has become a standard elicitation measure in L2 studies. In the first edition, we considered the preceding quarter-century and found only 12 studies using stimulated recall for elicitation (see Gass & Mackey, 2000, p. 29, Table 2.2). Based on a similar survey going back to 2000 (see note 3), we have found approximately 125 language-related studies that used stimulated recall and numerous others which reference the original book but which use other forms of introspection or are opinion pieces. As further indication of the degree to which stimulated recall has been accepted, many studies use stimulated recall without citing the first edition or without providing justification for the use of the technique. This puts stimulated recall on par with most other elicitation measures, for which neither the originator nor a justification of usefulness and validity is mentioned. This was not the case a decade and a half ago, when we wrote the first edition.

Descriptions of L2 Studies Utilizing Stimulated Recall

Stimulated recall methodology has been used to address a wide range of topics in L2 research. These topics include cognitive processes in general and specifically L2 strategy or inferencing use, L2 teachers’ decisions, L2 writing choices and processes, L2 reading and lexical use, and L2 oral interaction, among other areas. In Table 2.1 we have listed the topics of the journal articles we surveyed between 2000 and 2015. In addition to the topics, we include information about the time lag between the event being investigated and the stimulated recall and the types of stimuli. During this fifteen-year time span, the languages of use have included English (the large majority), Spanish, Japanese, Arabic, Korean, and Chinese. In this section we describe 14 studies that illustrate a variety of uses to which stimulated recall has been put in the field. We include empirical studies that include differences in topics as well as, where possible, language of the recall.

TABLE 2.1 SLA studies using stimulated recall from 2000–2015

Topic (# of articles): Attention/awareness/noticing (22); Strategy use (21); Teacher cognition (16); Interaction (15); Reading/writing (12); Language teaching methods (10); CALL (9); Testing (6); Motivation (5); Sentence processing (1); Other (8)

Time between event and SR: Many do not report; many are vague (2–3 days; as soon as practical; within a few days); most are immediate (including up to 3 hours after the event), approximately 50; more than 30 are in the 1 day to 2 week range; some go up to a month following the event

Type of stimulus: Many do not report what the stimulus is; some used a transcript; most used audio or video (video is more common); key strokes; audio alone; video alone; audio and video
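As a quick arithmetic check on Table 2.1, the topic counts can be tallied and compared with the figure of approximately 125 studies reported above. The following is only an illustrative sketch in Python: the counts are copied from the table, while the variable names and the assumption that each article is counted under exactly one topic are ours, not the authors'.

```python
# Topic counts copied from Table 2.1 (articles per topic, 2000-2015).
# Assumption: each surveyed article is listed under exactly one topic.
topic_counts = {
    "Attention/awareness/noticing": 22,
    "Strategy use": 21,
    "Teacher cognition": 16,
    "Interaction": 15,
    "Reading/writing": 12,
    "Language teaching methods": 10,
    "CALL": 9,
    "Testing": 6,
    "Motivation": 5,
    "Sentence processing": 1,
    "Other": 8,
}

# Total number of articles across all topics.
total = sum(topic_counts.values())

# Share of each topic as a percentage of the total, rounded to one decimal.
shares = {topic: round(100 * n / total, 1) for topic, n in topic_counts.items()}

# Print topics from most to least frequent, with counts and shares.
for topic, n in sorted(topic_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{topic:32s} {n:3d} {shares[topic]:5.1f}%")
print("Total:", total)
```

Under that assumption the counts sum to 125, which is consistent with the approximate total given in the text.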

Processing

The first study we describe appeared in 2012 and was written by Tode. Her study investigated the processing of reduced relative clauses by Japanese learners of English as a foreign language. Twenty-eight participants provided data on a self-paced reading task and 25 of them participated in the stimulated recall part of the study. In the self-paced reading task, a phrase appeared, with the participant pressing a button when ready for the next phrase (the preceding phrases remained on the screen). Along with the final phrase appeared the following choices: 1) I understand and 2) I don’t understand. The final step was for participants to verbally translate the sentence into Japanese, or to say “I have no idea” when they did not understand the sentence. A stimulated recall session was held with each participant using the video of his or her session as the stimulus. The goal was to investigate their thought processes during sentence processing. In analyzing the data, the comments from the stimulated recall sessions were transcribed, and the thought processes that were revealed were grouped into five categories. Stimulated recall suggested automatic processing through comments indicating that participants had understood a sentence without effort. Through an analysis of the types of comments that participants made, Tode was able to differentiate the successful and unsuccessful processors. The successful group “computed syntactic relationships in real time, whether consciously or unconsciously” (p. 183). There was no evidence to show that the unsuccessful ones did a real-time analysis. In general, the stimulated recall results were used to bolster the reading time results, although there were inconsistencies as well as congruencies.

Summary of Major Features of Tode (2012)
Theoretical orientation: Usage based
Topic: Processing of reduced relative clauses (I like shoes made in Italy.)
Native language: Japanese
Target language: English
Level: First year college
Stimulus: Video
Other elicitation measure: Self-paced reading
Language of stimulated recall: Japanese
Time lag: Immediate

Strategy use

The second study we highlight deals with testing and, in particular, the strategic behaviors of Chinese ESL learners on two different types of oral language tests: independent and integrated. Barkaoui, Brooks, Swain, and Lapkin (2013) used iBT TOEFL tasks to examine what strategies test takers use, in order to understand the extent to which the actual test tasks “engage the linguistic knowledge, processes, and strategies intended by test developers” (p. 308). Thirty Chinese non-native speakers (NNS) of English took the iBT TOEFL, which consists of six tasks, two independent and four integrated. The session was videotaped. Following each task, each participant watched a videotape of him/herself and engaged in a stimulated recall session. Their thoughts could be expressed in either Chinese or English, as each person preferred. The results revealed novel information about test taking. One finding is that strategy use is an important part of performance on speaking tests. A second finding is that the different task types (integrated versus independent) generate different strategies. Following from this difference is the related finding that some strategies may be more effective with one task type, but not another. Third, because integrated tasks generate a broader and more diverse range of strategies, it is likely that they more accurately reflect the complexities of academic spoken discourse. The final conclusion we point out is that because different strategies are used in the two types of tasks, one might question whether the same construct is being assessed. Should then the scores on the two task types be combined into one score? All of these issues were revealed through the use of stimulated recall and could not have been determined by test results alone.

Summary of Major Features of Barkaoui, Brooks, Swain, & Lapkin (2013)
Topic: Strategies in test taking
Native language: Chinese
Target language: English
Level: Undergraduate and graduate students studying engineering with scores between 1.33 and 3.75 (out of 4) on the speaking part of iBT TOEFL
Stimulus: Video
Language of stimulated recall: English or Chinese depending on participants’ comfort level (instructions given in Chinese and English)
Time lag: Immediate
Other data sources: None (sole source of data)

Motivation

In a study on motivation, Kang (2005) investigated learners’ willingness to communicate (WTC), which Kang defines at the outset of the paper as “the tendency of an individual to initiate communication when free to do so” (p. 279). Numerous variables (e.g., anxiety, sex, age, attitudes, and prior experience) have been found to predict one’s willingness to communicate in particular situations. This particular study asked the following questions: 1) What situational variables affect WTC in the L2 in a communication situation? 2) How do the situational variables construct situational WTC in the L2? and 3) How does the situational WTC in the L2 change over the course of communication? The participants were four male Korean students studying English in the U.S. who were participating in a conversation partner program in which they met with a native speaker of English for unstructured conversations. These sessions were audio- and video-recorded, with the video recordings serving as the stimulus for the stimulated recall sessions with the researcher. An analysis of the transcripts yielded numerous factors (e.g., feelings of security, excitement, and responsibility and the influence of interlocutors and topic on these feelings) that contribute to the dynamic emergence and fluctuation of learners’ WTC.

Summary of Major Features of Kang (2005)
Topic: Willingness to communicate
Native language: Korean
Target language: English
Level: ESL students (4): low intermediate, high intermediate, advanced
Stimulus: Video
Language of stimulated recall: Korean
Time lag: Not specified
Other data sources: None (sole source of data)

Responses in oral interaction

The purpose of Hawkins’ (1985) study was to determine whether replies in NNS discourse, which on the surface appeared to be appropriate conversational responses, did in fact represent comprehension of what had preceded in the discourse. Two dyads of adult participants (a NS of Spanish and a NS of English) carried out four communicative tasks in English. The tasks were designed so that both the NNSs and the NSs would be in possession of information needed by the partner to complete the task. One task was the popular English as a second language (ESL) “grab bag” game, which consisted of one participant removing and then describing objects from a bag containing common objects (e.g., a plastic knife, a key, a piece of chewing gum). The other participant asked questions about the object in an effort to guess what the object was. The interactions were tape-recorded. The tapes of the task-based interaction were played back to the participants, who were asked to “stop the recorder at any time and comment on what you were thinking at that point in the conversation” (Hawkins, 1985, p. 165). Additionally, Hawkins stated, “The investigator also felt free to stop the recorder and ask questions of the subjects if the subjects themselves did not stop the recorder” (p. 165). The stimulated recall was conducted in each participant’s native language (Spanish or English). Hawkins found that what on the surface appeared to be appropriate responses were not. These so-called appropriate responses were given even when comprehension had not taken place.

Summary of Major Features of Hawkins (1985)
Topic: Oral communication
Native language: Spanish and English
Target language: English
Level: ESL students
Stimulus: Audio
Language of stimulated recall: Native language of participant (Spanish or English)
Time lag: Ranged from immediate to two days
Other data sources: None (sole source of data)

Perceptions about feedback

Mackey et al. (2000) conducted a study of oral language use to investigate the accuracy of learners’ perceptions of NS feedback. The database consisted of two groups: learners of ESL and learners of Italian as a foreign language. Each participant carried out a spot-the-difference task in which the NS and the learner both had similar, but not identical, pictures. The two interlocutors described the pictures to each other in order to identify the areas of difference. This session lasted for approximately 15–20 minutes and was videotaped. During the interaction, the English- and Italian-speaking researchers provided different types of feedback when the participants produced a non-target-like utterance. The stimulated recall sessions were conducted immediately after completion of the task-based activity. Participants were told that they could pause the tape at any time if they wished to describe their thoughts at any particular point in the interaction. The researcher also paused the tape after episodes where feedback was provided and asked the learner to recall his or her thoughts at the time when the original interaction was going on. These recall sessions were audio-taped. Analysis of the stimulated recall sessions suggested differences among syntax, morphology, vocabulary, and pronunciation in the way feedback is understood.

Summary of Major Features of Mackey et al. (2000)
Topic: Perceptions of feedback
Native language: English, Cantonese, French, Japanese, Korean, Thai
Target language: Italian; English
Level: Beginner–intermediate
Stimulus: Video
Language of stimulated recall: English (both groups)
Time lag: Immediate
Other data sources: None (sole source of data)

Strategy use

Khan and Victori (2011) investigated strategy use by English language learners in Spain. The first stage consisted of a strategy questionnaire to determine perceived strategy use on three tasks. Next, participants completed three tasks (picture story, art description, and information gap). After each of the tasks, which were conducted two weeks apart, a strategy questionnaire was completed in which participants were asked to agree or disagree with statements such as “I spent time thinking about what I was going to say.” Four participants were videotaped as they carried out the task and filled in the questionnaire. A stimulated recall immediately followed each task-questionnaire sequence. Data were triangulated with comparisons being made between data from the questionnaires, data from actual task performance, and data from the stimulated recall sessions. There were both consistencies and discrepancies amongst the three data elicitation types.

Summary of Major Features of Khan & Victori (2011)

Topic: Strategy use
Native language: Spanish
Target language: English
Level: Intermediate
Stimulus: Video
Language of stimulated recall: L1
Time lag: Immediate
Other data sources: Performance on tasks

Pragmatics and reading

Uysal (2012) investigated the role of cross-cultural pragmatics in the reading of a text. In particular, the author was interested in whether participants from different linguistic and cultural backgrounds react differently to texts. There were two research questions: 1) Are there any differences between Turkish and American students’ markings while reading a text written by a Turkish editorial columnist? and 2) What are the Turkish and American students’ articulated reasons for their markings in the selected text? Four NSs of Turkish and four NSs of English read a pre-selected text with not “too much cultural, literary, and figurative language” (pp. 15–16). The NSs of Turkish read the text in the original and the NSs of English read a translated text. All participants were asked to indicate the places where the text was: 1) difficult to understand, 2) different from what they were used to seeing in an editorial column, and 3) effective. The stimulated recall was conducted within two days of reading the text, and the marked-up text was used as the stimulus. From the original data, both groups found the text equally effective, but the English NSs found the text more difficult to understand and different from what they were used to. Considering the stimulated recall data allowed the researcher to better understand the role of cultural perceptions when reading texts.

Summary of Major Features of Uysal (2012)

Topic: Cross-cultural influences on reading
Native language: English; Turkish
Target language: NA
Level: NA
Stimulus: Text with markings
Language of stimulated recall: Not stated
Time lag: Within 2 days
Other data sources: Textual analysis (underlining parts of texts that were difficult, different, effective)

Composing process

Bosher (1998) investigated the composing process of Southeast Asian students with a focus on differences between students who had graduated from a high school in the United States and those who had completed high school in their home country. Participants were asked to read and respond to an opinion article that had appeared in a local paper regarding the impact of a new requirement for high school students to pass competency tests in order to graduate. During the hour of writing, cameras were focused on the paper so that the movements of pen and paper were visible. While students were writing, they were observed on a monitor situated in another room. Time spent pausing and instances of referring back to the original reading were noted. Immediately following the writing session, the students were interviewed, with pause times of several seconds used as stimuli for probing the students’ thought processes while writing. The stimulated recall protocols revealed individual differences in strategy use, even among participants with similar proficiency scores. The consistency of findings across data sources offers support for the reliability of stimulated recall.

Summary of Major Features of Bosher (1998)

Topic: Composing process
Native language: Laotian, Vietnamese, Cambodian
Target language: English
Level: High school graduates: two in US, one in home country; Michigan English Language Assessment Battery (MELAB) scores 67–70
Stimulus: Video of composing behavior
Language of stimulated recall: Not stated
Time lag: Immediate
Other data sources: Interviews, pausing behavior, textual analysis

Computer-mediated Communication (CMC)

Uzum (2010) investigated to what extent learners in a CMC environment adapt to the context and synchronize their language and style to that of their partners. L2 learners were paired with NSs in task-based chat sessions. The session consisted of three tasks and lasted approximately 30 minutes. During the stimulated recall carried out one week later, learners were presented with the transcript of chat sessions with particular sentences highlighted. Learners then answered questions about highlighted sentences on their transcript. Learners did show evidence of alignment to their interlocutor and to the context.

Summary of Major Features of Uzum (2010)

Topic: Linguistic adaptation in a CMC environment
Native language: Chinese, Japanese, Arabic, Korean, Turkish, Uzbek
Target language: English
Level: Intermediate
Stimulus: Transcripts of chat sessions with particular sentences highlighted
Language of stimulated recall: The recall was done not with a researcher, but by answering questions about highlighted sentences on their transcript
Time lag: One week
Other data sources: Chat transcripts

Attention/Noticing/Awareness

A study by Yoshida (2008) investigated pair work in a Japanese language program. Participants took part in a range of activities having a specific grammatical or lexical focus. The research questions were: 1) Do learners’ responses after their partners’ corrective feedback during pair work indicate that they have noticed it? and 2) What factors, including affective factors, influence learners’ perceptions of corrective feedback during pair work? The conversations were audio-recorded and the recordings served as stimuli for the recall sessions conducted one week later. The stimulated recall data showed that learners noticed peer correction although they didn’t always understand it, even in those instances when their responses indicated understanding.

Summary of Major Features of Yoshida (2008)

Topic: Perceptions of corrective feedback
Native language: English
Target language: Japanese
Level: Second year
Stimulus: Audio recordings of pair-work
Language of stimulated recall: English
Time lag: Within one week
Other data sources: Observation notes, audio recordings

Teacher cognition

De La Campa and Nassaji (2009) explored the use of the L1 in German classrooms and the reasons for its use. Video- and audio-taped samples were collected from two German teachers as they taught second-year university German courses. Data consisted of the classroom recordings themselves along with interviews conducted to “explore their general beliefs and attitudes toward using L1 as well as the general purposes for which they use L1 in their classrooms” (p. 745). Video samples of twenty instances of L1 use served as stimuli. The instructors were able to provide clear reasons for using the L1 strategically in the classroom.

Summary of Major Features of De La Campa & Nassaji (2009)

Topic: Use of L1 in the classroom
Native language: Not applicable
Target language: German
Level: Second year
Stimulus: Video recordings
Language of stimulated recall: German
Time lag: Immediate
Other data sources: Recordings of L2 classes, interviews

Testing

A study by Isaacs and Thomson (2013) used stimulated recall as a way of understanding the impact of rater experience and the length of the rating scale on actual ratings. Speech samples were collected from 38 ESL learners, and both experienced and novice raters used 5-point and 9-point rating scales to make judgments of comprehensibility, accentedness, and fluency. Immediately after rating each speech sample, the raters verbalized their thoughts about the sample they had rated. The quantitative data did not reveal significant differences between the two scales, nor were there significant differences between the two groups of raters. However, the commentaries of the two rater groups showed that their ratings were impacted by their experiences (or lack thereof) with accented speech.

Summary of Major Features of Isaacs and Thomson (2013)

Topic: Effect of rating scale length and rater experience on ratings
Native language: Chinese and Slavic languages (Russian, Ukrainian, Serbian, Polish)
Target language: English
Level: Beginning
Stimulus: Audio recordings of speech samples
Language of stimulated recall: English
Time lag: Immediate
Other data sources: Ratings, interviews

Communication problems

Dörnyei and Kormos (1998) investigated speakers’ management of problems in L2 communication. The main purpose of their article was to understand L2 communication in light of Levelt’s (1989, 1993, 1995) model of speech production. They illustrated the various mechanisms proposed by Levelt through examples and retrospective comments taken from L2 learner data. Forty-four Hungarian learners of English were asked to perform three communicative tasks. The audio recording of the tasks constituted the stimulus. Perhaps because of the conceptual goal of the study, few details were supplied regarding the mechanics of the retrospection other than the fact that participants were asked to listen to the recordings of their own interactions and to answer questions and make comments on the difficulties they experienced. According to the researchers’ analysis, more than 450 manifestations of problem management were discovered or confirmed through stimulated recall. The authors used Levelt’s model of speech processing to explain the psycholinguistic mechanisms involved in conversational repair; stimulated recall methodology was used to supply the empirical evidence.

Summary of Major Features of Dörnyei and Kormos (1998)

Topic: Communication difficulties
Native language: Hungarian
Target language: English
Level: Intermediate to post-intermediate
Stimulus: Audio
Language of stimulated recall: Not stated
Time lag: Not stated
Other data sources: Conversation repairs from tasks

Verification of research methodology

A study by Gass (1994) illustrates stimulated recall in an investigation of acceptability judgments. The main purpose of this study was to ascertain the extent to which acceptability judgments constitute a reliable instrument for gathering L2 data. There were two parts to the study. In the first part, NNSs of English were asked to give acceptability judgments for 30 English sentences, including 24 sentences with relative clauses, the target of investigation. Participants were first asked to judge each sentence categorically as to whether they felt it was grammatical or ungrammatical and then were asked to assess the degree of confidence they had in their judgment. The test (with different randomizations) was given to each participant twice with a one-week interval. Answer sheets were coded to determine the amount of consistency between Times 1 and 2. Stimulated recall was used with a subset (4) of the participants to uncover reasons why they might have had radically different judgments at the two different times. The answer sheets of the participants formed the stimulus for the recall. Findings suggested reliability of acceptability judgments and suggested ways in which inconsistencies could be predicted by linguistic indeterminacy as well as by external factors.

Summary of Major Features of Gass (1994)

Topic: Methodology: acceptability judgments
Native language: Chinese, Japanese, Korean
Target language: English
Level: Intermediate
Stimulus: Paper responses to acceptability judgments at an interval of one week
Language of stimulated recall: English
Time lag: Not stated
Other data sources: Acceptability judgments

Conclusion

This chapter and the preceding one have presented some of the arguments concerning the cognitive processes stimulated recall may uncover. This chapter also provided evidence that stimulated recall, although generating some interesting criticisms, is, in general, a methodology that has been subjected to both empirical testing and theoretical review (see Chapters 3 and 4 for a discussion of validity and reliability). The following chapter provides greater detail on how to include stimulated recall in L2 research.

Notes

1 See, however, introspective diaries by SLA researchers who are able to reflect on their own developmental progressions (Schmidt & Frota, 1986; Schumann & Schumann, 1977).

2 The astute reader will realize that our hypothetical learner could never have produced such a complex sentence in English if he could not say, “I’m not going.” One must assume that this hypothetical recall was done in Spanish. The language of the recall is further discussed in Chapter 4.

3 The survey is limited and undoubtedly underestimates the use of stimulated recall as an elicitation tool. In particular, absent from the survey are book chapters and dissertations as well as other sources that were inaccessible to us (e.g., written in a language other than English).

3 CHARACTERIZATION OF STIMULATED RECALL

In this chapter we describe the stimulated recall procedure in greater detail than in Chapter 2. In the first part of the chapter, we discuss preliminaries of a research project using stimulated recall. In particular, we look at reasons for using this particular elicitation technique, including the research questions that might be part of that study. In the second part, we home in on the design features of a study that incorporates stimulated recall.

Research Questions

A research question is a statement about what you will be investigating. As such, it serves as a guide and constraint of your investigation. For quantitative research, a research question typically includes who (participants), what (the effects), and how (the means). In Mackey and Gass (2015) we point out that research questions should address current issues and should be “sufficiently constrained so that they can be answered” (p. 19). Questions that are too broad (e.g., “what is the effect of the first language on the second?” or “what are L2 learners thinking when they process sentences?”) cannot be practically addressed and must be narrowed down before a research design can be formulated. In qualitative research, on the other hand, the data themselves often serve as the basis for the information presented in a report. In other words, there is often no question at the outset.

In looking through the general SLA literature, we find that there have been numerous studies that could have benefited from obtaining stimulated recall data. These will be discussed in greater detail in Chapter 5. An example of this sort of study is by Munro (1998) where the original research question was concerned with the effect of background noise on the perceptions of raters of
English spoken by native speakers (NSs) and by speakers with Chinese-accented speech. Munro included digitally mixed cafeteria background noise for half of the speech samples to be judged. He concluded that noise was a greater factor for comprehensibility than was accented speech. At the end of the article, he stated:

It is … noteworthy that individual listeners were apparently affected differently by the presence of noise, with some showing much greater difficulty understanding noisy utterances than others … it is beyond the scope of this study to explain why some of the voices were affected more by noise than others … It is possible that differences in voice quality … and differences in prosodic goodness may have played a role. For instance, the use of a certain vocal pitch or a nonnative rhythmic pattern may make it difficult for listeners to track a voice under adverse listening conditions. (Munro, 1998, p. 151)

This suggestion shows the article to be a good candidate for stimulated recall. If we follow this suggestion, possible research questions might be:

• Is individual voice quality affected by background noise as determined by rater reflections?
• Is individual vocal pitch affected by background noise as determined by rater reflections?
• Is individual nonnative rhythmic pattern affected by background noise as determined by rater reflections?

Another example of research questions emanating from existing studies comes from Saito (2013) who conducted a study on the value of recasts as they relate to perception and production of English /ɹ/ to Japanese learners. A research question that could be asked to give greater depth to our understanding is the following: Do Japanese learners of English notice teacher feedback on production of English /ɹ/?

Research questions in stimulated recall research have been diverse, covering a wide range of topics. Below is a small sampling of research questions from studies that have used stimulated recall. In many instances, these questions are one of a set of questions.

1 To what extent and how do test takers’ reported strategic behaviors vary across integrated and independent tasks? across tasks within task type? (Barkaoui et al., 2013)
2 What do the participants believe constitutes “good” feedback, and to what extent does their practice converge with their beliefs? (Li & Barnard, 2012)
3 What writing strategies do proficient Chinese EFL learners use in writing activities? (Lei, 2008)
4 Is there evidence that experienced and novice raters arrive at their ratings in different ways? (Isaacs & Thomson, 2013)
5 Do output activities promote noticing of linguistic form in subsequent input? (Uggen, 2012)

Designing a Study Using Stimulated Recall

As discussed elsewhere in this book, stimulated recalls are used primarily in an attempt to explore learners’ thought processes and/or strategies, by asking learners to reflect on their thoughts after they have carried out a pre-determined activity. The elicitation technique is carried out with some degree of support, for example, by showing a videotape to learners so that they can watch themselves carrying out an activity while they vocalize the thought processes they were having at the time of the original activity. In Chapter 2, we presented a sampling of stimulated recall studies that suggested a wide range of possible uses. Here we turn to some preliminary decisions that need to be made before delving into the logistics.

The Stimulus

Once research questions have been established, the next step is to determine what the task is, and what the stimulus will be. The stimulus is a key part of any methodology in which the data collected rely on participants recalling a previous event. In this section, we discuss stimulus types and the decisions that have to be made in selecting what to use.

The purpose of the stimulus is to activate or refresh recollection of cognitive processes so that they can be accurately recalled and verbalized. The stimulus results from a task that the participant was involved in and is essentially a record of that event. This can be an audio or a video of the event or a written product (particularly if it contains a record of corrections and/or modifications). Other possibilities are records of field notes and even a transcription of a conversation or computer-captured data. Each of these differs in how much of the event can be recalled. For example, a video recording may be more effective in eliciting accurate recall than an audiotape, in part because much is lost in an audio-only recording (e.g., facial expressions, body movements). Other support may be deemed weak because it may be too far from the event (see the following section), as is the case with field notes. Still other support may be too strong because the stimulus itself is a distractor. For example, a video of an eye-tracking experiment is not a good stimulus, because a video of this sort requires students to follow eye movements. This could be thought of as an overly strong stimulus because participants may experience too much “noise” thereby obscuring recall. It is important to be aware that not all participants respond in the same way to a given stimulus, and one must be cautious about claims for strong levels of support.

In Figure 3.1, Henderson and Tallman (2006, p. 77) provide a useful schematic of the structure of the interview and the relationship of the prompts (i.e., stimuli) to thoughts to output (expression of thoughts). The information that will serve as data comes from two sources: the actual interview questions and the stimulus. Both the question and the stimulus provide the impetus for recall of original thoughts, which are of two types. Only the recall thoughts are the ones that are useful in determining actual thought processes. We return to this issue below in the discussion of the types of questions to ask.

FIGURE 3.1 Types of prompts and thoughts accessed in stimulated recall (from Henderson & Tallman, 2006, reprinted with permission)
[Schematic: Stimulated recall interviews utilize two types of prompt, interviewer questions (based on the artifact) and the artifact itself (video). Both trigger interviewee thoughts, which are of two types. Recall: “there and then,” introspective, justificatory, occurred during actual event. Hindsight report: “here and now,” reflective, explanatory, occurred during interview.]

Considerations when considering stimuli
• Should be strong. Example: video
• Should not be distracting. Example: visual display of eye-tracking

Time Lapse Between Event and Recall

In addition to the issue of stimulus type, the length of the time that elapses between the event and the recall is key. In Table 3.1, we present three types of recall in terms of time delays. With immediate retrospection, where there is little or no gap, many researchers have argued that structures in short-term memory are still available for access (Ericsson & Simon, 1996). As they noted:

In the ideal case the retrospective report is given by the subject immediately after the task is completed while much information is still in STM [short-term memory] and can be directly reported or used as retrieval cues. It is clear that some additional cognitive processing is required to ascertain that the particular memory structures of interest are heeded. Our model predicts that retrospective reports on the immediately preceding cognitive activity can be accessed and specified without the experimenter having to provide the subject with specific information about what to retrieve. In this particular case, the subject will still retain the necessary retrieval cues in STM when a general instruction is given “to report everything you can remember about your thoughts during the last problem.” This form of retrospective verbal report should give us the closest approximation to the actual memory structure. (Ericsson & Simon, 1996, p. 19)

TABLE 3.1 Three types of stimulated recall

Consecutive recall
Example: L2 Writing. Immediately after finishing revisions on an essay draft, participants are interviewed about the changes they made, using the initial and final written products as stimuli.

Delayed recall
Example: L2 Reading. After reading a passage in the L2, participants are given a list of questions about their comprehension of the passage. After the straight comprehension questions, they are asked to write about particular difficulties they may have had with the passage, and how they overcame them. They are asked to take the questions home and bring their answers in the next day.

Nonrecent recall
Example: L2 Strategies. After taking a placement test in the middle of the instructional year, one class of participants is divided into groups of successful and less successful students. These students are given email accounts, and are asked to send at least one message a week to a researcher, speculating on the ways in which they are learning vocabulary during the current semester, as opposed to the previous semester.

When the delay is long, one often compensates for a lack of memory by “filling in” the memory gap, often based on what is expected. Henderson and Tallman (2006, p. 75, citing Jenkins and Tuten, 1998) argue that “memory inaccuracies occur because a fundamental goal of cognition is to make coherent sense of the world.” Longer periods of time, even when the recall support is very strong, often lead to controversy in terms of what is being accessed and what claims are being
made by the researcher. Of course, recalls also exist as delayed recalls, as in diary studies or exit interviews. In these cases the stimulus is often rather weak: “On Fridays, while looking at the textbook and materials you used in class, write in a language diary about your experiences in class over the last week” or “In this exit interview please tell us about your feelings about the language program from which you have just graduated while looking at some of the work you completed during that program.” In these cases, the recency effect argument put forward by Cohen (1987) is probably operating in its weak form. Obviously, researchers should take care in making claims and spell out potential problems with validity.

Cohen and others (e.g., Bloom, 1954) have also pointed out that the majority of loss of memory may occur almost immediately after the event. Thus, delays of three hours to three days may result in data that are less clearly reflective of the thoughts during the event. Henderson and Tallman (2006) have argued that gathering data beyond 48 hours calls into question the reliability of the data. Garner (1988) reported an experiment involving participants who retrospected about strategic activity immediately after task completion and two days later. Significantly fewer cognitive events were recalled in the protocols of delayed-report participants than in those of same-day participants. Bloom (1954), in his early studies, found as high as 95 percent accurate recall within two days of the original event, but the accuracy rate declined to about 65 percent two weeks later. Hence, recall should take place as soon as possible after the original task.

We have found in the second language literature many examples where the recall took place immediately following the event, with others taking place a week, two weeks, and even six months after the event. What we have presented here is the “ideal” situation. Often logistics intervene (e.g., participant availability; class schedules) and time lapses are longer. It is important that the time frame be accurately reported so that the research community can fully understand the results and the context that surrounds data collection.
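One way to make the time frame easy to report is to log, for each participant, when the event ended and when the recall began, and to compute the lag from those two timestamps. The following is only a minimal sketch under stated assumptions: the function names, the session log, and the participant labels are hypothetical, and the 48-hour default simply mirrors the point attributed above to Henderson and Tallman (2006); it is not a procedure taken from any of the studies cited.

```python
from datetime import datetime, timedelta


def time_lag(event_end, recall_start):
    """Return the time elapsed between the end of the event and the start of the recall."""
    if recall_start < event_end:
        raise ValueError("recall cannot start before the event ends")
    return recall_start - event_end


def describe_lag(lag, caution_after=timedelta(hours=48)):
    """Format a lag in hours and flag it when it exceeds the caution threshold.

    The 48-hour default is an assumption for illustration, not a fixed rule.
    """
    hours = lag.total_seconds() / 3600
    flag = " (beyond threshold; report as a limitation)" if lag > caution_after else ""
    return f"{hours:.1f} hours{flag}"


# Hypothetical session log: participant -> (event end, recall start)
sessions = {
    "P01": (datetime(2016, 3, 1, 10, 0), datetime(2016, 3, 1, 10, 30)),
    "P02": (datetime(2016, 3, 1, 10, 0), datetime(2016, 3, 8, 10, 0)),
}

for participant, (event_end, recall_start) in sessions.items():
    print(participant, describe_lag(time_lag(event_end, recall_start)))
```

Keeping the raw timestamps, rather than a label such as "immediate" or "delayed," lets the exact lag for every session be stated in the research report.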

Language of the recall

In second language research, we are generally dealing with two languages, the language of the event and the language of the recall, and because of this we are faced with a problem not present in research with native speakers: What is the potential influence of the language used in the recall sessions on the accuracy of the recall? If the recall is carried out in the L1, this may allay some concerns, but if the event to be recalled is carried out in the L2, the disparity in events and possibly the locus of knowledge accessed may give rise to a new set of concerns. If both the event and the recall are carried out in the L2, the problem is exacerbated by the fact that we are frequently dealing with learners who are limited in their ability to express themselves in the target language and to understand the questions being asked of them. Therefore, not only are we hampered by the usual difficulties with verbal report data (i.e., are the reports truly reflective of thought processes?), but we are also confronted with the additional problems of having to interpret what a learner says and making the assumption that a learner correctly understands what is being asked.

EXAMPLE 3.1

NNS original comment: He’s taking picture to his child
NS-1 feedback: He’s taking a picture of his child
NNS response: Yeah
NNS response (stimulated recall): I think the preposition is not suitable
NS-2 stimulated recall prompt: Is that now?
NNS response: Yeah
NS-2 stimulated recall prompt: Or then?
NNS response: No idea. Can’t notice back then

Example 3.1 is a case in point: the NNS says “I think the preposition is not suitable.” The interviewer probed and was able to determine that the NNS was truly referring to the here and now, as opposed to intending something like “I thought that the preposition wasn’t suitable.”1

The issue of recall language also became apparent in a study by Mackey et al. (2000). This study involved two groups of learners, ESL and Italian as a foreign language (IFL). The recall was conducted in English for the first group (where it was the L2) and in English as well for the second group (where it was the L1). Of the categories included when analyzing the response types from the stimulated recall sessions, two are relevant for the present discussion: no comment and unclassifiable. What differed was the distribution of comments in these two categories. Whereas the ESL learners were unable to provide any recall comments for 12 percent of the data, the IFL learners did not produce any recall comments for only 3.5 percent of the data. We surmise that the use of language in the recall sessions contributed to this difference. The ESL learners, communicating in the L2 and faced with language difficulties, may have been more likely to state simply that they had no comment when they were unable to express their thoughts during an
interaction episode. Using a subset from the two datasets (50 percent of each), we compared the number of words per recall comment. For the ESL learners communicating in the L2 for the recall, the average number of words per recall comment was 16. For the IFL learners, who were using their L1 (English) for the recall, the average number of words per recall comment was 26. It is also difficult to determine whether the difference in words per comment is related to the use of the L1 vs. L2 or to cultural differences between the American IFL participants and the ESL participants. Additional studies to explore the differential effects of using the L1 or the L2 during the recall session, as well as the potential effect of cultural background, would help clarify this issue (see Mangubhai, 1992).

Another issue is that the limited proficiency of learners operating in their L2s may influence the content of the recalls. They may verbalize what they can, rather than the full version of what they were thinking. Research questions should take into account that when the L2 is used for the recall, some things may be easier for learners to verbalize than others. Pilot testing may ascertain whether some of the expected verbalizations are beyond the linguistic competence of some learners.

Some issues that need to be considered when determining the language of the recall are given below.

• Context. In some situations, there is little choice. In L2 environments, participants have different L1s, and it is often not possible to conduct the recall session in the L1. One might have a subset of participants for whom the L1 is a possible language of recall and another subset for whom it is not. It is advisable in this case to conduct all sessions in the L2 rather than mixing languages, because the ability to express recall will differ across participants.
• Proficiency. With very high proficiency learners, the L2 might be a possibility for the recall language. The advantage is that one is matching the event with the thoughts about the event. In much research, logistics prevails and there is a language switch. It is important to clearly state in the research report what the language of the recall was.
• Preference. Some researchers give the participant the choice of language. This has the advantage of participants being able to verbalize more thoughts when they feel comfortable in expressing their thoughts. The disadvantage, as mentioned above, is that there may not be the same level of comparability across participants in a study.
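The words-per-comment comparison reported earlier in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `words_per_comment`, the whitespace-based definition of a word, and the two sample comment lists are assumptions made here, not the procedure or data used in the original studies.

```python
from statistics import mean

def words_per_comment(comments):
    """Mean number of whitespace-separated words per non-empty recall comment.

    Assumption: a "word" is any whitespace-delimited token. A real study
    would need an explicit, documented word-counting rule.
    """
    counts = [len(c.split()) for c in comments if c.strip()]
    return mean(counts) if counts else 0.0

# Hypothetical comments, invented for illustration only.
l2_recalls = ["I was thinking about the word", "She did not understand me"]
l1_recalls = ["I was trying to think of the vocabulary but my mind was going blank"]

print("L2 recall, mean words per comment:", words_per_comment(l2_recalls))
print("L1 recall, mean words per comment:", words_per_comment(l1_recalls))
```

A per-group mean of this kind says nothing about why the groups differ, so it would normally be reported together with the language of the recall and relevant participant background.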

Questions to Be Asked

It is not surprising that answers to questions may, in part, depend on how the question is posed. As Ericsson and Simon (1996) note, even in laboratory settings, questions such as “How long was the film” versus “How short was the film” yield response differences (p. 38). Verbal reporting is no different in this regard. One must keep in mind that researcher questions can potentially influence and compromise the procedure. The questions themselves should not suggest any answer and should not be ones where the participant believes that the interviewer expects or wants to hear a particular response.

Given that the goal of stimulated recall is to tap learners’ thought processes while they were performing a particular task, the method itself will have no validity unless one can be reasonably sure that accurate recall is in fact taking place. As mentioned in previous chapters, the premise behind stimulated recall is that an event that has taken place is being recalled through the prompt and that the prompt itself helps to ensure that accessible and accurate memory structures are brought into focus and recalled. That is, when a participant is reminded of an event by means of video, audio, or written document accompanied by a researcher’s prompt, the event itself and the thoughts that occurred during the event are vividly recalled. Returning to Figure 3.1, the bottom two boxes, ‘Recall’ and ‘Hindsight Report,’ represent two types of responses. Only the first, ‘Recall,’ represents reliable data for the purposes of stimulated recall. The second box, ‘Hindsight Report,’ comes from questions that have allowed the participant to introduce intervening thoughts, such as those that might involve reflection about the event rather than thoughts during the event.

Appropriate questions

• What were you thinking when she said x?
• What were you thinking when you shook your head?
• I notice that you shifted your position in your chair and you hesitated, what were you thinking about then?

Questions that may elicit thoughts at the time of the interview, not at the time of the original event

• I notice that you shifted your position in your chair, why did you do that?
• Your partner said “x,” why do you think he said that?
• Why did you hesitate before responding?

In sum, make sure that the recall questions focus on the timeframe in question. In the section on pitfalls below, we return to these differences and show examples of how the researcher can be drawn into the time of the interview and not the event the recall session is focusing on.
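When drafting or piloting a protocol, the contrast between the two question lists above can be turned into a rough screening aid. The sketch below is an assumption-laden heuristic, not an established instrument: the function name `classify_prompt`, the labels, and the keyword patterns are all hypothetical and are derived only from the example questions in this section. It cannot replace a trained researcher's judgment.

```python
import re

# Assumed patterns (not from the source): "why" questions and present-tense
# "do you think" tend to invite reflection at the time of the interview.
HINDSIGHT_PATTERNS = [r"\bwhy\b", r"\bdo you think\b"]
# Assumed marker of a prompt oriented to the time of the original event.
RECALL_PATTERN = r"\bwhat were you thinking\b"

def classify_prompt(prompt):
    """Return a rough label for a draft recall prompt."""
    text = prompt.lower()
    if any(re.search(p, text) for p in HINDSIGHT_PATTERNS):
        return "hindsight-risk"
    if re.search(RECALL_PATTERN, text):
        return "recall-oriented"
    return "unclassified"

for question in [
    "What were you thinking when she said x?",
    "Why did you hesitate before responding?",
]:
    print(classify_prompt(question), "|", question)
```

Prompts labeled "unclassified" or "hindsight-risk" would simply be candidates for a closer look during pilot testing.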

Initiation of Questions/Recall Interaction

When considering both the recall stimulus and the procedure, it is important to think about who initiates the stimulus episodes. Who selects them? Who interacts with them? In Mackey et al. (2000), we designed the stimulus (videotapes of learners interacting) to be accessed by both the learner and the researcher. The remote control was set on the table between the two individuals. The learners were initially asked to choose a segment and to pause and replay the recording to ensure that they knew how to operate the control and to help them feel comfortable doing it. We chose certain segments for replay because they contained implicit negative feedback, which was the focus of the study. The learner-initiated replays represented only about 10 percent of all the replays in the data. It is often useful to provide models of other-initiated and self-initiated recall support interaction. In considering who initiates various aspects of the recall support, it is also useful to recognize that this often interacts with who controls the verbalization of the recalls. Some recall procedures rely not on researcher–participant dyads but on participant–participant–researcher and even whole-class–teacher–researcher groups. In cases of more than one participant, the interaction between or among participants must also be considered in terms of its potential effect on the recall data. Often researchers give participants a choice, as was the case in the example given above. If one gives participants the option of stopping the recording, one must be sure that they know how to operate the equipment and, more important, are clear on the need to talk about what they were thinking as they were carrying out the task, as opposed to expressing a thought they have at the present time and want to talk about during the recall session.
In this section, we have attempted to classify and exemplify issues related to the recall support or stimuli as well as the wider context for the stimulated recall procedure. Based on our classification system, together with theoretical predictions and currently existing research using stimulated recall, we can make some specific recommendations about the procedure.

SIGNIFICANT RECOMMENDATIONS

Structure

• How much structure is involved in the recall procedure is strongly related to the research question. Generally, if participants are not led or not focused, their recalls will be less susceptible to researcher interference.
• Unstructured situations do not always result in useful data. However, if learners participate in the selection and control of stimulus episodes and are allowed to initiate recalls themselves, there will again be less likelihood of researcher interference in the data.

Stimulus

• The stimulus should be as strong as possible. This may mean using a stimulus from more than one source. For example, participants can watch a video if the recall immediately follows the event. If it is more delayed, they can watch a video and even read a transcript of the relevant episodes as well.

Timing

• Stimulated recall data should be collected as soon as possible after the event.
• As the event becomes more distant in memory there is a greater chance that participants may say what they think the researcher wants them to say or may create a plausible explanation for themselves because the event is less sharply focused in their memories.

Other Considerations

• Carefully consider what the language of the recall should be. Consistency across participants is important.
• The individual carrying out the protocol must have a good understanding of how to ask questions.

Preparing Researchers and Participants

Before using stimulated recall methodology to collect data, it is advisable to develop a detailed research protocol.² The stimulated recall procedure is generally complex. For example, a stimulated recall of oral interaction often involves making at least two separate data recordings, one replay, and two sets of instructions. Thus, the amount of detail specified in the research protocol is important. A detailed protocol helps the researcher to anticipate problems in advance while also acting as a checklist for the many variables and factors the researcher needs to consider and balance while carrying out the procedure. It is also important to think carefully about and pilot test all the procedures laid out in the instructions to the learner, paying particular attention to the effects of the instructions on the procedure. Pilot testing can often lead to revisions and fine-tuning of the protocol and can help to avoid costly and time-consuming problems during the data collection procedure. Careful pilot testing can also help to avoid the loss of valuable, potentially useful, and often irreplaceable data.

Boxes 3.1 and 3.2 contain complete example protocols for stimulated recall procedures in oral interaction settings. Box 3.1 shows a set of instructions for the participant and the researcher which are detailed, clear, and unambiguous.

Instructions for stimulated recall procedures are often standardized. For example, they may be tape-recorded. At times they are read from a script. One reason for the standardization of instructions is the importance of orienting the participant to the actual time period under recall. A single word or tense change can affect the nature of the participant’s recall. The section on procedural pitfalls deals with this issue in greater detail. It is important to note that if participants are asked not just to vocalize their thoughts but also to explain them, the additional cognitive load may interfere with memory and recall. Hence, Ericsson and Simon’s (1993) often-quoted direction is not to ask participants to explain but to “keep talking.” The instructions provided in Mackey et al. (2000) were as specific as possible. They were developed after two separate pilot studies had revealed problems that the instructions had not previously covered. For example, in the first pilot, one trainee researcher engaged the participants in conversation by making full responses to the learners’ statements. Hence, the instructions were expanded to include the comments about back-channeling being preferable to responses. It is important to note that the instructions not only tell the researcher what to say to prompt the recall comments but also provide information about what to say during the recall and after it.

BOX 3.1 Sample research protocol from stimulated recall experiment on oral interaction

COLLECTION SCENARIO

Two researchers can collect the data, with one doing the interaction and one carrying out the stimulated recall. Alternatively, one researcher can do both the interaction and the recall.

Interaction Researcher Responsibilities:

• Provides instructions to participant (learner) for task activity.
• Interacts with participant (learner) during task activity.
• Ensures audio recorder is working during task activity.
• During the task activity, a second researcher can sit unobtrusively in the corner and take notes about which topics might be raised with the learner after the recall is concluded.
• Devises and asks final questions after stimulated recall is concluded.

Interaction Instructions for Researcher:

• Engage in some chitchat for about 1–2 minutes, then ask participants (learners) to read and sign the consent form so researchers can use the data.
• Explain that researchers are interested in how language is learned.
• Give directions for the task.
• Interact during task, creating breakdowns if necessary so that some negotiation takes place. Target phonological, lexical, and grammatical errors.
• Try to limit the session to about 10–12 minutes in order to get the recall session done in 30 minutes.

Stimulated Recall Researcher Responsibilities:

• Operates video recorder and makes notes during task interaction between participant (learner) and first researcher.
• Replays video during the stimulated recall session.
• Carries out stimulated recall (selecting segments of video to examine).
• Ensures audio recorder is working during stimulated recall.

Stimulated Recall Instructions:

Provide an explanation of the next step, for example:

What we’re going to do now is watch the video. We are interested in what you were thinking at the time you were talking about the pictures. We can see what you were doing by looking at the video, but we don’t know what you were thinking. So what I’d like you to do is tell me what you were thinking, what was in your mind at that time while you were talking to her. I’m going to put the remote control on the table here and you can pause the video any time that you want. So if you want to tell me something about what you were thinking, you can push pause. If I have a question about what you were thinking, then I will push pause and ask you to talk about that part of the video.

Demonstrate stopping the video and asking a question for them. If the participant stops the video, listen to what he or she says. If you stop the video, ask something general, for example:

What were you thinking here/at this point/right then?
Can you tell me what you were thinking at that point?
I see you’re laughing/looking confused/saying something there, what were you thinking then?

If their response is that they don’t remember, do not pursue this because “fishing” for answers that were not immediately provided increases the likelihood that the answer will be based on what the person thinks now or some other memory or perception.

Try not to focus or direct participant responses beyond “what were you thinking then.” You might want to focus attention on the NS utterance by saying something like:

Do you remember thinking anything when she repeated that?
Can you remember what you were thinking when she said that/those word(s)?
Can you tell me what you thought when she said that?

Try not to react to responses other than providing back-channeling cues or nonresponses:

Oh, mhm, great, good, I see, uh-huh, ok

When participants have finished the recall, if there are no other experimental steps to conclude, an additional explicit question may elicit useful data to address the research question. You can ask if they have any questions or comments about the video or the task they have done. At that point, the second researcher who has been quietly sitting in the corner can say something like:

I was wondering if I could ask you something? I’m just curious. I noticed when you were talking about the video that you mentioned your pronunciation and vocabulary quite a lot. Is that what you are most concerned about when you are speaking? What about grammar? Do you think about grammar at all when you are speaking? There were a few times when I corrected your grammar while you were speaking. Did you notice that I was correcting your grammar?

Note: This is a detailed protocol that was used for a study of learners’ perceptions about oral interaction (Mackey et al., 2000). It is reproduced here as an example.

A second detailed protocol appears in Box 3.2. This protocol was developed for a study that used stimulated recall as a pilot-testing tool. The goal of this stimulated recall procedure was to ensure, in advance of the experiment, that participants in the study perceived the treatment in the way the researcher intended. This preemptive use of stimulated recall is a good example of the versatility of this methodology. This research was carried out by Leeman (1999) to explore the question of how participation in interaction can facilitate SLA by focusing on exactly how recasts (targetlike reformulations of a learner’s original utterance) promote L2 development. Recasts can provide implicit negative evidence, which has been widely discussed in both the L1 and L2 literature.

Recasts have been a central issue in debate on the role of negative evidence in language acquisition. Leeman points out that recasts are complex discourse structures that, in addition to negative feedback, also provide positive evidence (Pinker, 1989) and notes that the juxtaposition of the nontarget-like original and the target-like reformulation can also enhance the salience of target forms in the input. This may account for greater gains demonstrated by learners exposed experimentally to recasts than to other forms of positive evidence.

BOX 3.2 Second protocol for stimulated recall procedure

TREATMENT DESCRIPTION: INTERACTION

This group received no negative feedback on nontarget utterances. Thus, a sample response to the nontarget utterance given below is:

NNS: La manzana        rojo         está en la mesa.
     the apple-[fem.]  red-[masc.]  is on the table
     “The red apple is on the table.”
R:   Um hmm. Um hmm. ¿Qué más?
     What else?

However, when the researcher provided directions for object placement (in the second part of each treatment task), stress and intonation were used to enhance the salience of the target structure. Specifically, the gender and number marking of the adjective were stressed:

R:   La manzana        roJA         está en la mesa.
     the apple [fem.]  red [fem.]   is on the table
     “The red apple is on the table.”

STIMULATED RECALL PROTOCOL

After completion of treatment, stop recorder and prepare for playback. Check if participant needs to get a drink or use the restroom. Say:

Now we’re going to do a different kind of activity. In this part of our meeting, I’m interested in finding out what you were thinking while we did the last activities. I know what you said, because that’s on the recording, but I’d also like to know what was going through your mind while we did the previous activities. So now we’re going to listen to the tape, and I want you to try to remember what you were thinking then, not what you think about it now.

Put audio recorder close to participant. Show how to pause the recorder. Say:

I want you to stop the recorder whenever you remember what you were thinking. I’ll also stop the recorder from time to time and ask you to think back and tell me what was going through your mind.

Ask if there are questions about procedure. Ask, Is it clear what we’re doing? I want to know what you were thinking as we did the activities. Have participant try pausing the recorder.

Possible answers to possible questions:

• they can stop it as often as they want
• we don’t have to take turns

Begin playback. Pause after participant’s first utterance and ask, “What were you thinking right then?” After answer, wait a few seconds and ask, “Do you remember anything else about what you were thinking at that moment?” Make sure to pause after utterances containing both target and nontarget agreement. For the second half of activities (researcher provides directions to participant), pause after researcher’s first utterance and prompt as above. In enhanced salience group, make sure to pause after some utterances with enhanced salience and some without. If participant starts to talk without pausing the recording, pause it for him/her, then wait for him/her to unpause it. If he/she doesn’t, wait for a few seconds of silence, then ask, “Do you remember anything else about what you were thinking at that moment?” If participant says he or she made a mistake on agreement, try to maintain orientation to time of production, for example, “were you thinking that at the time?” If participants say they noticed that the researcher corrected them (recast and negative evidence groups), ask what they thought at the time—did they think it was a correction then? Keep them focused on their interpretation at the time it was provided. Ask similar follow-up questions even if participant pauses the recording and discusses things not related to research questions (e.g., ser vs. estar, vocabulary, locatives, pronunciation). Emphasize thoughts during activity (not interpretation now or later in the activity).

After stimulated recall is complete, move to debriefing questions/semistructured interview (so that it doesn’t influence the stimulated recall). Ask all questions, even if the participant already provided answers during stimulated recall:

1. What did you think about these activities?
2. Did you learn anything? (If so, what?)
3. What did you think about my interaction with you?
4. Did you notice anything specific about my interaction with you?

If participants mention any interactional features, ask what they thought at the time. “At that time, why did you think I was doing that?” Specifically query whether, at the time of the enhanced salience, they thought the researcher was telling them they had said it wrong previously or whether the enhanced salience was to show them how to say it in Spanish. Note. This procedure was used as a pilot test to explore whether enhanced salience in oral interaction was perceived as negative feedback or not (Leeman, 1999). First, the treatment that is being explored through the stimulated recall procedure is described. Then, the detailed protocol appears.

Leeman’s (1999) study addressed the question of how recasts may lead to L2 development by isolating negative evidence and enhanced salience and comparing their effects to those of recasts. She designed an experiment to compare empirically four types of communicative interaction and their effects on learning noun–adjective agreement in Spanish as a foreign language. Seventy second-semester college learners (viewed as beginners) were randomly assigned to four treatment conditions. All participants engaged in communicative interaction individually with a NS, and the provision of negative evidence and enhanced salience was experimentally manipulated in the four groups as follows: 1) intensive recasts (i.e., negative evidence and enhanced salience), 2) negative evidence without enhanced salience, 3) enhanced salience of target structures without negative evidence, and 4) control (i.e., neither negative evidence nor enhanced salience). The effects of treatment were evaluated by means of oral pretests, immediate posttests, and delayed posttests consisting of picture-description tasks designed to elicit the target structure. Learners in a pilot study carried out before Leeman’s main data collection effort participated in semi-structured debriefing interviews while listening to themselves speaking on tapes of the treatment. This was to address the concern that learners in the enhanced-salience group might also perceive the treatment they received as a form of correction, thereby confounding two of the variables Leeman’s study sought to isolate. When salience is enhanced through word stress as in a recast where a learner says “the dog is over the street” and the response is “the dog is ON the street,” learners may perceive the stress as corrective (i.e., negative) feedback, not (or possibly also) as simple positive evidence about the form. Results from Leeman’s stimulated recall interviews demonstrated that learners in her enhanced-salience group interpreted the researcher’s use of stress and intonation simply as a way to highlight the need for noun–adjective agreement rather than as a means of signaling that their original production was non-target-like. Based on the stimulated recalls in her pilot study, Leeman was able to claim that her enhanced-salience group did not receive negative evidence and that her study isolated variables.

In viewing Leeman’s protocol, it is important to note again that the instructions not only tell the researcher what to say to prompt the recall comments but also deal with what to say during the recall and after it, and that they also include provisions for dealing with unexpected eventualities.

In many experiments carried out by a research team or pair there is often a principal investigator or a more experienced researcher working with more junior colleagues or graduate students. Having a researcher who is experienced at carrying out stimulated recalls is an excellent way to approach the procedure. The importance of researcher training and instructions when using stimulated recall methodology cannot be overstressed. Stimulated recall procedures, in our experience, although somewhat challenging initially, become easier with practice, and the procedure is best observed before the researcher attempts to carry it out, particularly because the stimulated recall procedure can go awry very easily. Participants, even when presented with carefully written and pilot-tested instructions, can often be confused by the procedure.
The researcher needs to be able to put participants at ease, convey the impression that the participants are not being asked to do something very difficult or unnatural, and help participants provide recall comments without challenging their preconceived notions of appropriateness and without leading them. It can be a tall order. In some sense, carrying out a stimulated recall procedure is not unlike the sociolinguistic interview technique perfected by Labov (1972) and popularized during the 1970s. As researchers, we need to be aware of the great many variables and pitfalls when collecting stimulated recall data, and we may also need to practice collecting such data more than once. If researchers can work as part of a team, the procedure will be easier to handle. If circumstances dictate that a researcher utilize the procedure for the first time without prior training, then we cannot overemphasize the importance of careful preparation. Being asked to introspect is particularly difficult for L2 learners from some cultural backgrounds. For example, in many cultures the teacher or NS is often considered the expert, and challenges to authority are not encouraged.

Thus, when asked to introspect about their actions after a teacher’s procedural direction in a classroom study, some speakers may be uncomfortable conveying to researchers that they did not follow a teacher’s instruction, such as to read the passage silently, because they had already read it and decided it would be more helpful to prepare for the next class. Admitting to not following a direction, even with this sort of motivation, may be considered a direct challenge to the teacher’s authority and could be difficult for some participants. This difficulty illustrates just one of the many challenges faced by researchers carrying out stimulated recalls. An experienced researcher who knows the participant pool and characteristics well might be better able to recognize which problems are likely to be due to stimulated recall instructions and which are related to participant characteristics. Clearly, the research questions dictate the nature and content of the instructions. In concluding this section, we provide general recommendations about instructions.

Participant Training

When we consider the degree of support offered for the stimulus in a delayed recall, we must also explore how well the participants are trained at interacting with the stimulus. First, instructions and training need to be distinguished. Of course, in many if not most experiments, participants are generally provided with some form of instructions, however brief. While both are clearly important, our focus here is on training as opposed to instructions. Adequate direction is often needed to keep participants on track and in the “there and then” as opposed to the “here and now” (see Chapter 4 for more information on temporal location). Ericsson and Simon (1987, 1993, 1996) claim that participant training does not affect the validity of the verbal report and in effect only serves to increase completeness. We argue that in some designs, training participants by showing them videotapes of others carrying out the procedure or giving them transcripts or diagrams to view may affect the quality of the report data in many ways. Empirical research is still needed to address this issue. For example, priming studies in the psychology literature have shown that participants’ verbalizations can be affected by a number of factors in the preceding input. In the L2 context, we need to be particularly vigilant about introducing potentially confounding input variables. The training effect and the effect of memory interference on the recall data are both important issues that should not be underestimated.

RECOMMENDATIONS: INSTRUCTIONS

• Instructions for both researchers and participants should be carefully drawn up and pilot tested.
• Instructions for participants should be recorded, read aloud by the researcher, or presented to the participants in written format where appropriate since standardization is important.
• Participants should be sufficiently trained so that they are able to carry out the procedure, but should not be cued into experimental goals or unnecessary information. Appropriate training is best understood through careful pilot testing. Often, simple instructions and a direct model will be enough in a stimulated recall procedure. Sometimes, even instructions are not necessary; the collection instrumentation will be sufficient, for example in the case of a questionnaire or a Q–A interview.
• Instructions for researchers should take into account as many eventualities as can be anticipated. For example, this may include what to say if the participant stops retrospecting and begins to analyze past actions/thoughts in the present. It may also include the scenario where the participant begins to talk, providing a recall comment while the tape is still playing.³ The participant may have forgotten to pause the tape. The researcher needs to stop the tape while facilitating the flow of the participant’s speech and to let the participant know she can restart the tape, all without taking the floor from the participant. If the tape is not stopped, the clarity of the recall speech captured may not be good enough for transcription.
• Instructions for researchers should include information about potential effects of participant characteristics on the recalls and how to minimize effects where possible.
• Procedures, such as the selection of video or written segments as topics for recall comments, should be modeled by researchers for participants.
• When possible, researchers should observe the stimulated recall procedure being carried out before participating in one.

A complete transcript of a stimulated recall procedure involving oral interaction appears in Box 3.3.

62

Characterization of Stimulated Recall

Examples of stimulated recall data (collected for a study carried out by Mackey et al., 2000)

BOX 3.3

Downloaded by [University of California, San Diego] at 04:13 16 March 2017

Recall episode 1 NNS:

un bicchiere

però

per il café

A glass

however

for the coffee

“A glass for coffee” INT: RECALL:

un bicchiere,

un bicchiere,

un bicchiere?

A glass

a glass

a glass

That I was really off on my answer or on my description like she had no clue what I was talking about.

Recall episode 2 NNS: INT:

Poi

c’è

la finestra

Then

there is

the window

co- no, non ho capito, come? Wh- no, not I have understood, what “Wh, I-I didn’t understand, what?”

NNS:

la finestra The window

RECALL:

Same thing. I was way off.

Recall episode 3 NNS: INT:

c’è un

tipo di fior

There is a

type of flow-

un tipo di che cosa? “A type of what!”

NNS:

fiore flower

RECALL:

Same thing and I was trying to … I don’t know my mind was going blank. I was trying to think of the vocabulary.

Recall episode 4
NNS: Poi un bicchiere
     Then a glass
INT: Un che? Come?
     A what what
NNS: Bicchiere
     Glass
RECALL: I knew it wasn’t a glass and then I don’t know. I just am getting old or something. I was drawing a blank. Then I thought of a vase, but then I thought that since there was no flowers. Maybe it was just a big glass. So, then I thought I’ll see it and see. Then when she said “come” I knew that it was completely wrong.

Recall episode 5
NNS: Ci sono due bottigli
     There are two bottles (m. pl.)
INT: Ci sono bottigli? … due?
     There are bottles (m. pl.) two? (bottles is feminine)
RECALL: I was just hoping she didn’t hear me the first time.

Recall episode 6
NNS: Poi sopra i bottiglie ci sono due
     Then above the (m. pl.) bottles (f. pl.) there are two
     “Then, above the bottles there are two …”
INT: sopra che cosa?
     Above what?
NNS: di …
     of
RECALL: Well, at first I thought how to explain that it wasn’t the next one, but the next one after that and then I couldn’t think that’s why I abandoned that idea all together and then I was just trying to focus on how to say that and then if need be I would come to that there was a missing shelf.

Recall episode 7
NNS: Poi, a destra di tavolo
     Then, at right of table (m. ending, should be feminine)
INT: A destra di?
     At right of?
NNS: Tavolo
     Table
RECALL: That she didn’t hear me.


Getting Ready


Preparations

Setting up any experiment takes considerable thought; stimulated recall is no exception. In fact, it is even more complicated than other types of experiments in that there are two stages, the event and the recall.4 One has to think carefully about all of the equipment needed as well as the space that will be required.

Considerations on the day of data gathering
• What equipment is needed?
  • Video
    • Be sure that all of the eventualities are thought about with regard to recording and playback.
  • Audio
    • Same as above.
  • Table
  • Chairs
• Placement of equipment
  • Where will the video camera be placed, if using video?
  • How close do you want chairs?
• What space is needed?
  • Where will stimulated recall session take place?
    • In the same room?
  • Do rooms have to be reserved?
  • Will the rooms be quiet enough for data collection?

In other words, everything must be thought through carefully to avoid data loss.

Procedural Pitfalls

In this section we turn to the many potential problems involved in collecting and analyzing stimulated recall data. First, we deal with the issue of timing and the problems that incorrectly estimating timing can cause. Second, we deal with the specific questions that can be addressed by stimulated recall.

Timing

One potential pitfall in conducting stimulated recalls relates to how much time the researcher allocates to the procedure. There are many reasons to be careful about under- or overestimating time. For example, for most designs it is desirable to standardize times as much as possible amongst participants. An accurate estimate of time will also allow smooth scheduling and can assist the researcher in avoiding fatiguing the participants unnecessarily. When conducting stimulated recall procedures, it is important to accurately estimate from the outset the time for at least three parts of the procedure. These are described in the following sections.

The approximate length of the recall support

In this section we refer to the amount of time allocated during the stimulated recall procedure for the support system, or stimulus, that the researcher uses to aid participants in accurately recalling the event. Recall support systems can include audiotapes or videotapes of the participant, recorded as he or she was carrying out the event. For example, in a stimulated recall of oral interaction, participants may listen to audiotapes of themselves and their partner(s) while they were engaged in the oral interaction. In a stimulated recall of writing processes, the participant may be shown an essay or paper that he or she had rewritten, possibly alongside an earlier draft. The time needed to present the recall support needs to be carefully estimated in calculating the time for the stimulated recall procedure. For example, if a videotape of the event to be recalled is to be shown to participants, the time that the replay will take needs to be calculated and added to the time allocated for the recall procedure. In some designs, the videotape may be replayed in its entirety; in others, only segments will be played. In some cases the researcher will control and select the segments and pause the videotape; in other cases the participants themselves may be asked to select and replay segments. Sometimes the participants may replay segments more than once. In some designs participants may ask questions of the researcher about the segments, or may replay the segments as often as they wish. Not all these factors (for example, answers to participant questions) are completely under the researcher’s control, but they need to at least be recognized as time-consuming possibilities. It is often the case with stimulated recall procedures that the researcher underestimates the amount of time the recall support takes.
As discussed earlier in this book, the sooner after the event the stimulated recall takes place, the more likely it is that uncorrupted memory structures will be accessed. However, participant fatigue is clearly an issue. For a procedure using a video recording of oral interaction, the example below gives a time estimate, followed by the actual time that the procedure took.

Time: Estimated

Research goal: Conduct a stimulated recall procedure with two learners who carried out oral interaction tasks in a learner–learner dyad.

Initial time estimates:
1 Task-based interaction: 30 minutes.
2 Stimulated recall procedure with learner 1: 30 minutes.
3 Stimulated recall procedure with learner 2: 30 minutes.
Total researcher time: 90 minutes
Time for each learner: 60 minutes

Thus, the whole procedure was estimated to take 90 minutes of researcher time and one hour for each learner, with half an hour of down time or break for one learner.5 We compare this original estimate with the actual time taken.

Time: Actual

1 Set-up time: Researcher set up video recorder for the interaction together with backup audio recorder, prepared tasks, had consent forms read and signed by participants, explained initial instructions: 17 minutes. Learners present for 10 of these minutes.
2 Oral interaction: 28 minutes.
3 Set-up time: Researcher rewound videotape for replay. Segments were to be selected for playback; researcher reviewed videotape for timestamps and set video to first segment to be replayed (the researcher did not need to watch whole tape as she was present behind video camera, making notes on sections to replay). Researcher set up second video camera to record the stimulated recall: 19 minutes.
4 Recall procedure: 2 minutes for instructions on the recall, 1 minute for Q&A on recall, 11 minutes for recall replays, researcher initiated; 6 minutes for recall replays, learner initiated; 2 minutes for learner questions about specific aspects of the recall support video; 18 minutes for the participants’ comments in the stimulated recall; 1 minute for checking of microphones part way: 41 minutes.
Total researcher time: 146 minutes
Time for each learner: 98 minutes

The whole procedure took 2.4 hours for the researcher and 1.6 hours for each learner, with 48 minutes of downtime for one learner. The actual time taken turned out to be a little more than one and a half times as long as the initial time estimate. Learners’ schedules were affected. One learner missed lunch, and both missed a part of their ESL class. Both learners were much more taxed than the researchers had anticipated. In this scenario, pilot testing turned out to be essential in accurately estimating time.
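The arithmetic behind the two time budgets can be checked with a short script. This is only a minimal sketch: the minute values are those reported above, while the variable names are hypothetical, and the script assumes that the 41-minute recall procedure was run once per learner, an assumption that happens to reproduce the reported total of 146 researcher minutes.

```python
# Planned minutes per stage, taken from the initial estimate above.
estimated = {"interaction": 30, "recall_learner_1": 30, "recall_learner_2": 30}

# Actual minutes per stage, taken from the breakdown above.
setup_before = 17            # researcher set-up before the interaction
setup_learner_present = 10   # learners were present for 10 of those 17 minutes
interaction = 28
setup_between = 19           # rewinding, selecting segments, second camera
recall_parts = [2, 1, 11, 6, 2, 18, 1]  # instructions, Q&A, replays, questions, comments, mic check
recall = sum(recall_parts)

# Assumption (not stated in the text): one 41-minute recall per learner.
est_researcher = sum(estimated.values())
est_learner = estimated["interaction"] + estimated["recall_learner_1"]
act_researcher = setup_before + interaction + setup_between + 2 * recall
act_learner = setup_learner_present + interaction + setup_between + recall

print(recall)                                      # 41
print(est_researcher, est_learner)                 # 90 60
print(act_researcher, act_learner)                 # 146 98
print(round(act_researcher / est_researcher, 2))   # 1.62
print(round(act_learner / est_learner, 2))         # 1.63
```

Both ratios come out at a little over one and a half, which matches the comparison drawn in the text. Running the same sums on pilot data is one simple way to revise an initial estimate before scheduling participants.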

Allocating time for the recall procedure

It is also important to accurately estimate the time that the recall procedure (that is, the researcher–participant interaction minus replayed support cues) will take.


As seen from the above example, the participants’ comments took only 18 minutes, but questions about the procedure, questions about the support structure, and opportunities for the participants to access the support were all factors in the time taken for the procedure. This time can also vary from participant to participant. Some participants may be much more forthcoming than others. All the caveats that apply to oral interviews (see Hillway, 1969, pp. 33–34) also apply to stimulated recall procedures. Stimulated recalls may also be carried out through the use of questionnaires, with the recalls being written. Constraints such as writing speed and space allocated on the form will have an effect in such circumstances. Because writing takes longer than speaking, the interval between the event and the recall is increased and the validity of the procedure becomes more questionable. In terms of being prepared for the unexpected with reference to the effect individual participant variables have on time estimates, it may be helpful to consider that while conducting stimulated recall procedures, each of the authors of this book has had unfamiliar and unexpected experiences with participants. One of the authors experienced a situation in which a participant opted part way through not to continue with the experiment and left the room abruptly, after stating that he was uncomfortable with the procedure. Fortunately, this participant had clearly understood and internalized the information on his consent form, knew that he had the right to leave, and exercised this right. The other author experienced the case where a procedure, which had ranged between 18 and 30 minutes for each of 10 participants, lasted only 7 minutes. The participant was completely uncooperative and would only say, “I don’t know” or shrug in response to all the recall cues. He also gave a negative response to two direct questions about whether he wanted to stop. 
This participant was asked three times if he understood the instructions, and the procedure was also modeled for him. He responded positively to those queries about his comprehension of what was expected or required. He simply seemed to dislike questions about his thought processes or previous verbalizations and was not inhibited about demonstrating this dislike. In our experience, stimulated recall interviews that we have carried out have revealed some of the most interesting data we have seen. However, the episodes described in examples 3.6–3.12 (that resulted in unusable data) demonstrate the vulnerability of the methodology.

Considerations for accurate time estimates
• Individual characteristics
• Language of recall
• Degree of structure provided
• Who is controlling the recall


Allocating time for setup and equipment

Stimulated recall procedures are often among the most time consuming in L2 research in terms of equipment setup and preparation. The event to be recalled or the product investigated is usually recorded or taped in some way. This recording or product-preservation has been made in a way that can be shown to the learner as a stimulus. It is important to recognize that the stimulated recall of the participant also needs to be recorded in some way. The degree of structure necessary for the support needs to be anticipated when planning the procedure. Reviews of tapes, drafts, or other stimuli and the giving of instructions and perhaps modeling of the procedure all need to be factored into the time calculations.

Verbalization

Another issue relating to timing in stimulated recall data is the issue of how long a researcher allocates for verbalization on a particular topic during the recall procedure. This issue often comes up in relation to recalls about problem-solving activities. For example, a participant may immediately hit on and describe the mental process that happened to be the focus of the study. Researchers need to take care when this happens that they do not cut off the learner’s next verbalization and move immediately to the next episode or problem for which they elicit stimulated recall data. Such immediate moving on could be perceived by the learner as implicit feedback about what the researchers were looking for, especially if the learner had not been stopped in previous episodes. The learner might realize that the previous answer was exactly what the researcher was looking for and modify future answers. Alternatively, a learner might perceive being cut off as a cue that the last response was uninteresting. Clearly, equal time and consistent action by the researcher are both crucial. Wherever possible, participants in a study should receive similar amounts of time for all topics or episodes being recalled. Consistency across time may not always be practical. As Chi (1997) pointed out, if standard times for recall are not possible, then it may be necessary to analyze the recalls consistently: If one person were cut off after one or two turns in relation to one episode, then all of that learner’s episodes could be analyzed up to a similar cutoff point. Also of interest in relation to time is the issue of how much verbalization each learner provides in a standard time frame. Obviously, some individuals are more talkative than others. This is a difficult issue to resolve. On one hand, many researchers have pointed out that verbal protocols should be as full and complete as possible. On the other hand, controlling verbosity is not always possible.
Generally, the quality or content, rather than the quantity, of the recall is important for the analysis, but if some recalls are five times the length of others, quality may be affected. It is not usually possible to manipulate individual characteristics such as verbosity. Baseline data for each participant, showing how verbose he or she is in relation to other topics or tasks, is desirable. Then, more wordy recalls for some people may be seen as a function of style. In practice, however, we have found no examples of studies in the literature that have collected such baseline data.

Another potential pitfall relates to the time frame between the event itself and the stimulated recall. Recall from discussions earlier in this chapter that memory decay can seriously affect the accuracy of recall. Hence, recall should take place as soon as possible after the original task. There are other difficulties, however, that are less obvious. We present excerpts from a recall session taken from the study by Mackey et al. (2000). The data reported in Examples 3.6 to 3.12 were discarded for reasons that we make clear in our discussion below. Data loss is something to be avoided in any study, and stimulated recall is no exception. The participant in Examples 3.2–3.12 was a Korean student studying English in the United States. She had just completed a spot-the-difference task with a NS. The NS had been instructed to provide indirect negative feedback (including recasts) to the learner when errors were made. The purpose of the study was to determine the extent to which this feedback was perceived as feedback by learners. Types of feedback provided by the NS in Mackey et al. are provided in Examples 3.2 to 3.5. Types of feedback provided and the interviewer’s recall prompts are provided in Examples 3.6 to 3.12. All sessions in the Mackey et al. study were videotaped and immediately followed by the stimulated recall using the videotape as a prompt. In Examples 3.6 to 3.12, we present the interviewer’s prompts as she tried to encourage the Korean learner to recall what she was thinking at the time of the event.

EXAMPLE 3.2
NNS: there is a there is a cat. He is a black.
NS: he is a what?

EXAMPLE 3.3
NNS: the rear, rear [vleks]
NS: the rear what?

EXAMPLE 3.4
NNS: he also standing
NS: he’s standing?

EXAMPLE 3.5
NNS: people enjoying beach
NS: people are enjoying the beach

EXAMPLE 3.6
Original task
  NNS original comments: He just look at-look at me
  NS-1 feedback: He’s what
  NNS response: Look at me
  NS-1 feedback: He’s looking at you?
  NNS response: Yeah
Stimulated recall
  NS-2 stimulated recall prompt: Why was she saying that?
  NNS response: It’s a um, I cannot cannot cannot answer clearly
  NS-2 stimulated recall prompt: What do you mean clearly?
  NNS response: I have to answer, I have to answer correct because the word is so weird
  NS-2 stimulated recall prompt: You are saying the word is weird?

Two NSs participated in the research. Native speaker 1 performed the original task; NS 2, who had been operating the video during the original task and who had been paying careful attention to the feedback interaction, carried out the recall session. The crucial episodes are given in bold typeface. The emphasis in stimulated recall must be on the thought processes during the event itself, but the interviewer moved away from that and was focusing on the here and now of the stimulated recall session in Example 3.6, saying, “What do you mean clearly?” and later “You are saying the word is weird?” without making it clear that the NNS was supposed to be recalling what she was thinking at the time of the event itself. This entire dataset had to be eliminated from the final analysis of this study given that it was not clear what the learner was actually describing in the recall session.

In Example 3.7, the interviewer should have been attempting to determine if the NNS was aware of why the NS was asking about the color yellow. Instead, she asked the question using the present tense, saying, “you’re speaking too quietly?” rather than saying something like, “when she said, ‘the what is yellow,’ what were you thinking?” In other words, the NNS’s comments about her low voice are ambiguous. Was she thinking that at the time of the feedback during the original session? Or was she only thinking about it now that her attention was focused on the original exchange? The interviewer did not bring the focus clearly to the time of the original event.

EXAMPLE 3.7

NNS original comments

NS-1 feedback

NNS response

NS-2 stimulated recall prompts

NNS response

And the [kalܫr] color is yel-yellow The what is yellow Yellow the color? The color is yellow?

What? What?

My voice is maybe low

You’re speaking too quietly?

Yeah

EXAMPLE 3.8

Original task
  NNS original comments: He also standing
  NS-1 feedback: He’s standing
  NNS response: Yeah
Stimulated recall
  NS-2 stimulated recall prompt: Why did she repeat it after you?
  NNS response: Because I still speak lower … and maybe she don’t understand
  NS-2 stimulated recall prompt: She doesn’t hear?
  NNS response: Yeah
  NS-2 stimulated recall prompt: She doesn’t hear or doesn’t understand Why does she say that?
  NNS response: Doesn’t hear


In Example 3.8 the interviewer continued to focus on the time of the recall rather than on the time of the initial event. The interviewer again, through her use of the present tense and her failure to refer to the past event, continued to prompt the learner to focus on what she was thinking at the time of the recall session rather than what she was thinking at the time of the original picture difference task. In Example 3.9, we see further evidence of the interviewer’s focus on the current thinking of the NNS as opposed to the on-line thinking at the time of the original exchange. This example again shows the problematic practice of focusing on the time of the recall and adds another one: The interviewer led the NNS participant toward an answer, suggesting that she had left something out. In the next turn, the interviewer suggested to the NNS why the NS participant in the interaction might have repeated something. The interviewer continued in the same vein in Example 3.10; however, she changed her tactic in Examples 3.11 and 3.12 and finally began to ask questions that are reflective of appropriate probing of thought processes during the original task.

EXAMPLE 3.9
Original task
  NNS original comments: Right right side
  NS-1 feedback: Facing to the right
Stimulated recall
  NNS response: My answer is not suitable
  NS-2 stimulated recall prompt: Not suitable how?
  NNS response: She asked which side fish is is face so I had to answer it is facing to the right side
  NS-2 stimulated recall prompt: So when you said right side, you left something out?
  NNS response: Yeah
  NS-2 stimulated recall prompt: So, that’s why she repeated it?
  NNS response: Yeah

In Example 3.10 it is unclear when the NNS realized that “people” requires a plural verb. It is likely that she thought about it at the time of the stimulated recall rather than at the time of the spot-the-difference task. The interviewer prompts were not eliciting the desired there-and-then processing. In Example 3.11, also presented earlier as Example 3.1, the interviewer apparently realized her mistake and began to focus on previous thoughts rather than on thoughts during the recall; the NNS’s final response is interesting because it seems that she was not aware of the corrections when they happened. Rather, she only became aware of them during the playback sessions.6 In Example 3.12, the interviewer asked about what she understood during the original task, but it was clearly too late to elicit useful recall comments given all the previous inappropriate questions. Further, there was still a leading question at the end, “Did you understand that then?” rather than a more neutral question, such as, “What were you thinking when she said ‘all three people are in the sea’?” All of these data had to be thrown out because they could not be reliably considered to reflect the thought processes we had aimed to get at.

EXAMPLE 3.10
Original task
  NNS original comments: People is fishing on the beach
  NS-1 feedback: People are what?
  NNS response: Fishing
Stimulated recall
  NNS response: Because the grammar is not correct
  NS-2 stimulated recall prompt: How is it not correct? (replays video)
  NNS response: I said people is fishing but I have to say people are fishing


EXAMPLE 3.11


NNS original comments

NS-1 feedback

NNS response

NS-2 stimulated recall prompts

NNS response

He’s taking picture to his child He’s taking a picture of his child Yeah

I think the preposition is not suitable Is that now? Or then? No idea

Yeah Can’t notice back then

EXAMPLE 3.12

Original task
  NNS original comments: Two people, three people is in the sea
  NS-1 feedback: All three people are in the sea
  NNS response: The sea
Stimulated recall
  NNS response: I used I said people is three people is in the sea but I have to say people are
  NS-2 stimulated recall prompt: Did you understand that then?
  NNS response: Yeah?


Below, we present general recommendations regarding preparation, particularly with regard to the time that will be needed to conduct your experiment.


RECOMMENDATIONS: TIMING
• Carefully pilot all instruments to ensure accurate time estimates.
  • How long will the initial data collection (from which the recall will be derived) take?
  • How long will the recall take?
  • How long does it take to set up the task? Consider the transition.
• Allow equal amounts of time for all participants across topics or episodes.
• In the analysis, take into account individual characteristics of verbosity.
• Make sure that the recall questions focus on the timeframe in question.
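Where equal recall time could not be given, the consistent-cutoff analysis attributed to Chi (1997) in the Verbalization section can be sketched as follows. This is an illustration under stated assumptions, not a procedure taken from any of the studies discussed: the function name and the episode data are hypothetical, and the "similar cutoff point" is interpreted here as the number of turns in the learner's shortest episode.

```python
def truncate_to_common_cutoff(episodes):
    """Trim every episode to the turn count of the shortest episode.

    `episodes` maps an episode label to the list of recall turns one
    learner produced for it (hypothetical data layout).
    """
    if not episodes:
        return {}
    cutoff = min(len(turns) for turns in episodes.values())
    return {name: turns[:cutoff] for name, turns in episodes.items()}


# Invented recall turns for one learner, for illustration only.
episodes = {
    "episode_1": ["turn a", "turn b"],  # learner was cut off after two turns
    "episode_2": ["turn a", "turn b", "turn c", "turn d"],
    "episode_3": ["turn a", "turn b", "turn c"],
}

trimmed = truncate_to_common_cutoff(episodes)
print({name: len(turns) for name, turns in trimmed.items()})
# {'episode_1': 2, 'episode_2': 2, 'episode_3': 2}
```

The untrimmed turns are left unchanged in `episodes`, so the full recalls remain available for qualitative reading even when only the trimmed versions are coded.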

Conclusion

This chapter has gone into specific detail about ways to carry out stimulated recall. In addition to the detailed step-by-step guide, many of the potential pitfalls involved in carrying out a stimulated recall methodology were discussed. We have laid out guidelines for preparing and conducting a study using stimulated recall and have shown how mistakes can be avoided with careful planning and by always keeping in mind that the thoughts being asked about are the thoughts at the time of the original event. In the next chapter, we deal with coding and analysis of data.

Notes

1 That the use of tenses can be misunderstood is apparent in data from Naylor (n.d.) who worked on a court case that rested on the inappropriate use of and understanding of how the English tense system works. The consequences of the case were far more serious than the consequences of misinterpreting data in a stimulated recall. Nonetheless, the example illustrates how the use of tense can easily be misinterpreted even by reasonably fluent individuals. Naylor, a NS of Tagalog, was called by the defense in a trial in which two Filipino nurses had been accused of murdering patients under their care. It was felt that the nurses, in trying to defend themselves during the trial, had contradicted themselves, making them easy targets for conviction. An example from the transcript of the trial is given here:

Question: Would you say that the two of you were close friends during that period of time?
Response: I would say we are good friends but we are really not that close because I don’t know her and don’t know each other that much.
Question: And what else did you learn about Pavulon, other than it was given at surgery?
Response: Are you asking me about what I know about Pavulon in the summer of 75 or what I know about Pavulon at the present time, after hearing all these experts?
Question: What you knew about Pavulon at the time.
Response: I know a little about Pavulon.

(It is to be noted that in Tagalog, tense and aspect are not marked on the verb. Thus, eats, is eating and was eating are all expressed by the same verb form.) With regard to the example, the statement might have been true in the past (i.e., they were not good friends at the time of the incident), but the use of the present tense made this response contradictory. Similar misuse of the present tense can be seen in the exchange concerning the drug Pavulon, the drug supposedly used to kill the patients. The parallels between this example and the use of stimulated recall when the language of the recall is not the native language of the participant are clear. They both assume that the participant has native-like control over the target language tense system and therefore can accurately understand and produce a system that differentiates between the time of speaking and the time of thinking.

2 Note that the use of the word protocol here, as elsewhere in this chapter, refers to a set of instructions, parameters, and details for carrying out an experiment, as opposed to a verbal protocol or report.

3 A relatively frequent scenario in stimulated recall involves participants talking over a tape, even when they have been directed about how to turn on/off the pause button.

4 We are simplifying this discussion and ignoring further complexities, such as other data being collected.

5 An alternative in terms of time could be the scenario in which two videotapes of the interaction were simultaneously dubbed and two NS researchers carried out the recalls simultaneously. This avoids the 30 minutes downtime for one learner. Unfortunately, such resources are beyond the reach of many researchers.

6 Although it might be interesting to have participants reflect on the differences between what they thought at the time and what they thought later, this was not the focus of the study.


4 DATA ANALYSIS

Once data have been collected, as with any experiment, researchers need to determine how best to prepare the data for analysis. In this chapter we deal with the basic issues of stimulated recall analysis. We focus on reliability, training raters to code data, actual coding and layout of data, as well as issues of analysis. We provide the reader with a number of practical examples from research that relate to timing and working with coders. In doing this, we relied exclusively on one study we conducted, mostly because we have insider knowledge of our own research and we are unaware of any other documentation on the “behind the scenes” nitty-gritty of data analysis.

Interrater Reliability

As with most data, there are several steps that need to be considered when analysis of stimulated recall data is carried out. These steps include transcription, coding, and description of data, as well as analysis per se. Analysis of data obtained through stimulated recall can be qualitative, quantitative, or a combination of both. As with most data, the issue of interrater reliability1 must be considered in relation to transcription and coding of data obtained through stimulated recall. One of the most important issues when carrying out tests for interrater reliability with stimulated recall data is the objectivity of the raters. Often, those who collect the data are the researchers, and usually the researchers transcribe, code and rate, or analyze the data as well. However, this high level of involvement by the researcher may cause problems with the analysis of such data. There is often a relatively high level of interpretation in relation to data obtained through stimulated recall, and interrater reliability is likely to be affected. Consider, for example, the case found in data collected by Mackey and Gum (1997): A participant had taken several turns explaining to the researcher that she ignored the teacher’s request for students to read a passage because she was bored by the passage and felt the teacher was not in tune with her L2 learning requirements at that time. The recall data for this study included audiotaped interviews with students who, as a stimulus for the recall interview, were shown a videotape of themselves in class where one of the researchers was also their teacher. One goal in collecting these data was to discover more about students’ motivations for not following instructions in class and whether teachers’ status as a novice or institutionally experienced teacher could be, at least in part, a determining factor in whether or not a student followed a teacher’s directions. Example 4.1 contains an extract from a transcript of one of the interviews between the student in question and a researcher. An independent rater not present during the interaction who read the transcript and rated this interaction was given three categories for coding, which were predetermined by the researchers. These categories were later determined to be somewhat problematic, and the analysis was not pursued further. However, for the present purpose of illustrating interrater reliability issues in recall data, they work well.

a Participant did not agree with teacher’s instruction (e.g., student believed that the use of class time for silent reading was not an effective use of time).
b Participant did not disagree with the instruction but did not want to comply with instruction (for reasons such as student preferred to read another passage or do other activity).
c Participant agreed with teacher’s instruction and carried out activity.

EXAMPLE 4.1 (from data collected by Mackey & Gum, 1997)

Researcher: Can you tell me why, at that time, what you were thinking while you wrote in your notebook? What was going through your head? On the tape we can see that you looked at the teacher while she said, “Please read the passage on page 36. Take ten to read the passage.” On the tape we can see you wrote in your book for ten minutes or so. What were you thinking?
Student: I already read it last time.
Researcher: Oh. What were you thinking?
Student: You know, I thought I read it. I don’t need to read it again.
Researcher: So that’s what you were thinking.
Student: I thought as I already read it so I’ll write out my vocabulary for X’s (teacher’s name removed) class after this. Then I am making a good use of my time.


After reading the transcript, the independent rater coded the student’s response as category (b). From the transcript it seemed to this rater as though the learner simply preferred to work on her vocabulary, an alternative task. However, if the rater had been present during the interview or had been able to watch a videotape of the interview rather than read a transcript, the rater would have heard back-channels and intonational cues, for example, a strong emphasis on the word already, and drawn-out sighs of displeasure that accompanied this student’s verbalization. The student also made a nonverbal hand gesture that could be interpreted as related to the wasted time or effort she attributed to rereading passages. She also spelled out an L for loser with her thumb and index finger; this gesture was not indicated on the transcript of the recall but was recognized by the nonindependent rater-researcher because she had heard other students discussing the gesture after a recent movie they had all seen. Together, these behaviors led the rater-researcher, who watched the recall stimulus and took part in the interview as well as transcribed and coded it, to rate the interaction as category (a).

This lack of agreement can be seen as an example of inadequate transcription because not indicating nonverbal and backchanneling behavior results in an incomplete transcript. It is also problematic in that the coding system did not allow overlap between categories. Most important, however, the lack of agreement between raters is related to the complex nature of the recall procedure. In this case, the interrater reliability was affected because one rater brought prior knowledge of the recall stimulus to the rating process and the other rater relied only on the transcription of the recall comments. Thus, the incomplete nature of the transcription and the lack of representation of the stimulus combined to produce different ratings.
In some cases, researchers want both raters to be in possession of all potentially influencing information. In other cases, only what was present in the recall comments should be rated. Obviously, the nature of the research question will determine the content of the data to be rated. Researchers who were participants in the stimulus, the original activity to be recalled, or who took part in the recall interview with the participant often have extra insight into the learners’ verbalizations. In Example 4.1, the researcher’s perceptions about the data were the accurate ones, as seen in the nonverbal cues and the data recalled. However, the main motivation for assessing interrater reliability is to ensure that researchers do not read too much into the interaction. This can happen because researchers may become over-influenced by their research questions and hypotheses and therefore by their expectations about the data. Tests of interrater reliability allow us to have some confidence that what we as researchers see in the data can also be seen by independent raters, or in the case of intrarater reliability, that we as raters are rating each episode consistently. Thus, after carrying out a stimulated recall, we generally recommend finding independent raters who were not participants in the original event to be recalled and who were also not participants in the subsequent recall, or, if this is not possible, finding raters who participated in the same events, either the original or the recall.


Independent third-party raters will need to be trained in the categories for rating and given the information they need about the stimulated recall procedure and events being recalled. In some designs, raters will need to see the stimulus as they rate the recalls; in other designs this may influence their perceptions about the recall in an undesirable and nonobjective way. In this latter instance, raters will need to rate the recall in isolation. Thus, the data for the interrater reliability test need to be chosen and ordered carefully. All this requires rather more work than is usually necessary in many L2 studies. Generally for interrater reliability tests, L2 researchers may simply pull out a random subset of the data, write up a sheet on coding categories, and have the raters turn in their ratings. A simple percentage agreement check on the basis of the researchers’ coding can be calculated. To illustrate what we mean by a careful interrater reliability check, we now turn to an extended example.

EXAMPLE 4.2 (from Mackey et al., 2000)

NNS: [stɑz], [stɑz]
NS: Stars? Oh, stairs?
NNS: Yeah
Recall: I know I can’t pronounce this word. Oh, no, I need to tell her this word. Inside me, I was laughing, oh my god, just my luck.

EXAMPLE 4.3 (from Mackey et al., 2000)

NNS: Have a wings.
NS: The bird has wings?
NNS: Yeah
Recall: Maybe she is not sure which bird I saw. She wonder if my bird has wings, she ask me “the bird in your picture has wings?” I like her question because I think sure, all birds have a wings.

In the study of oral interaction we carried out with Kim McDonough (reported in Mackey et al., 2000), learners carried out tasks and interacted with NSs. Non-targetlike utterances (learner errors) that triggered feedback episodes were categorized as lexical, phonological, morphosyntactic, or semantic errors. NS feedback provided was also categorized (and in almost all cases was based on the error type such that a morphosyntactic error generated morphosyntactic feedback; occasionally, multierror sequences occurred and generated either single or multierror feedback, again almost always agreeing with one of the error types). Next, learners’ perceptions about the feedback they received were categorized in the same way. The aim of the study was to see if learners were accurate in their perceptions. For example, was morphosyntactic feedback perceived as such by learners?
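The comparison described above, between the type of feedback given and the type the learner perceived, amounts to a cross-tabulation of two sets of codes. The following is a minimal sketch in Python, not the procedure used in the study: the episode data are invented for illustration, and the function names `crosstab` and `match_rate` are hypothetical.

```python
from collections import Counter

# Invented example data (an assumption, not from the study).
# Each pair is (feedback type, type the learner perceived in the recall),
# using the category codes L, P, MS, and S.
episodes = [
    ("P", "P"),
    ("MS", "S"),
    ("MS", "MS"),
    ("L", "L"),
    ("MS", "S"),
    ("P", "P"),
]


def crosstab(pairs):
    """Count how often each feedback type co-occurs with each perceived type."""
    return Counter(pairs)


def match_rate(pairs, feedback_type):
    """Proportion of episodes of one feedback type perceived as that same type.

    Returns None when there are no episodes of the requested type.
    """
    perceived = [p for fb, p in pairs if fb == feedback_type]
    if not perceived:
        return None
    return sum(p == feedback_type for p in perceived) / len(perceived)


table = crosstab(episodes)
for (fb, perceived), n in sorted(table.items()):
    print(f"feedback={fb:<2} perceived={perceived:<2} n={n}")
print("MS feedback perceived as MS:", match_rate(episodes, "MS"))
```

Keeping the two codes as separate fields mirrors the requirement, discussed below, that the feedback classification and the perception classification be produced independently before they are compared.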


In relation to the interrater reliability for this study, it may be obvious that the classification of the initial error and the feedback provided should not be allowed to influence an independent rater’s classification of the learners’ perceptions. Thus, in Example 4.2, the learner makes a phonological error, pronouncing the word [stɛəz] ‘stairs’ as [stɑz]. The NS provides implicit negative feedback, giving a recast of the correct pronunciation of the word. The error was phonological, and the feedback provided to the learner was also phonological. The stimulated recall comment showed that the learner also perceived the feedback as phonological; the learner even mentions pronunciation. An alternative can be seen in Example 4.3, where the error was morphosyntactic, the feedback was morphosyntactic, yet the recall comment was focused on meaning. In this case, how would independent raters classify the recall? If they saw the interaction preceding it, they might classify it as morphosyntactic, as it was intended by the NS. However, the rater was supposed to be classifying the learner’s perception of the feedback as reported in the recall, and the learner, although in possession of the knowledge of what she had originally said, may or may not have related this to the researcher’s comment. So, her recall comment should have been classified as semantic, not morphosyntactic. Because they are learners of English, participants may differ from researchers in what they understand about what a NS has said. Thus, even though the rater may understand that the feedback provided by the NS was intended to be morphosyntactic, it is crucially important that this understanding not influence the rating of the learner’s perception.

Mackey et al. solved this problem by ordering the ratings. Two independent non-researcher raters both rated half the data. However, they rated the interactions and the recalls separately so that, when rating the recalls, they did not see the interactional feedback and were not influenced by the NS’s feedback type. The learners’ recalls and the interactions that preceded them were rated independently. A reliability check was carried out between the two independent raters and between their ratings and the ratings of the researchers who carried out the study. In concluding this section on reliability for coding and analyzing stimulated recall data, we make the recommendations shown in the box.

RECOMMENDATIONS: INTERRATER RELIABILITY

• Use objective non-researcher raters for checks wherever possible.
• Carefully construct analytical categories, checking subsets of the data with other researchers before coding all data to avoid carrying out large amounts of high inference coding that cannot be replicated.
• Decide on nature, content, and order and presentation of data for intercoder checks, paying attention to the potential effects of the stimulus along with recalled comments that need to be rated.
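The simple percentage agreement check mentioned earlier in this section can be sketched in a few lines of Python. The sketch below also computes Cohen’s kappa, which is not discussed in this chapter; it is included only as a commonly used chance-corrected complement to raw agreement. The two lists of ratings are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Proportion of items to which two raters assigned the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for the agreement expected by chance.

    Assumption: kappa is an addition for illustration; the chapter itself
    mentions only percentage agreement.
    """
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: sum over categories of the product of each rater's
    # marginal proportions for that category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented ratings of eight recall comments (L, P, MS, S, Ø as categories).
rater_1 = ["P", "S", "MS", "L", "S", "Ø", "MS", "P"]
rater_2 = ["P", "S", "S", "L", "S", "Ø", "MS", "L"]

print("Percentage agreement:", percent_agreement(rater_1, rater_2))
print("Cohen's kappa:", round(cohens_kappa(rater_1, rater_2), 3))
```

In this invented example the raters agree on six of eight comments, so raw agreement is 0.75, while kappa is lower because some of that agreement would be expected by chance.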


Having addressed the issue of reliability of the coding of stimulated recall data, we now turn to the issue of rater training. First, we present the protocol used, which includes familiarizing raters with the constructs under investigation. This is followed by a presentation of the actual coding sheet.

An Extended Example of Rater Training

In Figure 4.1a, we present the training protocol that was used for the Mackey et al. (2000) study to provide some idea of the extent of training that may be necessary. The training sessions took place over a three-day period, with extensive explanations of concepts and categories as well as numerous opportunities to view examples. Figure 4.1b is an example of the coding sheet used to code the stimulated recall data in the study. Raters were given preselected parts of the stimulated recall interaction and had to code the stimulated recall. Another part of the training involved coding the feedback provided to the learner. Figure 4.1c shows an example of the coding sheet used for the interaction part of the study. When coding the interaction, raters were not given preselected episodes to comment on; rather, they had to select the episodes, transcribe that portion of the tape, and then code the episodes. Following coding of the NNS utterance, as in Figure 4.1c, we need to understand and code the content of the comments made by the NNS about the feedback. The coding sheet for this part of the training is given in Figure 4.1d.

Because stimulated recall coding is complex, involving many participants doing many different parts, it is important to have a schedule so that everyone knows what she or he is doing. Figure 4.1e shows the rater and task schedule that was developed as an organizational tool to ensure that all tapes were evaluated. Another important schedule is the one based on each individual learner. Figure 4.1f is a training schedule to ensure that all interactions and stimulated recall were evaluated and that raters had an equal number of evaluations to do. This helps everyone know what to do for each individual learner. Finally, Figure 4.1g is the schedule used for interrater reliability.

Data Layout and Coding

The specification of what to code is, of course, a function of the specific research questions addressed by the study. Nevertheless, there are some general issues related to the coding of stimulated recall data, and it may be helpful for researchers to be aware of these prior to developing a coding system. Coding sheets for stimulated recall differ from many other types of coding sheets in that one must keep track of two different events. In Figures 4.2 to 4.5 we present coding sheets from Hawkins (1985).

FIGURE 4.1a Sample training protocol

Day 1: Interaction Training

(20 minutes) Review
  What is a non-targetlike utterance?
    Inside the bookshelf, there are books (choice of preposition)
    The man walk a dog (S-V agreement)
    There are two cup on the table (plural)
    I can see a /crowd/ in the sky (pronunciation)
    The cat is fixing at me (lexical choice)
  What is interaction? What is negotiation?
  What is implicit negative feedback (INF)?
    NNS: the third one is a /dawk/
    NS: a what?
    NNS: a dog
  What is a recast?
    NNS: he also standing
    NS: he’s also standing?
    NNS: yeah
  What we aren’t looking at (but does occur in the data):
    1) explicit explanation of lexical items
       NS: What color are his fins?
       NNS: Huh?
       NS: Fins are like, you know people have arms but fish have fins
    2) explicit feedback
       NNS: The cat is playing with wool
       NS: Not wool, you mean yarn
    3) provision of form
       NNS: It is in front of the fire … fire … fire…
       NS: place
       NNS: fireplace
(30 minutes) Watch Training Video 1. Trainers use remote, stop and pause at relevant examples of what is INF and what is not; stop and talk throughout video
(30 minutes) Watch Training Video 2. Watch 3–5 minutes, take notes individually, no need to write down complete interaction. Pause and discuss. Continue with another 3–5 minutes.
(45 minutes) Rating Session 1. Bev: 1 interaction; Lea: 1 interaction
(2 hours) Trainers review ratings, discuss, further training where necessary

Day 2

(40 minutes) Stimulated recall training
  What is stimulated recall?
  Watch video of AB doing recall with K
  What was the prompt?
  Classify comments based on content
    1. Lexical: specific comments about a known or unknown word, including provision of a synonym and comments about a synonym, for example: I am not sure about feminine; I have never heard this word before
    2. Semantic: general comments about communicating meaning, creating understanding, being unable to express an intended meaning, providing more detail or elaboration, for example: she still doesn’t understand; she wants to know more detail; we move to focus on bookshelf
    3. Phonological: specific comments about pronunciation, for example: I think my pronunciation sounds strange; I have a problem with the /l/ sound
    4. Morphosyntax: comments about sentence formation and structure or word order, comments on specific aspect such as S-V agreement, tense, for example: because the grammar is not correct; I said people is fishing but I have to say people are fishing
    5. Ø: The participant has nothing to say, for example: Um, no nothing really; It was ok; No, I don’t remember anything
    6. ?: The participant made some comments about specific content, but the rater cannot classify those comments into a particular category, for example: my answer is not suitable; I was thinking that the room looks like my dorm room
  Potential problems, resolution mechanism
  Practice Rating: Learner 1, Learner 2. Rate one stimulated recall each
(2 hours 20 minutes) Rating Session 2. Bev: 2 recalls, 1 interaction; Lea: 1 recall, 2 interactions

Day 3

(3 hours) Rating Session 3. Bev: 2 recalls, 3 interactions; Lea: 3 recalls, 2 interactions

FIGURE 4.1b Sample coding sheet for stimulated recall comments

You will be reading some comments that the learner made while the videotape was paused during the stimulated recall procedure. The learners were explaining what they were thinking while the interaction was going on. After you read each comment by the learner, classify the content of that comment as one of the following (see explanations on the training sheet):

Lexical: L
Phonological: P
Morphosyntax: MS
Semantic: S
No content: Ø
Unable to classify: ?

Transcription conventions: underline = emphasis was placed on word by participant.

# | Stimulated Recall Comment | Type
1 | Sometimes I couldn’t catch opponent pronounce but sometimes I can guess opponent question but at this time I couldn’t understand |
2 | It’s difficult for me to explain about weather. I know it’s rain, it’s snow, or it’s fine, sometimes for now, it’s not snow, it’s not rain, how can I say? Yeah, I had better study about weather’s vocabulary |
3 | I just said right side but on the right hand side, is it correct? |
4 | No, nothing, it’s ok |
5 | Oh, it’s ok |
6 | Sometimes confusing left and right |
7 | It’s ok |
8 | At that time I couldn’t explain about the road. How can I say this? |
9 | The boy bring one stick, uh this picture also the boy bring a stick, but it’s oh different. She said he doesn’t bring a stick maybe. At that time she explained the boy doesn’t have a stick |
10 | Uh, I have better explain more detail |
11 | Yeah, at that time it’s so difficult to explain about between something to something |
12 | At that time I couldn’t catch total |
13 | I wrong to explain. I just I say between the trees but I have to say between two trees |
14 | My pronounce is not clearly, especially /r/ and /l/ |
15 | We discuss about more detail, we focus on boots |

FIGURE 4.1c Sample coding sheet for interaction episodes

When you watch the video, pause every time the NS gives implicit negative feedback to the NNS. Write down what the NNS said and what the NS said in response. Then classify the type of error in the NNS utterance as one of the following:

Semantic (problem with meaning): S
Lexical (problem with lexical choice): L
Phonological (problem with pronunciation): P
Morphosyntactic (problem with grammar): MS

# | NNS Utterance | NS Response | Type

FIGURE 4.1d Sample coding sheet for stimulated recall comments

# | Stimulated Recall Comment | Type
1 | I am thinking where is in front of. In front of means this way and I was thinking this place was in front of. I am thinking I could say it’s in front of |
2 | I was confused about below because I don’t usually use above, below, I use top |
3 | I can not easily, right left, if you right side, I was thinking which one |
4 | I was thinking the name of thing, I don’t know the word |
5 | I was very happy because I didn’t know fireplace and I was explaining it look like and she know it |
6 | In Japanese I call cushion, but I didn’t know English |
7 | I didn’t know couch and chair different and when I when I said two couches she didn’t understand what I said and I thought I don’t know the meaning and I asked how couch mean and she explained. I understood I felt comfortable because there I didn’t understand and I couldn’t understand |
8 | I was thinking why she didn’t understand |
9 | My my English is not understandable to native speaker, same as always |
10 | I didn’t know what cage is I was going to explain it look like jail but I didn’t, she understood |
11 | Sounds like John, John is my content teacher, the man in the picture looks like John |
12 | I said wrong vocabulary |
13 | I was thinking how I could explain because I don’t know how to form look with girl, but I didn’t explain and then she said |
14 | She asked I have or not, I could explain none of them because it was on the floor another one is not on the floor so I couldn’t explain where it is |
15 | I said couch but my is not couch so I thought I made a mistake |
16 | When I’m talking to native speaker and the person doesn’t understand my English, I always say why this guy doesn’t understand me |
17 | I said piece of cheese but I have to say a piece of cheese, I thought next time I was going to say a piece of cheese |
18 | Because new vocabulary |

FIGURE 4.1e Rater and task schedule

Key: first letter represents L1 (Thai, Japanese, Mandarin, Korean, French); second letter represents gender; SR = stimulated recall coding; Int = interaction coding.

Day | Rater | Learner | Task
Day 1 | Bev | 1. JF1 | Int
      | Lea | 1. MM | Int
Day 2 | Bev | 1. MM | SR
      |     | 2. FF | SR
      |     | 3. JM2 | Int
      |     | 4. FM1 | SR
      | Lea | 1. JF1 | SR
      |     | 2. FF | SR
      |     | 3. JM2 | SR
      |     | 4. FM1 | SR
      |     | 5. TM | SR
      |     | 6. JM1 | SR
      |     | 7. KM | SR
      |     | 8. FM2 | SR
      |     | 9. JF2 | SR
      |     | 10. MM | SR
Day 3 | Bev | 1. TM | Int
      |     | 2. JM1 | SR
      |     | 3. KM | Int
      |     | 4. FM2 | SR
      |     | 5. JF2 | Int
      | Lea | 1. FF | Int
      |     | 2. FM1 | Int
      |     | 3. KM | Int
      |     | 4. FM2 | Int
      |     | 5. JF2 | Int
      |     | 6. MM | Int
      |     | 7. JM2 | Int
      |     | 8. TM | Int
      |     | 9. JM1 | Int

FIGURE 4.1f Training schedule

Learner | Interaction rater | SR rater
MM | Lea | Bev, Lea
JF1 | Bev, Lea | Lea
FF | Lea | Bev, Lea
JM2 | Bev, Lea | Lea
FM1 | Lea | Bev, Lea
TM | Bev, Lea | Lea
JM1 | Lea | Bev, Lea
KM | Bev, Lea | Lea
FM2 | Lea | Bev, Lea
JF2 | Bev, Lea | Lea

FIGURE 4.1g Schedule for interrater reliability

Stimulated Recall Coding | Interaction Coding
MM | TM
FF | KM
FM1 | JF1
JM1 | JF2
FM2 | JM2


These coding sheets display the range of relevant coding. As discussed in Chapter 2, one purpose of Hawkins’ early study was to compare the retrospective commentaries by two participants in a conversation, one native speaker and one learner. Hence, Figures 4.2 and 4.3 are to be read together. The first represents the conversation from the NS point of view and the second the same conversation from the learner’s point of view. In the leftmost column is the portion of the data that the stimulated recall focused on. In the second column is the mechanism used (e.g., repetition, comprehension check). The third column provides the reader with information about the topic category, that is, whether it is a new mention (e.g., topic nomination), topic identification (e.g., mention of object function), and so forth. The fourth column identifies the conclusion (AR = appropriate response; C = comprehension) concerning the research project. The fifth column shows the benefit that a particular utterance might have for the possibility of NNS comprehension, and the sixth column identifies the strategy that the NNS used to indicate comprehension or lack of comprehension and the extent to which this strategy affected the NS’s realization of the NNS’s comprehension or lack thereof. Finally, in the last column are the retrospective comments, carefully numbered and lettered to match the original transcript. Figures 4.4 and 4.5 are from two different NS–NNS pairs and further exemplify the richness of the data that Hawkins (1985) described. Figures 4.6 and 4.7 are two examples of a second coding chart. The first example shows an earlier version of the second example. These sheets were developed for the Mackey et al. (2000) study discussed earlier. We present these two coding sheets as a way of elucidating the problems that an inappropriate coding sheet can bring. The researchers wanted to determine the extent to which learners recognized the purpose of implicit negative feedback. 
Figure 4.6 is the preliminary version, and Figure 4.7 is the actual version used in the study. In order to address the research question, the researchers had to know what the original error type was (see the previous section on interrater reliability). In the first sheet, the researchers failed to anticipate the need for coding the type of error and therefore did not leave a column for that information. This was recognized during the coding process, and the sheet was revised. The second column in the revised coding sheet, for reasons of space, eliminates the four choices of error type (i.e., lexical, phonological, morphosyntactic, semantic), allowing the researcher to write the full example. The other major change is the addition of the stimulated recall type column (and the concomitant reduction of the four columns) rather than trying to incorporate so much information into a single set of columns, as in the original. This was an important change because at the coding stage, the researchers needed to obtain interrater reliability on precisely these categories. Had they been written in on the coding sheet, it would have been more difficult to obtain interrater reliability using these sheets. For the Mackey et al. (2000) study, the final column was added to address a specific research question.

FIGURE 4.2 Transcript 1 from Hawkins (1985)

FIGURE 4.3 Transcript 2 from Hawkins (1985)

FIGURE 4.4 Transcript 3 from Hawkins (1985)

FIGURE 4.5 Transcript 4 from Hawkins (1985)

FIGURE 4.6 Preliminary (flawed) version of coding sheet

Column headings: Targeted Error (Lex. | Phon. | M/S | Sem) | Feedback | Learner Response | Stimulated Recall (Lex | Phon | M/S | Sem)

Problems:
• Four choices for targeted error is not efficient. Need one column.
• Need space to classify each stimulated recall error in addition to space to transcribe each one.
• No column for noting initiator of episode.
• No column for noting learner uptake at the time of the interaction.
• No space for participant ID or identification of session.

FIGURE 4.7 Final version of coding sheet

Participant ID/Name __________ Date of data collection __________

Column headings: Targeted error | Error type | Feedback | Learner response | Uptake | Stimulated recall | SR type | Initiated by participant or researcher

Remaining problems:
• Is the error column (and category) for the transcription of the error or the classification of the error? Need two columns.
• How is uptake classified? Again, is the response column for transcription of response or classification of uptake or both?
• Unclear if this coding sheet can account for more than one turn or one sequence involving multiple turns.


Other columns that reflect specific research hypotheses can be included, as was the case with the Hawkins study. In concluding this section, we present some recommendations for coding sheets.

RECOMMENDATIONS: CODING

• Ensure that coding sheet allows categorization of types of data required to address research questions.
• Design sheet so that there is easy visual access to simultaneous events.
• Order sheet so that chronology of non-simultaneous events is easy to follow.
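For researchers who keep coding sheets electronically, the recommendations above can be sketched as a simple record structure. The following Python sketch is an illustration under stated assumptions, not the format used in any of the studies discussed: the class name `CodedEpisode`, the field names, and the participant ID are hypothetical, loosely modeled on the columns of the final coding sheet. It keeps transcription and classification in separate fields and records episode order so that chronology can be recovered.

```python
from dataclasses import dataclass
from typing import Optional

# Category codes as used on the sample coding sheets in this chapter.
ERROR_TYPES = {"L", "P", "MS", "S"}
RECALL_TYPES = ERROR_TYPES | {"Ø", "?"}


@dataclass
class CodedEpisode:
    """One row of a hypothetical electronic coding sheet."""

    participant_id: str
    episode_number: int      # position of the episode within the session
    targeted_error: str      # transcription of the learner's utterance
    error_type: str          # classification of that utterance
    feedback: str            # transcription of the NS response
    learner_response: str    # transcription of the learner's next turn
    recall_comment: str      # transcription of the recall comment
    recall_type: Optional[str] = None  # empty until a rater codes the recall

    def __post_init__(self):
        if self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {self.error_type!r}")
        if self.recall_type is not None and self.recall_type not in RECALL_TYPES:
            raise ValueError(f"unknown recall type: {self.recall_type!r}")


def in_chronological_order(episodes):
    """Sort rows by participant and then by episode number."""
    return sorted(episodes, key=lambda e: (e.participant_id, e.episode_number))


# Example 4.3 from this chapter entered as one row; "P01" is an invented ID.
row = CodedEpisode(
    participant_id="P01",
    episode_number=1,
    targeted_error="Have a wings.",
    error_type="MS",
    feedback="The bird has wings?",
    learner_response="Yeah",
    recall_comment="Maybe she is not sure which bird I saw.",
    recall_type="S",
)
print(row.error_type, row.recall_type)
```

Leaving `recall_type` empty by default reflects the point made in the interrater reliability section: the recall is best classified separately from the error and feedback classification.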

Analysis of Stimulated Recall Data

Following the discussion concerning the presentation and coding of data and potential pitfalls in carrying out the procedure, we now turn to the analysis of stimulated recall data. Once again, it must be recognized that the analysis is, like the coding categories, highly dependent on the research questions and hypotheses of the specific L2 study. Therefore in this section we present some general issues, using examples from previous studies. We finish with some recommendations.

As pointed out in Chi’s (1997) detailed article on the analysis of verbal data, quantifying what is usually perceived as subjective coding of the contents of verbal utterances can be challenging. Procedures including tabulating, counting, quantifying, and drawing inferences about relations among different kinds of utterances are often carried out in an attempt to reduce the subjectivity of qualitative coding. Quantification of qualitative data, such as a person’s introspective comments about L2 writing revisions, is very different from quantification of words, clauses, phonemes, grammatical and ungrammatical sentences, t-units, grammatical structures, or many other kinds of L2 data. Analysis of stimulated recall interviews (or other kinds of introspective recall data such as diaries) usually involves quantifying categories where there is no direct one-to-one correspondence between the category and the verbalized data and some inferencing or judgment is needed for classification. In some designs, the highly subjective introspective data need to be linked with objective, sometimes performance-based data. For example, in diary studies that explore issues of noticing (Schmidt & Frota, 1986), introspective comments about language recorded in a diary are linked to objective performance in terms of production of the items recorded in the diary. In such coding schemes, challenges to reliability can be especially high.
Many stimulated recalls in L2 studies are carried out in order to explore issues about learners’ cognitive processing or their use of communication strategies, or sociolinguistic issues related to their L2 use. In analyzing the recall data, the purpose is to identify and classify the verbalizations that shed light on the phenomena being investigated, yet to capture them in a way that is as low in subjectivity as possible. When analyzing stimulated recall data, all the usual caveats in terms of carrying out analyses that are solely quantitative or solely qualitative apply. Many studies benefit by using one approach to supplement the other where possible. Although qualitative analyses can provide a much richer understanding of processes and situational knowledge, controlling for variables that affect the data is impossible, and the analysis is often subjective and nonreplicable. Quantitative analyses, on the other hand, are usually objective and replicable but generally only applicable to the specific hypotheses addressed in the study, and problems of generalizability may apply. Thus, when analyzing stimulated recall data, combining analytical techniques is desirable wherever possible. The shortcomings of one type of analysis may be addressed through the strengths of another.

There are various ways to combine analytical methods for stimulated recall data. For example, the qualitative data can be used to shed light on the findings of any quantitative analysis. Examples can be found in a study of problem solving by Chi, Feltovich, and Glaser (1981). They asked experts and novices to categorize problems according to their own categories. The categorization patterns were quantitatively analyzed using factor analyses. Participants’ explanations for why they chose particular categories were also examined, and these data were used to interpret what the different categories meant to experts and novices. However, as noted by Chi (1997), no main claim was made on the basis of the qualitative data; the emphasis was on the quantitative data.
Another study that uses qualitative data to further interpret quantitative analysis is that of Poulisse, Bongaerts, and Kellerman (1987), who studied communication strategies using retrospective reports and included both a quantitative section that compared categories and a qualitative section that discussed the influence of researcher bias. Finally, Gass (1994; described in greater detail in Chapter 2) used a stimulated recall design to aid in the interpretation of a primarily quantitative study. She gathered quantifiable data from learner judgments of sentence acceptability. Judgments were gathered on a seven-point scale and were collected two times within a one-week time period as a way of gathering information on reliability. When judgments seriously differed (e.g., from one end of the scale to another) across the time frames, Gass used recall data in an attempt to explain these differences. Another way to integrate quantitative and qualitative methods in stimulated recall data is to quantitatively analyze categories that are essentially qualitative in nature but for which careful operationalizations have been made and for which high interrater reliability has been obtained. An example can be found in the Mackey et al. (2000) study, which is discussed at length in the section on interrater reliability. Perhaps the most common integration of methods is to use two analytical techniques side by side. For example, Bosher (1998) studied the composing

processes of three ESL writers and carried out quantitative and qualitative analyses of aspects and strategies in their recalls, as well as “themes which emerged from the data” (p. 214). Whichever specific analytical technique or qualitative–quantitative combination is selected, there are certain steps that must be taken in carrying out analyses. We have revised a subset of the techniques put forward by Chi (1997) so that they specifically relate to stimulated recall protocols. The steps are discussed in the following sections. It should be noted that the obvious first step is to transcribe and lay out the data, as was described earlier.

Sampling the Recall Data The first step involves deciding how much or how many of the recalls need to be analyzed. For example, it is not uncommon in the L2 research field to analyze a subset of an interview, such as the middle 10 minutes of a 30-minute interview. Transcribing, coding, and analyzing all 30 minutes is time-consuming. It may be unnecessary if ten minutes is a representative sample of the learner’s language. Why choose the middle ten minutes to sample? The first ten minutes may be affected by the participant settling down, getting comfortable, and the researcher chit-chatting to warm the learner up and minimize any initial nerves. The final ten minutes may be their winding-down time: The learner may be tired; the interview may finish early. The central ten minutes will often contain the richest and arguably the most representative data of a particular L2 learner’s developmental level. Of course, some studies sample the first ten minutes, or the first and last five minutes. A study of L2 phonology, for example, may focus on the change in vowel quality over the course of an interview and sample all instances of one vowel sound throughout the entire time. Other types of sampling involve making decisions about how much data from each participant to study. If, for example, five recalls exist at different times for one participant, should they all be used, or will two be sufficient? Which two? If an entire class of 30 students carries out a recall, should they all be analyzed, or is half the class, selected at random, sufficient? Do all the data contained in the recalls need to be analyzed, or just specific responses that relate only to the research question? If sampling is carried out, how could it affect the analysis as a whole? Whether and how sampling of the data is carried out obviously needs to be decided based on the resources available for the study and the specific research questions addressed by the study.
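As a rough illustration of two of the sampling decisions just described, the following Python sketch selects the middle ten minutes of a time-stamped transcript and draws a random half of a class. The data layout, function names, and seed are assumptions made for this example only; they are not part of any published procedure.

```python
import random

def middle_window(utterances, start_s, end_s):
    """Keep utterances whose start time (in seconds) falls in [start_s, end_s)."""
    return [u for u in utterances if start_s <= u["start"] < end_s]

def sample_participants(ids, fraction, seed):
    """Randomly select a fraction of participants; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    k = round(len(ids) * fraction)
    return sorted(rng.sample(ids, k))

# Hypothetical 30-minute interview with one utterance logged every 60 seconds.
interview = [{"start": t, "text": f"utterance at {t}s"} for t in range(0, 1800, 60)]

# Middle ten minutes: from 600 s up to (but not including) 1200 s.
middle = middle_window(interview, 600, 1200)

# Hypothetical class of 30 students, half selected at random.
class_ids = [f"S{n:02d}" for n in range(1, 31)]
half_class = sample_participants(class_ids, 0.5, seed=1)
```

Recording the seed alongside the list of sampled participants is one way to make the selection reportable and repeatable; whether random selection is appropriate at all still depends on the research question.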

Preparing the Data for Coding Preparation of the data for coding is obviously research-question dependent. If a recall of oral interaction data is to be carried out, the specific interactional features

that the coding will isolate need to be selected. This may involve categorizing the data into episodes in the first instance. It may involve separating the data into segments based on turn boundaries or idea units. Alternatively, if a feature such as strategy use is the focus of a study, the data may need to be segmented into units in which each mention of a strategy type is categorized. Obviously, this advice needs to be considered in the context of the research question. Often, the simple segmentation of the data represents the same step as coding. In other cases, the data need to be prepared for coding through segmentation. It can be helpful to think of the notions of segmentation and coding as separate in that segmentation is a much broader category than many coding schemes. It can be thought of as a preparatory stage, in the sense that it avoids having to sift through unnecessary verbalizations during what is probably already painstaking coding. It can also give a useful idea about subjectivity in the dataset. If two researchers find it difficult to agree about broad segmentations in the dataset, this may provoke some early questions and assessment in relation to the categorizations. Generally, although not always, if broad categorizations are difficult to define, more finely grained coding schemes based on the broader units will be even more problematic.
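The segmentation step can be sketched in Python as follows, assuming a transcript in which each turn begins with a speaker label such as "R:" (researcher) or "L:" (learner). The label format, the function names, and the simple overlap measure used to compare two researchers' segment boundaries are all assumptions made for illustration; other units (episodes, idea units) would need different rules.

```python
import re

# Assumed transcript convention: a turn starts with an upper-case speaker label and a colon.
TURN = re.compile(r"^(?P<speaker>[A-Z]+):\s*(?P<text>.*)$")

def segment_turns(transcript):
    """Split a transcript into segments at turn boundaries."""
    segments = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            segments.append({"speaker": match.group("speaker"),
                             "text": match.group("text")})
        elif segments:
            # A line without a speaker label continues the previous turn.
            segments[-1]["text"] += " " + line
    return segments

def boundary_agreement(marks_a, marks_b):
    """Share of boundaries marked by either researcher that both researchers marked."""
    a, b = set(marks_a), set(marks_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical recall excerpt.
transcript = """R: What were you thinking here?
L: I was thinking the word was wrong.
I wasn't sure, though.
R: And at this point?
L: I noticed she said it differently."""

segments = segment_turns(transcript)

# Hypothetical boundary positions (line numbers) marked by two researchers.
agreement = boundary_agreement([3, 7, 12], [3, 7, 15])
```

A low overlap value at this broad level would be one early signal, in the sense discussed above, that finer-grained coding built on these units may prove difficult.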

Developing a Coding Scheme Coding stimulated recall data has been discussed previously. Suffice it to say that coding schemes will be dependent upon the questions and hypotheses to be addressed in the study. In coding stimulated recall data, what often happens is that a top-down coding scheme is developed on the basis of a central research question. This can then be fine-tuned after a first pass through the data. In coding stimulated recall data, one often needs to be flexible, as the data can be unpredictable. Thus, coding schemes need to be prepared with the possibility of change and revision in mind. For example, although coding may be first visualized as counting morphosyntactic structures, an analysis of semantic units or discourse units may shed more light on the relations among categories or perceptions. Studies have shown high levels of interrater reliability in the coding of introspective verbal data (cf. Bettman & Park, 1979). For further information, see the discussion of complementary coding schemes in Elstein, Shulman, and Sprafka (1978) and our review of concerns about reliability in Chapter 5.
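One common way to quantify interrater reliability for a categorical coding scheme is Cohen's kappa, which corrects simple percentage agreement for agreement expected by chance. The sketch below is a minimal Python illustration; the choice of kappa, the category labels, and the two raters' codes are assumptions made for this example and are not drawn from any of the studies cited.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same segments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of segments.")
    n = len(rater_a)
    # Observed agreement: proportion of segments given the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over codes.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two raters to ten recall comments.
rater_a = ["lex", "lex", "morph", "phon", "lex", "morph", "sem", "lex", "phon", "morph"]
rater_b = ["lex", "lex", "morph", "phon", "morph", "morph", "sem", "lex", "lex", "morph"]

kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on eight of ten comments, but kappa is lower than 0.8 because some of that agreement would be expected by chance. If the scheme is revised after a first pass through the data, reliability would need to be recalculated on the revised codes.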

Analyzing and Describing the Data Having coded the data and considered issues such as interrater reliability and the method of analysis, most researchers consider additional issues such as statistical testing where appropriate, interpretations of the data, and ways to provide descriptions of the data. Whether the data are depicted graphically, pictorially, in a tabulated form, or through example sets will again depend on the

research question and on the patterns that need to be illustrated. Although these steps sound straightforward, finding and demonstrating patterns in stimulated recall interviews can be challenging. If the data are coded into categories that can be quantified and the results tested for significance and perhaps graphically illustrated through charts, pattern identification and illustration are relatively straightforward. However, in many purely qualitative analyses, models, perhaps illustrating links, may need to be used. It is often said that researchers who work in a qualitative paradigm need to be able to tell a story convincingly and distinctively. This certainly applies when finding and illustrating patterns in stimulated recall data. When depicting data, some studies supplement or triangulate stimulated recall data. For example, they may supplement the recall data with performance-based data. One example of a study that uses more than one type of recall data was carried out by Russo, Johnson, and Stephens (1989). Admittedly, their study is a comparison and test of reactivity and veridicality of different oral protocols, yet they triangulate data using different tasks and both on-line verbal protocols (i.e., think-alouds) and stimulated recalls supported by different levels of stimuli.2
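Where recall comments have been coded into quantifiable categories, a frequency table and a test statistic are often the first descriptive step. The Python sketch below tabulates hypothetical coded comments by group and computes a chi-square statistic of independence by hand. The group names, code labels, and counts are invented for illustration, and the chi-square test is only one of several tests that might suit such data.

```python
from collections import Counter

def crosstab(records, row_key, col_key):
    """Count records by two categorical fields and return labels plus a table of counts."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    table = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, table

def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical coded recall comments: which group produced them and how they were coded.
records = (
    [{"group": "planned", "code": "form"}] * 30
    + [{"group": "planned", "code": "meaning"}] * 10
    + [{"group": "unplanned", "code": "form"}] * 20
    + [{"group": "unplanned", "code": "meaning"}] * 20
)

rows, cols, table = crosstab(records, "group", "code")
stat, dof = chi_square(table)
# No p-value is computed here; with 1 degree of freedom the statistic can be
# compared against the conventional critical value of 3.84 (alpha = .05).
```

The same table of counts can feed a chart or a tabulated summary, depending on which patterns need to be illustrated.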

Conclusion This chapter has provided detail on rater training, data coding, and data analysis. In the following chapter, we deal with limitations of stimulated recall and also suggest ways that stimulated recall may be useful in supplementing other types of data (a topic we also discuss in Chapter 3).

Notes
1 For basic information about interrater reliability coefficients, see Mackey and Gass (2015).
2 One interesting interpretation they made based on their study was that care must be taken to ensure that people view thinking aloud as secondary to the performance of the task at hand, such as problem solving. Training and the use of warm-up tasks may be helpful in this respect. It is not a problem that needs to be addressed by stimulated recall studies because participants are generally not told that they will be recalling the data, so there is no interference.

5 USING STIMULATED RECALL AS AN ADDITIONAL DATA SOURCE

In this chapter we turn to an exploration of the ways in which stimulated recall methodology can be used in empirical L2 research. In the first edition we did this as a way of demonstrating the potential of what was then a little-known methodology for shedding additional light on issues that may remain unresolved through empirical L2 data alone. The sheer plethora of stimulated recalls in L2 research since then shows us that applied linguists have adopted the methodology in ways that we did and did not conceptualize in our initial work more than a decade ago. We now address options we did not initially think about, for example, stimulated recall data used to triangulate eye tracking data.

Data Triangulation The triangulation of stimulated recall data with other measures of noticing is useful in mitigating potential limitations of the methodology. A study by Ziegler and Mackey (2014) sought to evaluate the differential effects of pre-task planning on L2 performance by focusing on learners’ production in synchronous computer-mediated communication (SCMC). Forty-four intermediate learners of English participated in picture-narrative tasks in which they had to work with a peer interlocutor to co-construct a story via text chat. All learners completed three versions of the task differing in pre-task planning time: no planning time, one minute of planning time, and three minutes of planning time. Results demonstrated that three minutes of planning time resulted in increased complexity, but not accuracy or fluency. Screen capture and mouse tracking technology allowed researchers to better understand how learners utilized their planning time; however, they had no insights into what

learners were thinking when they made these decisions. Stimulated recall interviews would have helped to triangulate these data and offer insight into why learners performed certain actions. Godfroid and Schmidtke (2013) utilized eye-tracking data and vocabulary learning scores in conjunction with stimulated recall interviews to assess receptive vocabulary learning. Twenty Dutch EFL learners read paragraphs written in English including novel pseudowords as an eye tracker recorded eye movements. Afterward, participants completed an unannounced vocabulary posttest in which the original sentences containing novel words appeared with possible answers for the missing pseudoword. Finally, a stimulated recall interview was conducted using the results of the posttest. Researchers found that a participant’s total fixation time on a pseudoword (as measured by eye-tracking data) coupled with recollections of reading the word (as measured by stimulated recall data) predicted word recognition. The triangulation of the various measures allowed researchers to utilize a regression model to find predictors of vocabulary learning. This would not have been possible with eye-tracking or stimulated recall data on their own. An additional study that triangulated eye-tracking data with a verbalization measure was Révész and Gurzynski-Weiss (2016). These researchers investigated teachers’ perspectives on L2 task difficulty using concurrent think-alouds triangulated with eye-tracking data. Sixteen ESL teachers judged aloud the linguistic ability required to carry out four different pedagogic tasks while their eye movements were tracked. Results showed that teachers were primarily concerned with linguistic factors when assessing task difficulty. Eye movement data were aligned with think-aloud comments, showing that these two measures can easily complement each other.
Similarly, Smith (2012) combined eye-tracking data with stimulated recall interviews to explore the construct of noticing corrective feedback. Eighteen pairs of English learners participated in task-based SCMC with a native speaker who provided explicit recasts. The participants’ eye gaze was recorded and compared to the results from stimulated recall interviews. Results confirmed the strength of both measures for identifying what learners notice in SCMC corrective feedback.
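In practical terms, triangulating measures of this kind starts with aligning the data sources on a shared unit, such as participant and target item. The Python sketch below shows only that alignment step, followed by a simple descriptive comparison. It does not reproduce the regression analyses reported in the studies above, and every name and value in it is hypothetical.

```python
def triangulate(fixations, recalls, posttest):
    """Align three data sources keyed by (participant, item) into one row per pair."""
    merged = []
    for key, fixation_ms in sorted(fixations.items()):
        merged.append({
            "participant": key[0],
            "item": key[1],
            "fixation_ms": fixation_ms,
            # A missing recall entry is treated as "not mentioned in the recall".
            "recalled": recalls.get(key, False),
            # None marks a pair with no posttest response.
            "recognized": posttest.get(key),
        })
    return merged

def recognition_rate(rows, recalled):
    """Proportion of items recognized on the posttest, split by recall status."""
    subset = [r for r in rows
              if r["recalled"] is recalled and r["recognized"] is not None]
    if not subset:
        return None
    return sum(r["recognized"] for r in subset) / len(subset)

# Hypothetical data for two participants and two target items.
fixations = {("P1", "w1"): 820, ("P1", "w2"): 310,
             ("P2", "w1"): 640, ("P2", "w2"): 905}
recalls = {("P1", "w1"): True, ("P2", "w2"): True}
posttest = {("P1", "w1"): True, ("P1", "w2"): False,
            ("P2", "w1"): False, ("P2", "w2"): True}

rows = triangulate(fixations, recalls, posttest)
rate_recalled = recognition_rate(rows, True)
rate_not_recalled = recognition_rate(rows, False)
```

Once the sources are aligned in one table, the merged rows can be passed to whatever statistical model the research question calls for.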

Using Stimulated Recall as an Additional Source of Data We next turn to considering the results of one or two published studies and the issues that they raised, asking whether the use of stimulated recall could have contributed to our knowledge on this particular topic. By exploring the ways in which the use of stimulated recall data might enhance the findings of the studies, our intention was to further illustrate the nature of stimulated recall as well as demonstrate its versatility as a tool. Of course, there are many practical issues and challenges to be considered (as detailed in earlier chapters),

and although we briefly note some of these in relation to the specific studies we describe here, it is important to point out that we were not privy to the data collection situation. Further, because not all studies detail their procedures extensively, in some cases our discussion of how the use of stimulated recall may have benefited these studies may be tempered by other concerns of which we are not aware. It is, of course, the responsibility of individual researchers to carry out a careful cost-benefit analysis when deciding on methodologies. This section, then, details the potential of stimulated recall for triangulation of data and adding insights, and is contextualized within published research.

Interlanguage Phonology Change over Time Riney and Flege (1998) reported on a study of change over time in global foreign accent and liquid (/r/ and /l/) identifiability and accuracy. They carried out three experiments. Their goal was to study variation over time in global accent, intelligibility, and accuracy of phonetic segments and the relations among these constructs. They explored the link between global foreign accent and production of two English consonants, /r/ and /l/, by Japanese college students in their first year (Time 1) and their senior year (Time 2). Most of these learners had spent the majority of this time in Japan. Like Munro (1998), Riney and Flege used native English-speaking judges to rate sentences spoken by other native English speakers and Japanese L1 speakers (n = 11) as well as to rate 25 word onsets containing the target consonants. The native-speaking judges assessed how far the consonants produced could be identified as intended at Time 1 and Time 2, and whether the target consonants were produced more accurately at Time 1 or Time 2. For one of their experiments, they used a group of trained native-speaking judges (n = 3). The judges were categorized as trained because they had all carried out postdoctoral research in L2 phonetics and phonology. These trained judges were asked to identify the /r/ and /l/ consonants in the unidentifiable first part of words. Riney and Flege reported that some speakers showed significant improvement in both global foreign accent and liquid identifiability and accuracy. Perhaps unsurprisingly, the two speakers who improved the most were also the two speakers who had spent the most time in an English-speaking environment. The untrained judges used a scale from 1 “strong foreign accent” to 9 “no foreign accent” and were asked to rate only pronunciation and to ignore everything else. The trained judges were asked to categorize segments as /r/, /l/, or neither. 
In their conclusion, Riney and Flege noted that untrained listeners may have resorted to distinguishing sounds on the basis of software-induced noise distortion, that clusters may contain a host of phonotactic variables, and that phonological context may be important.

The trained judges, like the undergraduate linguistics students in Munro’s study, appear to have been well placed to comment retrospectively on their judgments. One of the trained listeners was one of the authors, so his insights were no doubt incorporated. Formally collected stimulated recall data about the judgments of the other trained judges with their post-doctoral work in L2 phonetics and phonology would no doubt have added valuable insights that could have refined future studies involving the perception of phonological variables.

Corrective Feedback and Phonology Saito (2013) reports a study that investigated the value of recasts in teaching the perception and production of English /r/ to Japanese learners. The study included 45 learners in three groups: form-focused instruction with recasts, form-focused instruction without recasts, and a control group. In the recast group, the instructor provided recasts in response to mispronunciations of English /r/. Saito measured perception with a two-alternative forced choice identification task and performance via controlled and spontaneous production tests assessed by ten native-speaking listeners. Results demonstrated improvements in both treatment groups, with the recast group showing an attentional shift away from lexical units as a whole to more phonetic aspects of speech. The addition of stimulated recall to this study would have added depth to the analysis of the results by further exploring questions such as: Did the students notice the teacher’s corrections of /r/? Did the students intentionally shift their focus from lexical units to more general phonetic aspects of speech? Elicited with videos of classroom interaction as the stimulus, these additional data might shed light on whether or not phonologic recasts seemed salient to the learners and whether this leads to improvements in their perception and production abilities over form-focused instruction alone.

Classroom Interaction Lyster’s (1998) study of recasts, repetition, and ambiguity in the L2 classroom context categorized 377 teacher recasts in 18 hours of classroom interaction. After classifying recasts according to pragmatic functions in classroom discourse, Lyster compared them to teachers’ use of noncorrective repetition. This led him to conclude that recasts and noncorrective repetition fulfill identical functions and that the corrective properties of recasts may be overridden by their functional properties in meaning-oriented classrooms. Lyster’s measure of corrective properties was learners’ immediate uptake of recasts, or what learners actually did with recasts in the turn immediately following the recast. Mackey and Philp (1998) showed that developmental outcomes of recasts may not show up immediately but may show up in the short term or the longer term, a developmental perspective

pointed out by others including Gass and Varonis (1994) and Lightbown (1998). Lyster’s argument that the corrective function of recasts could be overridden by their positive reinforcing function may be one explanation for why beginning-level learners did not develop as a result of intensive recasts, but advanced-level learners did develop in Mackey and Philp’s (1998) laboratory study. Lyster stated: one may well wonder, first, how L2 learners can distinguish the purpose of recasts from the purpose of noncorrective repetitions and, second, whether the teachers’ intention in recasting is indeed to correct form or if their intention has more to do with content, … in addition, it may be the case that teachers, when their intentions were indeed to correct, provided additional signals, which were not detected in the transcript, that distinguished some recasts from noncorrective repetitions (e.g., waiting longer or looking at students in ways that invited uptake). (Lyster, 1998, pp. 65–67) These statements clearly indicate that an interesting element could have been added to these classroom data. If the classes had been video recorded and stimulated recall procedures carried out with the teachers, their intentions in using recasts or noncorrective repetition could have been clarified. Also, if the stimulated recall procedure had been carried out with some students, an additional measure to the short-term, immediate uptake score would have shed light on the learners’ perceptions about the teachers’ recasts.

Oral Production Mehnert (1998) carried out a study of the effect of different amounts of planning time on the speech performance of different L1 learners of German. She used two tasks that varied in terms of structure and information familiarity. Both required participants to leave messages on answering machines and were designed so as not to appear too artificial in a language laboratory experimental setting. In the first task, participants were required to explain to a friend that they were not able to meet him at the airport and left directions along with a time to meet at the university. This task was supposed to be structured, to contain familiar information, and to require the present or future tenses. In the second task the participants were required to apologize and explain to two friends why they had not met them the day before as arranged. Six specified words were to be used. This task was supposed to be based on unfamiliar information, be unstructured, and require the use of the past tense. Arguably, the second task was more complex than the first task, according to Mehnert. Her study utilized three experimental groups with one, five, and ten minutes of planning time. Participants were asked to make written notes to ensure they planned; their oral production was measured in terms of fluency, accuracy, complexity, and lexical density.

Results showed that those with ten minutes of planning time produced more fluent and accurate speech with higher lexical density. Mehnert concluded that planning time results in improved performance, but that increase in performance differed according to the four measures she used, meaning that different things are going on at different times. For example, when participants had only one minute to plan they gave priority to accuracy, but when they had ten minutes to plan, they attempted to produce more complex language. She claimed that this result provides support for the competing attentional resources model, that one can focus on only accuracy or complexity at one time. She also claims that task properties at least partly determine the level of fluency, accuracy, and complexity produced. Mehnert began her discussion by saying, it was assumed that planning time is used by L2 learners to prepare cognitively and linguistically; that is, to decide on what meaning they want to convey and to search for and activate the linguistic resources best suited to express the intended meaning. (Mehnert, 1998, p. 99) She ended by saying that “how to approach the planning task was very much left to the subjects themselves” (p. 106) and calls for research on “how subjects’ attention can be channeled intentionally into improving either fluency, accuracy, or complexity of speech, or all of them equally” (p. 106). It would appear that research on planning, and this study in particular, could benefit from the use of stimulated recall methodology. First, carrying out a stimulated recall across different participants for different tasks with different amounts of planning time would result in useful data about what people thought they did with planning time. For example, the assumption Mehnert makes about preparing for linguistic and cognitive task demands could be explored empirically through introspections. 
Second, how L2 learners approach planning could be studied through stimulated recall by allowing learners the freedom to plan on paper, but then showing them videotapes of their planning time and asking them about their thought processes while they were planning. This exploration of different learners’ approaches would also provide insights into the ways L2 learners’ attention could be channeled into using planning time for linguistic improvement. Mehnert’s use of different amounts of planning time could also be incorporated into stimulated recall, examining intraparticipant variation in terms of perceptions about use, as well as interparticipant variation.

Interlanguage Pragmatics Bardovi-Harlig and Dörnyei (1998) explored the extent to which instructed L2 learners of English (ESL and EFL) were aware of the differences in learners’

productions and target language productions in grammar, in terms of the accuracy of utterances, and pragmatics, in terms of the appropriateness of utterances. They used a videotape with 20 scenarios and played the scenarios to 543 learners in three countries. They also considered the learners’ teachers in terms of their perceptions about grammatical and pragmatic errors. Eight scenarios featured pragmatically appropriate but ungrammatical sentences, eight scenarios featured grammatically appropriate but pragmatically inappropriate sentences, and four featured sentences that were pragmatically inappropriate and ungrammatical. Bardovi-Harlig and Dörnyei pointed out that grammatical competence often exceeds pragmatic competence and indeed found that EFL learners consistently ranked grammatical errors more seriously than pragmatic errors. The ESL learners showed the opposite pattern, although Bardovi-Harlig and Dörnyei found the ESL learner results more difficult to interpret given research findings that ESL learners exhibit different interlanguage systems than NSs. They suggested (following Schmidt, 1993) that noticing pragmatic aspects in the input is more likely in situations in which learners are struggling to make themselves understood as well as struggling to establish smooth relationships with NSs and other learners. They also pointed out that “clearly any account of the development of interlanguage pragmatics will have to take into consideration the numerous variables that intervene between the stages of noticing and targetlike production” (p. 255). They noted that in their future research they intend to administer a production questionnaire and supplement the data with respondents’ retrospective comments.
Stimulating the retrospective comments with judgment data obtained from participants while watching the video scenarios might allow studies such as this one to tap into the learners’ perceptions about grammar and pragmatics, as well as to explore their noticing of these variables. Of course, the act of recall could stimulate noticing, so controlled groups of recall and nonrecall participants would be needed. This could be controlled by having some learners, but not others, carry out production exercises.

Input-Based Pragmatic Instruction Takimoto (2007) investigated interlanguage pragmatics by evaluating the effectiveness of different types of input-based instructional approaches for teaching English request forms. The participants in the study (60 Japanese learners of English) were assigned to one of three treatment groups or a control group. The treatment groups varied in the type of input-based instruction used for teaching downgraders in polite request forms. One group received structured input tasks with explicit information, another group completed problem-solving tasks without explicit information, and the third group completed structured input tasks without explicit information. The control group completed simple reading comprehension exercises. Pre-, post-, and delayed posttests consisted of

a battery of tests including: a discourse completion test, a listening test, a roleplay test, and an acceptability judgment test. Results demonstrated that all three treatment groups outperformed the control group. However, the structured input task with explicit information group did not maintain those effects on a delayed posttest. While the relative effects of the three types of tasks in this study were thoroughly examined, in the end, the question remains as to why some treatments worked better than others. Ultimately, the study questions the role of awareness and attention in L2 learning. While the pre-, post-, and delayed posttests provided a large amount of quantitative data, this valuable information could have been supplemented with stimulated recall data on the learners’ perceptions and experiences with the tasks. The study did employ an exit questionnaire which posed questions such as “Did you find the lesson difficult to follow?” and “Did you understand clearly how to make polite requests?” with answers recorded on a Likert scale. The addition of stimulated recall to probe the learners’ thoughts using video from the tasks to stimulate the responses might have shed light on what specifically the students found difficult or challenging and when they were aware or unaware of the pragmatic input. This valuable information could not be obtained from an exit questionnaire alone.

Comprehension Kempe and MacWhinney (1998) carried out a study of native English speakers’ acquisition of the ability to comprehend overt morphological case marking in Russian or German as L2s. They contrasted two approaches to learning inflectional morphology: the rule-based approach, which predicts that learning is driven by paradigm complexity, and the associative approach, which predicts that learning is driven by the cue validity of individual inflections. Participants (n = 44, 22 learners of Russian and 22 learners of German) carried out different tasks targeting L2 comprehension. One was a picture choice task consisting of simple active transitive noun–verb–noun sentences that were grammatically correct with cues of case marking, noun configuration, and animacy counterbalanced. A second task required learners to make lexical decisions for words and nonwords. Participants were tested individually. For the lexical decision task they pressed one button if they knew the word and another button if they did not. For the picture choice task, they saw pairs of pictures while the name (descriptive title) of one of the pictures was presented through headphones. Participants were instructed to press the right button or left button depending on whether the name corresponded to the right or left picture. In the main experiment, learners heard simple transitive sentences accompanied by pictures of both nouns, and they chose the agent of the sentence from the two pictures, again pressing a button for the left or right pictures.

The results showed that learners of Russian used case marking earlier than learners of German, and learners of German relied more on animacy to supplement the weaker case marking cue. The competition model correctly simulated the results, supporting the claim that adult L2 learning is associative and driven by the validity of cues in the input. They concluded that [i]t can be argued that the language difference in the use of case markers does not necessarily arise from differences in the statistical distribution of case marking in the two languages but may be related to differences in metalinguistic awareness of case marking … More generally speaking, the perceived difference between the L1 and the L2 is smaller for the learners of German, which might encourage transfer and minimize these learners’ awareness of the morphological function of the determiner. (Kempe and MacWhinney, 1998, p. 580) Stimulated recall might have been profitably used to address the question about metalinguistic awareness. Participants in this study were recruited by advertisement and were college students or recent graduates. Hence, they were sufficiently sophisticated to be able to make appropriate metalinguistic comments. Learners could have been video recorded while carrying out the task, or their task and answers could have been used as the stimulus, given that the video recording would have to include the aural sentences, which might be more complex than simply showing people the task and their responses. Stimulated recalls, carried out in the L1, English, may have shed light on the questions raised by Kempe and MacWhinney about the perceived difference between the L1 and the L2. Learners could have been asked to introspect about these perceptions, which would have provided an additional source of data. The level of learner metalinguistic awareness about their L2s could also have been explored by carrying out stimulated recalls. 
Of course, stimulated recall protocols need to be used with caution in terms of perceptions, yet in this case they may have shed light on a complex question.

Input and Input Processing

Reading

Lee (1998) explored whether reading comprehension and input processing were affected by specific morphological characteristics in the input. Seventy-one English L1 university students enrolled in a second-semester Spanish course read a passage in Spanish about a salesperson describing various features of computers. The passage contained 11 subjunctive verb forms, 9 of which were targeted for modification. A subjunctive group read the passage in its
original form with the targeted verbs in the subjunctive. An infinitive group read the passage with the targeted verbs in the infinitive form (no morphological encoding of person, gender, etc.). An invented group read the passage with the targeted verbs in an invented form (the verb stem plus –u). This form appeared to convey morphological encoding but in fact did not as it was artificially created. After reading the passage, the students completed a recall exercise in which they wrote down (in the L1) everything they could remember about the passage. Next, the students completed a word identification activity in which they reviewed a list of 100 verbs and nouns and placed a mark next to each word that had appeared in the story. Lee’s results indicate that recall by the subjunctive group was lower than that of the infinitive or invented group (there was no significant difference between the infinitive and invented groups). Additionally, across all groups, general recall for students who mentioned that the passage involved a salesperson was higher than for students who did not mention the salesperson. For the word identification activity, there were no significant differences among the groups. Finally, there was no correlation between performance on the recall activity and the word identification activity. Lee concluded that learners comprehend better when morphological forms in the input are less complex and that learners may detect forms in the input even when they are reading for meaning. He also suggested that the lack of correlation between comprehension and word identification indicates that learners comprehend content and process forms via different processes. He attributed the finding that learners who mentioned the salesperson had higher recall than those who did not to rhetorical organization. 
Additional exploration through the use of introspective methods (think-aloud protocols or stimulated recall interviews) might have revealed more about the learners’ processing. By commenting on their thought processes while reading the passage, the learners could have provided insights into the processes they used for comprehension. In addition, the learners’ comments may have revealed more information about the input processing of the verbal morphology. It would have been interesting to know how learners reacted to the different presentation of verbs, particularly the invented forms.

Amount and Type of Exposure

Leow (1998) compared the effects of different amounts and types of exposure to irregular third-person singular and plural preterite stem changing on –ir verbs in Spanish. Four groups of 88 English L1 university students enrolled in a first-semester Spanish class (who had had only 7.5 hours of instruction) participated in the study. The single-exposure, teacher-centered group received an explanation of the irregular forms, followed by drill exercises. The multiple-exposure, teacher-centered group received the same instruction and received it again three weeks
later. The single-exposure, learner-centered group completed a crossword puzzle designed to draw their attention to the irregular forms. The multiple-exposure, learner-centered group completed another crossword puzzle three weeks later. All the students in the learner-centered groups completed think-alouds while they were working on the crossword puzzles. Three posttests consisting of a multiple-choice recognition exercise and a fill-in-the-blank activity were conducted immediately after the treatment, three weeks later (immediately after the second treatment for multiple-exposure groups), and 11 weeks later. The results indicate that the multiple-exposure groups outperformed the single-exposure groups on the second and third posttests. Furthermore, the gain by the multiple-exposure groups at the first posttest was maintained over time. In contrast, the single-exposure groups demonstrated a significant decline from Posttest 1 to Posttest 3. With regard to the type of exposure, the learner-centered groups outperformed the teacher-centered groups on all posttests. A great deal of information about the learning processes of the students in the learner-centered groups was obtained through the think-aloud comments, but Leow’s study did not elicit information about the learning processes of students in the teacher-centered groups. His chosen methodological tool, think-alouds, might not have been possible in the teacher-centered context; however, stimulated recalls could have been carried out. The teacher-centered sessions could have been videotaped and the tapes played back so that the students could introspect about their thoughts while the instruction was in progress, watching themselves on the videotape for the recall stimulus. It should be noted that carrying out stimulated recalls before all posttests were completed is clearly inadvisable, as is a long gap between the original activity and recall.
Thus, if such an approach were adopted, it might be best only to obtain recall data from a subset of learners in the teacher-centered group. These learners should then not participate in the posttests. Restricting the participant pool for tests may be why stimulated recall was not carried out in this study.

L2 Writing

In order to investigate the effects of focused and unfocused written corrective feedback on the acquisition of the English indefinite and definite articles, Ellis, Sheen, Murakami, and Takashima (2008) studied 49 Japanese learners as they wrote narratives in three different treatment conditions. One group received focused correction of article errors only, one group received unfocused correction of article errors alongside corrections of other errors, and a control group received no corrections of linguistic errors. All students completed pre-, post-, and delayed posttests consisting of a narrative writing test and an error correction test. Finally, students completed an exit questionnaire asking them to describe what they thought they had learned during the tasks.

The results from the pre- and posttests showed gains on both the error correction test and the narrative writing test for both the focused and unfocused treatment groups. Both groups also outperformed the control group on the posttests. The corrective feedback was shown to be equally effective for the focused and unfocused groups. The authors discuss the possibility of avoidance and other strategies as a cause of some of the gains on the posttests. The strategy use and attention to articles cannot be gleaned from the questionnaire data alone. Stimulated recalls would have been effective in providing this information. Students could have been asked to reflect on their strategies of metalinguistic awareness, avoidance, and attention to linguistic forms as they wrote. Stimulated recall could have also been utilized when students received their corrected narratives and were reviewing them. Questions would probe whether or not students attended to the article corrections their teacher made. These additional data could have helped to resolve some of the lingering questions in this study of L2 writing.

L2 Reading Comprehension

Barry and Lazarte (1998) explored domain-related knowledge, syntactic complexity, and reading topic in the context of how they affected inference generation in the written recalls of Spanish learners (with English L1s). Two groups of participants, with high knowledge and low knowledge, read three Spanish passages, each at a different level of syntactic complexity. Knowledge level was operationalized as students’ exposure to a specific content domain and experience with the targeted text types. Barry and Lazarte examined within-text inferences, elaborative inferences, and incorrect inferences. They counted the total number of inferences as providing evidence for the richness of the mental model and the type of inferences as providing information about the nature and accuracy of the mental model. They found that high-knowledge readers generated a richer and more accurate mental model than did low-knowledge readers. A related finding was that the level of complexity and the reading topics indicated a complex pattern of influence on inference generation. On the basis of their findings, the authors suggested (among other conclusions) that

high-knowledge readers in an L2 shift to a top-down or knowledge-driven process when the increased syntactic complexity requires them to maintain clauses in working memory and, simultaneously, to activate information from previous segments of the text. (Barry and Lazarte, 1998, p. 190)

They pointed to the need for further investigations to determine the potential of the resource allocation idea for describing the behavior of L2 learners. They specifically asked what kinds of resources are required for the written recall test.

The immediate recall production test could be followed up by a stimulated recall interview, using the reading passage and the written recall of the passage as the stimuli. Learners could also be shown videotapes of themselves completing the written recall and could be asked to introspect about their thought processes while carrying out the task. Researchers could gain potentially valuable insights into the construction of the mental model as well as into any shift in resource allocation and the type of resources learners thought they needed for the written recall of the reading passage. Of course, learners might find it challenging to recall the processes they used in their written recall. Careful pilot testing and question construction would be essential.

Eye Tracking

In a study by Winke, Gass, and Sydorenko (2013) eye-tracking methodology was employed to investigate caption-reading behavior by foreign language learners, specifically how the relationship between native language and target language affects this behavior. English-speaking learners of Arabic, Chinese, Russian, and Spanish in their fourth semester of language classes watched two videos dubbed and captioned in the target language. The two videos differed in content familiarity. Eye trackers captured how much time each student spent reading the provided captions on the videos. The results demonstrated significant differences in time reading the captions between the various languages. Arabic learners spent more time reading captions than learners of Spanish and Russian, while Chinese learners spent less time on captions in the unfamiliar content video than the familiar content video. Learners of Spanish, Russian, and Arabic spent comparable amounts of time reading in both the unfamiliar and familiar videos. Interestingly, Winke et al. utilized qualitative data from traditional interviews to supplement the eye-tracking data. Using theories of dual-processing and cognitive load, they posit that L2 differences such as script, vocabulary, L2 proficiency, and instructional methods constrain the benefits of captioning. However, substituting stimulated recall interviews for the traditional interviews would have added a layer of depth to the investigation that might have shed additional light on when and why learners focused on captioned text. Using video playback and quantitative data, the researchers could have asked about particular instances where students took more time with captions and delved into more detail on these instances.

Oral Interaction

Dialogue

Swain and Lapkin (1998) carried out a study of learner–learner dialogue, finding support for the theoretical position that dialogue can be both a means of
communication and a cognitive tool. They analyzed language-related episodes (LREs) of two Grade 8 French immersion (English L1) students who were working together to solve a jigsaw task. They defined LREs as any part of a dialogue where students talk about the language they are producing, question their language use, or correct themselves or others. They classified the LREs as lexis-based or form-based. The task involved the students working out a story line, writing it out, and resolving the linguistic problems that arose during the task through discussion. They used both their L1 and their L2 to resolve the problems. Swain and Lapkin claimed that the two students jointly develop the storyline, co-construct the knowledge they need to express the meaning they want and co-construct knowledge about language in the process. An example they used shows one student, Rick, finding out the French word for pillow, oreiller, from his partner, Kim, and double checking that he has the correct item by pointing to the picture and saying, “is that l’oreiller?” and writing it down (p. 332). On a posttest, Rick and Kim both got the word for pillow correct. Swain and Lapkin called for “future work to combine an analysis of students’ collaborative dialogues with follow-up interviews in order to derive a more fine-grained understanding of the mental processes” (p. 333). They also questioned whether certain aspects of tasks are found by students to be appealing or unappealing, conducive or not conducive to learning. An excellent way to conduct such follow-up interviews would be to show the students videotapes of themselves carrying out the tasks and ask them to introspect about their own and their perceptions of their partners’ contributions to and understanding of the LREs. With such recall support, the actual processes could be tapped more closely in introspection. Students could then be explicitly asked their opinions about certain aspects of the tasks.

Negotiation

Foster (1998) investigated negotiation of meaning in the L2 classroom. The study specifically considered modified interaction in group- and pair-work as a function of prior negotiation. Twenty-one ESL students from a variety of L1s participated in task-based activities. The dyads engaged in a grammar-based task, in which students had to compose questions to elicit particular answers, and a picture difference task. This latter constituted a required information exchange. The small groups engaged in a consensus activity, in which a problem was given and they had to reach consensus as to the solution, and a map activity. This latter was classified as a required information exchange. Analyses were carried out using the c-unit as a basis. C-units are independent utterances that contain meaning (either referential or pragmatic). The results of Foster’s study contradict previous studies in the literature in that the incidence of negotiation was low and, further, there was a low incidence of modified utterances as a result of negotiation. In discussing her results, Foster
noted, “We now need to explore why so many of the students in this study were disinclined to initiate or pursue negotiation for meaning” (p. 18). We argue that this is precisely the type of information that could be gained through a stimulated recall procedure. For example, Foster presented the following exchanges as examples of the lack of response to a signal of incomprehension:

Example 5.1 (From Foster, 1998, pp. 15–16)

A: “the sports field, swimming pool and equipment may be used free of charge.”
B: Free of charge? What is that?
C: (laughs) Yes.
A: sports day.

A: There is this one, this one, and after to camping site near Oldfield.
B: Oldfield?
C: Anyway, the best think I think is er camping.

In both excerpts in Example 5.1, there is a signal that comprehension is questionable. In the first case, it is direct: “What is that?” and in the second it is indirect: “Oldfield?” Using stimulated recall, these learners’ responses (and others) may be able to move researchers from the realm of speculation to the realm of greater certainty.

Corrective Feedback

A study conducted by Yang and Lyster (2010) investigated the effects of different corrective feedback techniques on the acquisition of the English regular and irregular past tense. Seventy-two Chinese learners of English were assigned to one of three groups: a prompt group, a recast group, or a control group. All three groups participated in form-focused production activities that elicited the target past tense forms. Yang and Lyster operationalized recasts as “the teacher’s reformulation … of the students’ ill-formed utterances that contained past-tense errors” (p. 243), while prompts were operationalized as any corrective feedback that withheld the target form and pushed learners to self-repair, such as metalinguistic clues, repetitions, clarification requests, and elicitations. In the control group, the teacher provided feedback only on content. To examine the students’ improvement in past tense usage, pretests, immediate posttests, and delayed posttests were delivered in both oral and written formats. The researchers found that the prompt group improved the most, followed by the recast group, with the control group improving the least.

Yang and Lyster mention the use of stimulated recalls in their review of previous research investigating the effectiveness of recasts. They state that stimulated recall methodology can shed light on which components of recasts benefit linguistic development the most. This methodology would also shed light on this study, gleaning more information on why prompts seem to be more useful to learners than recasts or content-related feedback.

Syntactic Processing

Hoover and Dwivedi (1998) conducted a study of syntactic processing by fluent L2 speakers. The participants were NSs of French (n = 48) and English speakers fluent in French (n = 51). Clitics in causative and noncausative sentences, as in Examples 5.2 and 5.3, were used to investigate syntactic processing.

Example 5.2 (From Hoover and Dwivedi, 1998)

Clitic in non-causative:
Il aimait tranquillement le goûter avec son fromage doux préféré.
He loved to taste it quietly with his favorite mild cheese.

Example 5.3 (From Hoover and Dwivedi, 1998)

Clitic in causative:
Il le faisait tranquillement goûter avec son fromage doux préféré.
He had it be tasted quietly with his favorite mild cheese.

Participants read target sentence pairs. These sentences were embedded in 72 filler sentence pairs. Each pair had a context sentence and a target or filler sentence. The main task was conducted on-line. Participants read sentences word by word, pressing a space bar for each new word. Sentence pairs were followed by comprehension questions. Following the main experiment, participants were given a standardized reading comprehension test (Wisconsin College-Level Placement Test). The purpose of this test was to divide the L2 readers into two groups (i.e., high and low) according to their reading proficiency. Their specific hypothesis, which was confirmed, was that slow L2 readers would be less efficient in their processing of syntactic information. However, as with many empirical studies, questions remained and alternative explanations were suggested. One question had to do with whether or not the results could have been a factor of word recognition. Stimulated recall, with examples from the task, could shed light on this issue, for example by replaying
the video recordings of each participant and asking questions relating to the length of time used to press the space bar for the next word (e.g., “I noticed that it took x seconds for you to press the space bar for the next word. What were you thinking about during that time?”).
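
One way a researcher might choose which moments to replay is sketched below in Python. This is only an illustration under stated assumptions, not a procedure taken from Hoover and Dwivedi: the function names, the data layout (one list of words and one list of key-press latencies in milliseconds per sentence), and the cutoff of two standard deviations above the sentence mean are all hypothetical choices that would need to be justified and piloted for a real study.

```python
from statistics import mean, stdev


def flag_slow_words(words, latencies_ms, z_threshold=2.0):
    """Return (index, word, latency_ms) for words read unusually slowly.

    A word is flagged when its latency lies at least `z_threshold` sample
    standard deviations above the mean latency for the sentence.
    The threshold is an illustrative assumption, not a published criterion.
    """
    if len(words) != len(latencies_ms):
        raise ValueError("words and latencies_ms must have the same length")
    if len(latencies_ms) < 2:
        return []  # a standard deviation needs at least two observations
    mu = mean(latencies_ms)
    sigma = stdev(latencies_ms)
    if sigma == 0:
        return []  # every latency is identical, so nothing stands out
    return [
        (i, word, latency)
        for i, (word, latency) in enumerate(zip(words, latencies_ms))
        if (latency - mu) / sigma >= z_threshold
    ]


def recall_prompt(latency_ms):
    """Fill the observed latency into the prompt wording quoted in the text."""
    seconds = latency_ms / 1000
    return (
        f"I noticed that it took {seconds:.1f} seconds for you to press the "
        "space bar for the next word. What were you thinking about during "
        "that time?"
    )


if __name__ == "__main__":
    # Hypothetical latencies for the causative sentence in Example 5.3.
    sentence = "Il le faisait tranquillement goûter avec son fromage doux préféré"
    latencies = [400, 420, 410, 430, 2500, 415, 405, 425, 410, 420]
    for index, word, latency in flag_slow_words(sentence.split(), latencies):
        print(f"word {index} ({word}): {recall_prompt(latency)}")
```

Flagging by a within-sentence cutoff keeps the number of replayed moments small, which matters because the recall session should follow the task quickly; a researcher could equally select moments by hand from the video.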

Vocabulary

Incidental Vocabulary Learning

A study by Joe (1998) on incidental vocabulary learning is an interesting one for the exploration of stimulated recall (see studies by Fraser, 1999, and Paribakht & Wesche, 1999, for examples of verbal protocols in incidental vocabulary learning). Participants were 48 ESL learners of a variety of L1s. They were divided into three groups: experimental, comparison, and control. Prior knowledge was obtained, including knowledge of particular words related to the study, general word knowledge, and overall proficiency. On the basis of background knowledge, participants were divided into two groups (i.e., high and low), and these individuals were distributed evenly across the three experimental groups. A passage on the topic of pain was selected, and from that passage 12 words were targeted. In the experimental group, participants carried out a read-and-retell task after they had done a background-knowledge activation task in which questions based on the text were posed. Participants could not refer to the input text during the retelling phase. The experimental group was given specific instruction on learning strategies. In particular, they were given instruction on ways

(a) to recall prior experience and knowledge to make sense of unfamiliar concepts or words in the text, that is, learners were told to add their own examples, experience, and knowledge to information from the text, and to offer personal opinions and comments; (b) to paraphrase, use synonyms, examples, or analogy; and (c) to discuss why some examples of learners’ generative strategies were better than others. (Joe, 1998, p. 367)

In the comparison group, the procedure was identical to the experimental group’s procedure with two exceptions: They had access to the input text as they retold it, and they did not have any strategy instruction. The control group had no treatment; they continued with regular class instruction during the time of the experiment.
They took the pre- and posttests only. The pretest comprised a self-report interview based on 28 words, including the 12 target words, and a read-and-retell task using think-aloud procedures. The posttest included the self-report interview, which focused only on the 12 target words, and two multiple-choice tests.

In general, the results suggest that incidental vocabulary learning does take place. The reading and recall tasks, without a specific focus on the vocabulary items in question, yielded better results than those of the control group. The study also suggests that prior background knowledge affects vocabulary learning. This study is particularly interesting in that the comparison group had access to the text in the retell task. During retellings, they used think-aloud sorts of phrases such as, “I’m not sure, but I think …” It would be an easy step to turn this into a stimulated recall procedure. Gass (1999) argued that incidental vocabulary learning is often misnamed because researchers investigate the concept from a teacher perspective. That learning appears to be incidental does not mean that there hasn’t been a specific focus on the word in question. A stimulated recall procedure (after posttests) might reveal what learners were focusing on. For example, after a comparison of a given learner’s performance on the pretest and posttest and the observation that word x was learned, but word y was not learned, the retell (particularly the retell for the comparison group that included access to the original input text) might be used to probe what learners were thinking about as they encountered words x and y.

Acquisition of Words in an Unknown Language

Lotto and de Groot (1998) investigated the acquisition of words in a new language (Italian) by Dutch speakers. The variables considered were learning method, word frequency, and cognate status. Fifty-six NSs of Dutch participated in their study. None knew any Italian, although some were familiar with other Romance languages. Participants were presented with 80 pictures representing 20 words in each of four categories: high-frequency cognates, high-frequency noncognates, low-frequency cognates, and low-frequency noncognates. There were two conditions. In the word-learning condition, participants were presented with a Dutch word and the Italian translation. In the picture-learning condition, a stimulus picture was presented with only the Italian word. All stimulus material was presented on a computer. The experimental session included a learning phase and a test phase. The 56 participants were divided into 4 groups of 14 each. In the learning phase, two of the groups were part of the word-learning group, and the others were in the picture-learning group. Material was presented three times for each stimulus. The test phase resulted in further divisions of the groups. One group in the word-learning condition and one group in the picture-learning condition were presented with material congruent with the learning phase, whereas the others received stimuli that were unlike those in the learning phase. In other words, half the participants in each condition received Dutch words as stimuli and were asked for the equivalent Italian word; the other half received pictures as stimuli and were asked for the appropriate Italian word. In a follow-up test phase,
participants did exactly the same task as in the first round of the experiment; they did the same learning task and the same testing task, both of which placed each participant in the same condition as before. In short, the word-learning condition tended to lead to better performance than the picture-learning condition (effects were also found for cognates and for high-frequency words). However, Van Hell and Candia Mahn (1997) presented conflicting results, which might have been affected by learning experience. An additional layer of data collection using a stimulated recall procedure might be useful. For example, participants could be shown a videotape of themselves performing the testing phase. Examples could be culled from each condition and from each category of word types with questions such as, “You seemed to have taken a long time to respond to this item. What were you thinking about at the time?” In this way, some insight may be gained from all areas of the study, including information from participants who had a different testing as opposed to learning phase and information about different word types (e.g., cognates vs. noncognates; high-frequency vs. low-frequency words).

Individual Differences

In a study by Révész (2011), motivated by cognitive-interactionist perspectives on task-based language learning, the effect of task complexity and individual differences on form-meaning connections was examined. Forty-three ESL students in six different classes worked on two versions of the same argumentative decision-making task: a simple and a complex version. The complex task involved higher levels of reasoning and more elements than the simple task. The tasks were recorded and coded for complexity and accuracy of speech production and LREs. Questionnaires were then used to assess the students’ linguistic self-confidence, language use anxiety, and self-perceived communicative competence. The questionnaire in this study was also designed to obtain information about the students’ and teachers’ perspectives and experiences with the task.

They were asked to rate using a Likert scale which version of the task they found more useful for L2 learning, more difficult, more interesting, more stressful, more effective in drawing their attention to the quality of their output, and more successful in directing their attention to the quality of their peers’ production. (Révész, 2011, p. 169)

Results showed an increase in the learners’ accuracy and lexical diversity with a decrease in syntactic complexity in the more complex task. The more complex task also induced more advanced constructions and more LREs. However, no
effects of individual differences were observed. The question remained as to why there appeared to be no effects of individual differences. Stimulated recall, with examples from the task, could shed light on this issue. Instead of merely speculating as to whether or not the learners overcame individual differences by developing strategies, stimulated recall could have probed this issue further and perhaps explained the puzzling results found in this study.

Usage-Based Approaches

Eskildsen (2012) investigated usage- and exemplar-based roots of L2 negation construction learning in a longitudinal case study of two adult L2 English learners. Usage-based linguistic theory views language knowledge as form-meaning pairings with language learning built upon memories of language use and frequency-based abstractions. Much of the usage-based literature such as Eskildsen’s study relies on corpus data of language learners. Naturally, corpora do not allow for stimulated recall once much time has passed after the corpus data were obtained, or if the students are not available. However, we can still examine the benefits of stimulated recall on this theory of language acquisition if we assume the participants could be called back for stimulated recall interviews after data collection. In Eskildsen’s case study, two Spanish-speaking students in their ESL classroom were recorded over the course of three years of study. The students’ use of English negations follows the predictions of usage-based models of language acquisition, moving from recurring expressions to more dynamic inventories of linguistic resources. Interesting excerpts are examined such as one student’s use of the negation form “don’t” immediately after his teacher used it to explain the meaning of “disagree”:

Example 5.4 (From Eskildsen, 2012, p. 359)

Teacher: disagree is if you don’t believe it
Student: don’t believe it don’t like it

It is cases such as the one above where a stimulated recall using the video and audio recordings would have been particularly helpful in understanding the student’s use of the negation “don’t.” For example, it seems the student repeated his teacher’s utterance, “don’t believe it” and subsequently applied it to a novel context, “don’t like it.” A stimulated recall could further investigate the student’s usage of the new form of negation. Having said this, in the qualitative portion of the study, Eskildsen is working from a conversation analytic perspective, where cognition is conceptualized as socially shared and interactionally grounded, an epistemological assumption that might be less compatible with the assumptions underlying the use of stimulated recall. These are interesting questions for researchers to consider.

Functional Magnetic Resonance Imaging (fMRI) Methodology

In recent years, L2 researchers have been working, often in concert with neurolinguists, to understand how L2s are stored, processed, and used in the brain. Researchers using fMRI measure changes in blood oxygenation, which are claimed to reflect changes in cognitive processing. In an example of a study using techniques in neuroscience, Krönke, Mueller, Friederici, and Obrig (2013) used blood-oxygen-level-dependent (BOLD) fMRI to investigate the effect of gestures on the implicit retrieval of new vocabulary. Over three days participants learned novel words for common objects and were trained to repeat those words while performing an iconic gesture, a meaningless grooming gesture, or no gesture. In the two conditions involving gesture, the gestures were either actively repeated or passively observed. After training, neural processing of the novel words was assessed behaviorally and using fMRI. The researchers found that behaviorally (free and cued recall of the words) there were no substantial differences between the different training conditions. However, fMRI scans showed differential networks implicitly retrieving the learned words, varying depending on the training procedure. Actively performed gestures yielded larger activation in the semantic network of the brain. The study leaves it unclear why between-group differences in processing were not accompanied by behavioral ones. Stimulated recall could be productively implemented in this situation. By showing participants recordings of their gesture–word combinations and asking them what they were thinking as they learned and retrieved the words, it is possible that behavioral results would have patterned more closely with brain activity. Stimulated recall could provide the missing data to explain the discrepancy.

Socio-cultural Theories of Language Acquisition

In order to investigate the effects of scaffolded feedback and recasts on L2 development, Rassaei (2014) studied 78 Persian EFL learners using Vygotsky’s concepts of scaffolding and assisted performance. Learners were assigned to either a control group or one of two experimental groups. Learners in the experimental groups received either scaffolded feedback or recasts as they completed task-based interactions. Learners in the control group received no feedback. A grammaticality judgment test and an oral production task were used as pre- and posttests. The target structure of this study was the production of English wh-question forms during a spot-the-difference task. Rassaei found that both groups receiving feedback outperformed the control group on the grammaticality judgment posttest, with the scaffolded feedback group showing larger gains than the recast group. However, on the oral production task, the scaffolded feedback group outperformed both the recast and control groups. The author concludes that the scaffolded feedback assisted in the development of both metalinguistic and productive knowledge, while recasts only contributed to the development of metalinguistic knowledge. These quantitative data could be bolstered with the addition of stimulated recall data. The researcher could play back the recordings of the spot-the-difference task to participants shortly after task completion and ask them what they were thinking about as they performed the task. These data would help explain why the scaffolding seemed to improve the students’ question formation more than recasts.

Conclusions

In this chapter we have argued for the importance of data triangulation. We have presented examples of how stimulated recall could have been used in conjunction with other data types to strengthen arguments about learner processes. We urge researchers to think carefully about whether the type of triangulation we have discussed will work for their particular research project given the many other considerations (e.g., logistic, time, financial) that must be taken into account when designing a study. As with any elicitation method, researchers should be cognizant of the benefits, strengths, and limitations of the method they are using. As Gass and Mackey (2007) state in their introduction, “[w]hen we read research reports in journals or books, we can easily be lulled into a false sense of security” (p. 1). The sense of security comes from the fact that many of the limitations of the study (including the limitations of the elicitation measure) are minimized or glossed over. Stimulated recall is no exception, and in the next chapter of the book, we point out some of the limitations of stimulated recall.

6 LIMITATIONS

All methodologies come with limitations which need to be acknowledged in terms of their potential impact on the data. In this chapter we focus on the criticisms that are directed at stimulated recall. We have organized these problem areas into sections that deal with issues of validity and reliability. Procedural problems with the use of the methodology (and recommendations for avoiding issues where possible) were already covered in the previous chapters (see Chapters 3 and 4). Following this discussion of limitations, we think about some of the ways in which stimulated recall can enhance empirical research, and in fact deal with some of the limitations observational research faces, particularly where questions remain unanswered.

Issues of Validity and Reliability

Here we deal with the most fundamental problems that have to do with the validity and reliability of the methodology. In other words, do stimulated recall reports actually reflect the thought processes of participants? Are thought processes even relevant to understanding how L2s are learned? Do participants inadvertently make things up to please researchers? As we know, retrospective reports assume that the information that is reported is directly accessible and available for verbal reporting. This assumption is better justified when there is only a short amount of intervening time between the event and the recall, as we have pointed out several times throughout this book. Nonetheless, participants should be able to access some type of memory structures when instructed to report what they can remember about their thought processes during an event.

In early work, Nisbett and Wilson (1977b) argued against the use of verbal reporting as a means of gaining access into cognitive processing, claiming that their participants provided inaccurate reasons for their thoughts. Nisbett and Wilson argued that “there may be little or no direct introspective access to higher order cognitive processes” (p. 231). In their view, when individuals attempt to report on their cognitive processes, “they do not do so on the basis of any true introspection. Instead, their reports are based on a priori, implicit causal theories, or judgments about the extent to which a particular stimulus is a plausible cause of a given response” (p. 231).1 In other words, the verbal reports, perhaps tainted by inaccurate memory, contain (unknowingly) fabricated mental events. Part of Nisbett and Wilson’s argument rests on the fact that conscious awareness can only relate to the products of mental processes; the processes themselves cannot be reached through introspection. White (1980), however, had a different argument, claiming circularity in Nisbett and Wilson’s argumentation: “If we decide to use consciousness as the criterion for making the distinction [between process and product], then the product/process viewpoint becomes true by circularity” (p. 106).

Because Nisbett and Wilson’s work is frequently cited by those who advise against using verbal report data, Ericsson and Simon (1996: 26-27) spent considerable space arguing against their position. They presented two excerpts from Nisbett and Wilson (1977b):

People often cannot report accurately on the effects of particular stimuli on higher order, inference-based responses. Indeed, sometimes they cannot report on the existence of critical stimuli, sometimes cannot report on the existence of their responses, and sometimes cannot even report that an inferential process of any kind has occurred. (Nisbett and Wilson, 1977b, p. 233)

When reporting on the effects of stimuli, people may not interrogate a memory of the cognitive processes that operated on the stimuli; instead, they may base their reports on implicit, a priori theories about the causal connection between stimulus and response. (Nisbett and Wilson, 1977b, p. 233)

With regard to the first, Ericsson and Simon pointed out that there was a lack of precision in Nisbett and Wilson’s statements, with words such as “often” and “sometimes” being frequently used. The second statement is more difficult and represents the crux of the matter. However, Nisbett and Wilson did not discriminate between studies that were conducted well and those that were not. If recall is to be accurate and if recall is to reflect processes rather than theories about processes, there have to be safeguards in the procedure itself.

As we have pointed out in earlier chapters, thought processes must be accessible. There are at least two ways to make this more likely. First, and this is the basis of stimulated recall, is that it is important to have actual recall as opposed to a verbal report, and second is that it is important to have as little time lag as possible between the event to be recalled and the recall. In addition to their criticisms of Nisbett and Wilson’s perspective, Ericsson and Simon stress the need to make distinctions between giving reasons for thoughts or thought sequences and just reporting on those thoughts. They provide an example to illustrate this important difference: When people are asked to generate a word that begins with the letter A, most respond with the word apple. When asked for a retrospective report (e.g., “tell me what you were thinking”) people report that it “popped up” but cannot provide any intermediate steps or thoughts. However, when asked why, they respond with something like, “In grade school I learned A as in apple.” Thus, the kinds of responses differ depending on the question. Ericsson and Simon underscore the need to ensure that the contents being recalled are in accessible memory structures and are generated orally. This latter requirement ensures that there are no modality translations that might interfere with the recall process.

Reactivity and Veridicality

Russo et al. (1989) also pointed out the need to ensure that what is being recalled is not an automated process precisely because little trace of such processes is left in accessible memory. They suggest two ways of determining validity of the methodology: reactivity and nonveridicality. Reactivity refers to instances where the primary process is altered as a result of verbalization (this is likely to occur in concurrent protocols). For example, if learners are given a posttest that raises their attention to a particular grammatical structure after the stimulated recall interview, the viewing of episodes during the stimulated recall may serve as additional rehearsal of the structure. Reactivity could also be caused by the verbalizations that occur during the stimulated recall interview in which the learner has extra opportunities to practice the structure. Nonveridicality, on the other hand, refers to lack of correspondence between a protocol and the underlying primary process. Examples of nonveridicality include errors of omission (leaving out a necessary structure, such as the past tense –ed ending in English) and errors of commission (using the wrong grammatical structure in a specific context, for example overgeneralizing the third person –s in English to first person contexts, e.g., “I walks”). The former are of little consequence when dealing with stimulated recalls, but the latter are serious and can invalidate the methodology because the protocol itself is taken as veridical. Reactivity is of less concern when doing stimulated recalls. Russo et al.’s data are not conclusive with regard to the issue of reactivity. In some (but not all) of
their experimental results, there appears to be evidence of reactivity, although the occurrence of reactivity itself was not predictable. In other words, a task that might have been likely to promote reactivity (e.g., a task that involved recoding from a pictorial to an oral mode) did not, in fact, promote reactivity. With regard to veridicality, the issue is similarly complex. They suggested that cueing the original task (e.g., with the original stimulus or with the individual’s performance on the original task) is more likely to result in intrusions than noncued stimuli. However, their participants performed four tasks, only one of which, an anagram task, was a verbal task, and the anagram task was not among those in which intrusions were found to any significant degree. Although the jury is still out on these issues, the question of validity must always be foremost; stimulated recalls must be carried out with care and the data used and interpreted with caution. Several recent studies have empirically investigated the validity of verbalizations and stimulated recall procedures in terms of reactivity. Egi (2007) found that learners’ interpretations of implicit corrective feedback (in the form of recasts) provided during recall interviews influenced subsequent L2 development. Forty-nine learners either provided immediate reports of their interpretations of recasts during a task, or stimulated recall interview data. Both interviews were conducted immediately after the treatment task. While the two groups did not differ significantly on a subsequent posttest, the stimulated recall group, who participated in an additional stimulated recall interview session, outperformed the immediate report group on a delayed posttest. Egi suggests this shows reactivity of the stimulated recall interviews when they precede posttests.
In follow-up work, Egi (2008) investigated the same question, namely, whether stimulated recall is reactive when it precedes posttests, asking this time if reactivity is related to recall stimuli, verbalizations, or both. Her study included 44 learners of Japanese divided into four groups: a stimulated recall group, a stimulus only group, an experimental control, and a test control group. All participants completed a pre- and posttest, and all but the test control group also participated in a communicative activity. Immediately following the activities, the stimulated recall group viewed videos of the task and recalled their thoughts, the stimulus only group viewed the videotapes of themselves completing the task but did not verbalize their thoughts, and, finally, the experimental control neither watched the tapes nor verbalized thoughts. Results demonstrated non-significant differences between the groups, interpreted as suggesting non-reactivity of the stimulated recall interviews. In terms of concurrent protocols, Leow and Morgan-Short (2004) examined the reactivity of think-alouds during task completion. Two groups, a think-aloud group (n = 38) and a no think-aloud (i.e., silent) group (n = 39), completed the same reading task. In the think-aloud group, the learners verbalized their thought processes aloud as they completed the task. The other group was only instructed to complete the task. The results showed no difference between
the two groups on comprehension and structure recognition and production posttests, demonstrating the non-reactivity of concurrent verbalization. Leow, Grey, Marijuan, and Moorman (2014) followed up by offering a concise summary of the advantages and drawbacks of concurrent data elicitation (eye-tracking, think-alouds, and reaction-time measurements), providing suggestions for their usage in investigating cognitive processes in L2 learning. Contrasting results were found, however, in Sanz et al. (2009), who explored the issue of reactivity in two studies including verbalizations. In the first experiment, 24 English L1 speakers received a computerized treatment that delivered a grammar lesson in Latin (the learners did not have any previous knowledge of Latin). Half of the learners were asked to think aloud as they completed the grammar lesson. Results suggested that the verbalizations did not induce reactivity in terms of accuracy but instead slowed posttest performance (in terms of reaction time). In a second experiment 24 learners completed tasks and received feedback but did not receive a specific grammar lesson in Latin. Again, learners were either assigned to a think-aloud or a silent group. Results on the second experiment suggested that thinking aloud had a facilitating effect on posttest scores (for both latency and accuracy). The authors urged SLA researchers to exercise caution when interpreting the results of studies that utilize verbal protocols, arguing that both the nature of the assessment tool and the dependent variables may affect reactivity. This result was partially corroborated in a study by Godfroid and Spino (2015) that compared eye-tracking, thinking-aloud and a silent control condition for their various reactivity or nonreactivity effects. Advanced learners of English read short texts containing pseudowords in the three conditions: with eye-tracking (n = 28), thinking-aloud (n = 28) or as a silent control (n = 46).
Results suggested nonsignificant effects of eye-tracking or thinking aloud on subsequent text comprehension. However, in terms of pseudoword recognition, thinking-aloud had a small positive effect. The results of this study seem to show a small amount of reactivity for a concurrent protocol but not for the unobtrusive eye-tracking method. Russo et al. (1989) and others do not claim their research has invalidated verbalization procedures. Instead, they argue that the instructions given to participants are important in minimizing any threat to the methodology itself. They provide a hierarchy of invalidities from most serious to least serious: disruption of the primary process, omissions in the verbal report, and longer time to complete the task (see Figure 6.1).

FIGURE 6.1 Hierarchy of invalidities, from most serious to least serious: disruption of the primary process; omissions in the verbal report; longer time to complete the task.

These issues must all be kept in mind when conducting stimulated recall studies. We also need to bear in mind the kind of data that are being elicited. Much of the experimental work validating the methodology has dealt with psychological processing rather than with language phenomena. However, some work that has dealt with L2 data has thoughtfully utilized stimulated recall procedures with
an eye to potential limitations. Egi (2010) used stimulated recall to investigate learners’ perceptions of corrective feedback in the form of recasts. One day after participating in task-based interactions where they received recasts of their errors, 24 learners of Japanese were asked to comment on what they were thinking in situations when there was uptake (a reaction to the linguistic or semantic aspects of the recast) or modified output. Learners’ stimulated recall data were coded for the learners’ perceptions of the recasts, such as a comment on the corrective feedback, “noticing the gap” between their own production and that of the NS model, and “other” if the learner did not recognize the feedback as corrective. It was found that learners perceived the recasts as corrective significantly more frequently in instances where there was uptake. In situations where learners reported that they both recognized the corrective feedback and noticed the gap in their interlanguage production, there were significantly more instances of repair and modified output. Egi recognized the potential limitations of the stimulated recall interview data and mitigated them in several ways. First, learners completed the stimulated recall interviews very shortly following the task (one day later), minimizing the effects of potential memory decay. Additionally, interviewers asked learners to report their thoughts at the time of the interaction, indicating that learners were most likely reporting an overall summary of their thoughts at different conversational turns, rather than on one specific turn. This allowed the researcher to determine only the instances where the learners noticed the feedback, rather than pushing them to assess specific episodes, potentially increasing validity.
Egi also recognized the limitations of the use of stimulated recall interviews and carefully contextualized the decision to use the strategy in previous literature and through careful operationalizations of the variables. With regard to reactivity, Anderson (1985) found a decrease in judgment accuracy as a function of verbalization, but Boritz (1986) noted improved performance after verbalization (motivation was hypothesized to be the intermediary factor). Biggs, Rosman, and Sergenian (1993) reported on concurrent verbal protocols, focusing on two issues, reactivity and completeness. Their study was conducted with equity analysts. As a way of testing reactivity and completeness, researchers had three conditions: verbal reporting, computer search, and both. In the verbal report
condition, participants were given information about a company (e.g., financial and nonfinancial information) and were asked to think aloud when examining the data. In the computer condition, information was accessed by means of a computer program. In a condition that included both, participants were asked to think aloud while conducting a computer search. To determine reactivity, they compared the “computer” condition with the “both” condition and showed that accuracy of judgments was not affected. To test completeness, they considered the “both” condition and found that the verbal traces were less complete than the computer search. Even though the computer search condition resulted in greater reliability, it was not able to provide the insight into, in this case, decision-making behavior that can be provided through verbal reporting. Other research into issues of reliability was carried out by Ericsson and Simon (1980). They showed that verbalization affects cognitive processes “only if the instructions require verbalization of information that would not otherwise be attended to” (p. 215). This suggests, as we have suggested throughout the book, that one has to use extreme caution in what and how questions are being asked. As Ericsson and Simon noted, “…verbal reports, elicited with care and interpreted with full understanding of the circumstances under which they were obtained, are a valuable and thoroughly reliable sort of information about cognitive processes” (p. 247). Taken together then, these studies suggest that issues of validity and reliability need to be carefully considered, planned for methodologically, and reported in the final write-up of the study. However, the a priori idea that stimulated recalls do not collect the data they claim to collect is outdated and not supported.

Historical Concerns with Non-Production Data

The L2 literature on limitations has not been without its detractors, both implicit and explicit. As noted in Chapter 1, Selinker (1974) made an early claim that “the only observable data from meaningful performance situations we can establish as relevant…are…IL utterances produced by the learner” (p. 35). He added that this is not an anti-mentalist position, but only that “the analyst in the interlingual domain cannot rely on intuitive grammatical judgments since he will gain information about another system, the one the learner is struggling with, i.e. the TL” (p. 51, fn. 9). In Selinker’s view, the most important argument is:

[t]hat predictions based upon them are not testable in ‘meaningful performance situations’…a reconstruction of the event upon the part of the learner would have to be made in a perceptual interlingual study. Such difficulties do not exist when predictions are related to the shape of utterances produced as the result of the learner attempting to express in the TL meanings which he may already have. (Selinker, 1974, p. 51, fn. 9)

Selinker was undoubtedly referring to the vexed methodology involving acceptability judgments. Nonetheless, his comments reflect the impression that the only valuable data are those produced in context. Retrospection or introspection data, according to this argument, are tainted because other systems (e.g., memory) may be interfering with linguistic knowledge. As discussed in Chapter 1, Corder (1973) argued that forced elicitation data were necessary. Elicitation procedures are used to find out something specific about the learner’s language. Constraints must be placed on the learner so that she or he is forced to make choices within a severely restricted area of phonological, lexical, or syntactic competence2. Seliger (1983) questioned “whether the language learner can be used as a linguist in order to describe the process of second language acquisition, or even to describe linguistic processing in a more general sense” (p. 183). He also questioned whether “the verbalizations of learners represent some form of internal reality” (p. 180). He defined introspection as “conscious verbalizations of what we think we know” (p. 183) and argued that in utilizing verbal report data, we are taking “learner pronouncements [as] evidence for ‘the inner workings of the learner’s mind’” (p. 185). In laying out some of the limitations of using verbal report data (in our specific case, stimulated recall data), we have also made it clear that there are ways that some of the limitations of stimulated recall can be minimized (also see our recommendations in Chapters 3 and 4). Cohen (1996, 1998) and Matsumoto (1993), within the context of the L2 literature, deal with the disadvantages as well as advantages of verbal report data. The advantages they list include the following:

• reflect a theoretical framework;
• reveal the information attended to during task performance;
• reflect cognitive events;
• are reliable in that they correlate with behavior;
• are useful in strategy research;
• are useful in determining what prior knowledge is used in processing texts.

Among the limitations they discuss are:

• unconsciousness of cognitive processes;
• complexity of cognitive processes;
• inaccurate reporting on the part of participants;
• inaccessibility of some information;
• confounding of introspection and retrospection;
• intrusive;
• dependent on verbal skills of participants;
• for L2 research, the language of processing versus the language of reporting;
• veridicality;
• generating verbal reports may alter the nature of the process.

These and other advantages and disadvantages have been discussed in previous chapters of this book and point to the fact that stimulated recalls must be carried out with care and, like any methodology, be carefully limited in scope and power. The question of reliability has received somewhat less direct attention in the literature, probably because of the difficulty in determining reliability with introspections. Lieberman (1979), in his review of introspection, while acknowledging the limitations, discussed empirical evidence that shows that introspections can predict future behavior. Pressley and Afflerbach (1995) were careful to point out that the extent to which we can trust verbal report data is, in part, dependent on the amount of interpretation as opposed to pure content of memory that is given. In other words, it is more accurate for the researcher to focus their analysis on the specific content details the learner reports rather than on the learner’s interpretations of what happened during the task. For example, in reporting stimulated recalls, it is best to avoid framing the recalls as “participants thought x and y” and more accurate to say “participants’ recalls were y and z” using the actual words of the participants as opposed to making inferences or judgments about their words. To summarize, then, there are two important potential limitations in the use of stimulated recall that need to be emphasized in relation to the questions it can be used to address. First, although every effort should be made by the researcher to ensure the recall is carried out as close as possible in time to the actual event, in some cases the memory structures being recalled may not always relate directly to the event that just occurred and this should be acknowledged as a limitation. 
For example, if a learner is asked to recall thoughts during a story retelling performed three days before, the learner may recall thoughts that surrounded the story telling, knowledge that was laid down earlier or later, but accessed in the same frame. Also, it needs to be acknowledged that participants may have experienced some interference during the period between the event to be recalled and the recall. Even if that time is only half an hour, some memory types can decay rapidly (Berman, Jonides, & Lewis, 2009; Portrat, Barrouillet, & Camos, 2008; Posner, 1992), and contamination of memory represents one of the most significant threats to researchers’ claims that stimulated recall data can uncover information about participants’ cognitive processes.

RECOMMENDATIONS

• In order to mitigate the effects of reactivity, stimulated recall interviews should always be conducted before any posttests.
• Conduct stimulated recall interviews as close to the actual event as possible but consider potential memory decay no matter how much or how little time passes.
• Avoid framing the recalls as “participants thought x and y” when reporting results. Instead, say “participants’ recalls were y and z” utilizing raw data as opposed to making inferences or judgments about participants’ words.
• Consider potential limitations of stimulated recall methodology and carefully contextualize and operationalize variables to best suit the needs of the method.

Practical Considerations

The discussion up to this point has been primarily based on theoretical concerns. One cannot ignore other more practical concerns. Many of these have been pointed out throughout the book and are under the researcher’s control (e.g., experimental design and setup). There are others, however, that stem from the participants themselves. For example, participants may be anxious about the recall session. They may want to sanitize the data, especially if they are familiar with the researcher or if they believe their response might place a teacher in a bad light. This may be the case in some cultures, but not others. In sum, personal rapport is an important consideration, as is the need to make sure that participants are comfortable and familiar with what is being asked of them.

RECOMMENDATIONS

• Ensure a positive rapport with the participant.
• Carefully select the person who conducts the recall interview, keeping in mind cultural norms.
• Make sure that the participant is familiar with what is being asked of them.
• Make sure that the participant is familiar with how the session will work.
• Do not share the research goal with the participant, other than in general terms (e.g., to understand how second languages are learned).

Conclusions

We conclude this book by repeating the premise that stimulated recall data can provide valuable information about some of the complex processes involved in learning L2s. The data, however, like all data in L2 research, must be carefully elicited and interpreted. In particular, data must always be interpreted within the framework of current theoretical concerns, and in conjunction with other compatible and reliable data. As we pointed out in the introduction, in the seventeen years since our original book was published, stimulated recall has become a standard methodological tool. In the first decade after the book was published, it was frequently cited by researchers using the tool, usually in lieu of an explanation as to what stimulated recall was. However, using stimulated recall as a data-elicitation tool has now become so standard that citations are not always considered necessary: it is an accepted part of the field, a methodological tool so common that it needs no further explanation or justification. What we have done here is to provide an update as to the many different ways it has been, and can continue to be, used which go beyond our original conceptualizations. We have presented ways of eliciting and interpreting stimulated recall data. We have also discussed pitfalls and areas of controversy. We believe that with informed use, this methodology can make an invaluable contribution to a careful and well-thought-out study. We have pointed to ways forward, where stimulated recall data are used to triangulate a range of data types and data sources on the cutting edge of L2 studies, including eye tracking, neuroimaging, and artificial languages. We imagine that the range of studies that incorporate stimulated recall data will increase as the field increases in sophistication.

Notes

1 Nisbett and Wilson (1977a) further this line of research empirically by showing that global evaluations of an individual can induce altered reports of a person’s attributes.
2 One of the most common means of gathering introspective data is through acceptability or judgment data. The validity and reliability of judgment data have been amply discussed in the literature and are not treated here (cf. Bard, Robertson, & Sorace, 1996; Birdsong, 1989; Cowan & Hatasa, 1994; Gass, 1994; Goss, Zhang, & Lantolf, 1994; Gutiérrez, 2013; Munnich, Flynn, & Martohardjono, 1994; Sorace, 1988; Vafaee, Suzuki, & Kachinske, 2016), as judgment methodology is not directly related to the type of verbal reporting discussed in this book.

REFERENCES

Aitchison, J. (1994). Words in the mind: An introduction to the mental lexicon (2nd edn). Oxford: Blackwell.
Allison, P. (1987). What and how pre-service physical education teachers observe during an early field experience. Research Quarterly for Exercise and Sport, 58, 242–249.
Anderson, J. R. (1987). Methodologies for studying human knowledge. Behavioral and Brain Sciences, 10, 467–505.
Anderson, M. (1985). Some evidence on the effect of verbalization on process: A methodological note. Journal of Accounting Research, 23, 843–852.
Bailey, N., Madden, C., & Krashen, S. (1974). Is there a “natural sequence” in adult second language learning? Language Learning, 24, 235–243.
Bard, E., Robertson, D., & Sorace, A. (1996). Magnitude estimation of linguistic acceptability. Language, 72, 32–68.
Bardovi-Harlig, K., & Dörnyei, Z. (1998). Do language learners recognize pragmatic violations? Pragmatic versus grammatical awareness in instructed L2 learning. TESOL Quarterly, 32, 233–262.
Barkaoui, K., Brooks, L., Swain, M., & Lapkin, S. (2013). Test-takers’ strategic behaviors in independent and integrated speaking tasks. Applied Linguistics, 34, 304–324.
Barrows, H. (2000). Stimulated Recall: Personalized assessment of clinical reasoning. Springfield, IL: Southern Illinois University School of Medicine.
Barry, S., & Lazarte, A. (1998). Evidence for mental models. How do prior knowledge, syntactic complexity, and reading topic affect inference generation in a recall task for nonnative readers of Spanish? Modern Language Journal, 82, 176–193.
Basturkmen, H., Loewen, S., & Ellis, R. (2004). Teachers’ stated beliefs about incidental focus on form and their classroom practices. Applied Linguistics, 25, 243–272.
Benoit, W. (1995). Accounts, excuses, and apologies. A theory of image restoration strategies. Albany, NY: State University of New York Press.
Berman, M., Jonides, J., & Lewis, R. (2009). In search of decay in verbal short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 317–333.
Bettman, J., & Park, C. (1979). Implications of a constructive view of choice for analysis of protocol data: A coding scheme for elements of choice processes (Working Paper 75). Los Angeles, CA: University of California, Center for Marketing Studies.
Biggs, S., Rosman, A., & Sergenian, G. (1993). Methodological issues in judgment and decision-making research: Concurrent verbal protocol validity and simultaneous traces of process. Journal of Behavioral Decision Making, 6, 187–206.
Birdsong, D. (1989). Metalinguistic performance and interlinguistic competence. Berlin: Springer Verlag.
Bloom, B. (1953). Thought-processes in lectures and discussions. Journal of General Education, 7, 160–169.
Bloom, B. (1954). The thought processes of students in discussion. In S. J. French (ed.), Accent on teaching: Experiments in general education (pp. 23–46). New York: Harper.
Bloomfield, L. (1914). Introduction to the study of language. New York: Henry Holt and Company.
Bloomfield, L. (1933). Language. New York: Holt, Rinehart, & Winston.
Blumenthal, A. (1970). Language and psychology. Historical aspects of psycholinguistics. New York: John Wiley.
Borg, S. (2003). Teacher cognition in language teaching: A review of research on what language teachers think, know, believe, and do. Language Teaching, 36, 81–109.
Boritz, J. E. (1986). The effect of research method on audit planning and review judgments. Journal of Accounting Research, 26, 335–348.
Bosher, S. (1998). The composing processes of three Southeast Asian writers at the postsecondary level: An exploratory study. Journal of Second Language Writing, 7, 205–241.
Bowles, M. (2010). The think-aloud controversy in second language research. New York: Routledge.
Bruner, J., Goodnow, J., & Austin, G. (1956). A study of thinking. New York: John Wiley.
Burgoyne, J., & Hodgson, V. (1983). Natural learning and managerial action: A phenomenological study in the field setting. Journal of Management Studies, 20, 387–399.
Calderhead, J. (1981a). A psychological approach to research on teachers’ classroom decision making. British Educational Research Journal, 7, 51–57.
Calderhead, J. (1981b). Stimulated recall: A method for research on teaching. British Journal of Educational Psychology, 51, 211–217.
Canguilhem, G. (1989). The normal and the pathological. (C. Fawcett with R. Cohen, trans.). New York: Zone Books.
Canguilhem, G. (1994). Études d’histoire et de philosophie des sciences [Studies in the history and philosophy of science] (7th ed.). Paris: Librairie Philosophique J. Vrin.
Carroll, S., & Meisel, J. (1990). Universals and second language acquisition: Some comments on the state of current theory. Studies in Second Language Acquisition, 12, 201–208.
Chi, M. (1997). Quantifying qualitative analyses of verbal data: A practical guide. Journal of the Learning Sciences, 6, 271–315.
Chi, M., Feltovich, P., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.
Chomsky, N. (1957). Syntactic structures. Berlin: Mouton.
Chomsky, N. (1959). Review of “Verbal behavior” by B. F. Skinner. Language, 35, 26–58.
Cohen, A. (1987). Using verbal reports in research on language learning. In C. Færch & G. Kasper (eds.), Introspection in second language research (pp. 82–95). Clevedon: Multilingual Matters.
Cohen, A. (1996). Verbal reports as a source of insights into second language learner strategies. Applied Language Learning, 7, 5–24.
Cohen, A. (1998). Strategies in learning and using a second language. London: Longman.
Cohen, A. (2007). The coming of age for research on test-taking strategies. In J. Fox, M. Wesche, D. Bayliss, L. Cheng, C. Turner, & C. Doe (eds.), Language testing reconsidered (pp. 89–111). Ottawa: Ottawa University Press.
Cohen, A., & Hosenfeld, C. (1981). Some uses of mentalistic data in second language research. Language Learning, 31, 285–313.
Cook, V. (1990). Timed comprehension of binding in advanced L2 learners of English. Language Learning, 40, 557–599.
Corder, S. P. (1973). The elicitation of interlanguage. In J. Svartvik (ed.), Errata: Papers in error analysis (pp. 36–48). Lund: CWK Gleerup.
Cowan, R., & Hatasa, Y. (1994). Investigating the validity and reliability of native speaker and second-language learner judgments about sentences. In E. Tarone, S. Gass, & A. Cohen (eds.), Research methodology in second-language acquisition (pp. 287–302). Mahwah, NJ: Lawrence Erlbaum Associates.
Craik, F., & Lockhart, R. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684.
Daly, W. (2001). The development of an alternative method in the assessment of critical thinking as an outcome of nursing education. Journal of Advanced Nursing, 36, 120–130.
de la Campa, J., & Nassaji, H. (2009). The amount, purpose, and reasons for using L1 in L2 classrooms. Foreign Language Annals, 42, 742–759.
Dennett, D. (1987). The intentional stance. Cambridge, MA: MIT Press.
Descartes, R. (1960 [1637]). Discourse on method. (L. La Fleur, trans.). New York: Bobbs Merrill.
De Silva, R., & Graham, S. (2015). The effects of strategy instruction on writing strategy use for students of different proficiency levels. System, 53, 47–59.
DiPardo, A. (1994). Stimulated recall in research on writing: An antidote to “I don’t know, it was fine.” In P. Smagorinsky (ed.), Speaking about writing: Reflections on research methodology (pp. 163–181). Thousand Oaks, CA: Sage.
Dörnyei, Z., & Kormos, J. (1998). Problem-solving mechanisms in L2 communication: A psycholinguistic perspective. Studies in Second Language Acquisition, 20, 349–386.
Dulay, H., & Burt, M. (1974a). Natural sequences in child second language acquisition. Language Learning, 24, 37–53.
Dulay, H., & Burt, M. (1974b). You can’t learn without goofing. In J. Richards (ed.), Error analysis: Perspectives on second language acquisition (pp. 95–123). London: Longman.
Dulay, H., & Burt, M. (1975). Creative construction in second language learning and teaching. In M. Burt & H. Dulay (eds.), On TESOL ’75: New directions in second language learning (pp. 21–32). Washington, DC: TESOL.
Egi, T. (2007). Recasts, learners’ interpretations, and L2 development. In A. Mackey (ed.), Conversational interaction in second language acquisition: A series of empirical studies (pp. 249–267). Oxford: Oxford University Press.
Egi, T. (2008). Investigating stimulated recall as a cognitive measure: Reactivity and verbal reports in SLA research methodology. Language Awareness, 17, 212–228.
Egi, T. (2010). Uptake, modified output, and learner perceptions of recasts: Learner responses as language awareness. The Modern Language Journal, 94, 1–21.
Ellis, R. (1990). Grammaticality judgments and learner variability. In H. Burmeister and P. Rounds (eds.), Variability in second language acquisition: Proceedings of the Tenth Meeting of the Second Language Research Forum (pp. 25–60). Eugene, OR: University of Oregon.
Ellis, R. (1991). Grammaticality judgments and second language acquisition. Studies in Second Language Acquisition, 13, 161–186.
Ellis, R., Sheen, Y., Murakami, M., & Takashima, H. (2008). The effects of focused and unfocused written corrective feedback in an English as a foreign language context. System, 36, 353–371.
Elstein, A., Shulman, L., & Sprafka, S. (1978). Medical problem solving: An analysis of clinical reasoning. Cambridge, MA: Harvard University Press.
Erickson, F., & Mohatt, G. (1977). The social organization of participation structure in two classrooms of Indian students. Ottawa: Department of Indian Affairs and Northern Development.
Ericsson, K., & Simon, H. (1980). Verbal reports as data. Psychological Review, 87, 215–251.
Ericsson, K., & Simon, H. (1984). Protocol analysis. Cambridge, MA: MIT Press.
Ericsson, K., & Simon, H. (1987). Verbal reports on thinking. In C. Færch & G. Kasper (eds.), Introspection in second language research (pp. 24–53). Clevedon: Multilingual Matters.
Ericsson, K., & Simon, H. (1993). Protocol analysis: Verbal reports as data (2nd edn). Cambridge, MA: MIT Press.
Ericsson, K., & Simon, H. (1996). Protocol analysis: Verbal reports as data (3rd edn). Cambridge, MA: MIT Press.
Eskildsen, S. W. (2012). L2 negation constructions at work. Language Learning, 62, 335–372.
Færch, C., & Kasper, G. (1987). From product to process: Introspective methods in second language research. In C. Færch & G. Kasper (eds.), Introspection in second language research (pp. 5–23). Clevedon: Multilingual Matters.
Foster, P. (1998). A classroom perspective on the negotiation of meaning. Applied Linguistics, 19, 1–23.
Fox, M. C., Ericsson, K. A., & Best, R. (2010). Do procedures for verbal reporting of thinking have to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological Bulletin, 137, 316–344.
Fox-Turnbull, W. (2009). Stimulated recall using autophotography: A method for investigating technology education. In A. Bekker, I. Mottier, & M. J. de Vries (eds.), Strengthening the position of technology education in the curriculum. Proceedings PATT-22 Conference Delft: International Technology & Engineering Educators Association.
Fraser, C. (1999). Lexical processing strategy use and vocabulary learning through reading. Studies in Second Language Acquisition, 21, 225–241.
Fujii, A., & Mackey, A. (2009). Interactional feedback in learner-learner interactions in a task-based EFL classroom. International Journal of Applied Linguistics, 47, 267–301.
Garner, R. (1988). Verbal-report data on cognitive and metacognitive strategies. In C. E. Weinstein, E. T. Goetz, & P. A. Alexander (eds.), Learning and study strategies: Issues in assessment, instruction and evaluation (pp. 63–76). New York: Academic Press.
Gass, S. (1979). Language transfer and universal grammatical relations. Language Learning, 29, 327–344.
Gass, S. (1994). The reliability of second-language grammaticality judgments. In E. Tarone, S. Gass, & A. Cohen (eds.), Research methodology in second-language acquisition (pp. 303–322). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gass, S. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Lawrence Erlbaum Associates.
Gass, S. (1998). Apples and oranges. Or, why apples are not orange and don’t need to be. Modern Language Journal, 82, 83–90.
Gass, S. (1999). Discussion: Incidental vocabulary learning. Studies in Second Language Acquisition, 21, 319–333.
Gass, S., & Mackey, A. (2000). Stimulated Recall Methodology in Second Language Research. New York: Routledge.
Gass, S., & Lewis, K. (2007). Perceptions of interactional feedback: Differences between heritage language learners and non-heritage language learners. In A. Mackey (ed.), Conversational interaction in second language acquisition: A series of empirical studies (pp. 173–196). Oxford: Oxford University Press.
Gass, S., & Mackey, A. (2007). Data elicitation for second and foreign language research. Mahwah, NJ: Lawrence Erlbaum Associates.
Gass, S., & Polio, C. (2014). Methodological influences of Interlanguage (1972): Data then and data now. In Z-H. Han & E. Tarone (eds.), Interlanguage 40 Years Later (pp. 147–171). Amsterdam: John Benjamins.
Gass, S., & Varonis, E. (1994). Input, interaction, and second language production. Studies in Second Language Acquisition, 16, 283–302.
Gass, S., with Behney, J., & Plonsky, L. (2013). Second language acquisition: An introductory course. New York: Routledge.
Gazzaniga, M. (1998). The split-brain revisited. Scientific American, 279, July, 51–55.
Gilbert, W., Trudel, P., & Haughian, L. (1999). Interactive decision making factors considered by coaches of youth ice hockey during games. Journal of Teaching in Physical Education, 18, 290–311.
Godfroid, A., & Schmidtke, J. (2013). What do eye movements tell us about awareness? A triangulation of eye-movement data, verbal reports, and vocabulary learning scores. In J. M. Bergsleithner, S. N. Frota, & J. K. Yoshioka (eds.), Noticing and second language acquisition: Studies in honor of Richard Schmidt (pp. 183–205). Honolulu, HI: University of Hawai’i, National Foreign Language Resource Center.
Godfroid, A., & Spino, L. (2015). Reconceptualizing reactivity of think-alouds and eye tracking: Absence of evidence is not evidence of absence. Language Learning, 65, 896–928.
Goldstein, B. (1999). Sensation and perception (5th edn). Pacific Grove, CA: Brooks/Cole.
Goss, N., Zhang, Y-H., & Lantolf, J. (1994). Two heads may be better than one: Mental activity in second-language grammaticality judgments. In E. Tarone, S. Gass, & A. Cohen (eds.), Research methodology in second-language acquisition (pp. 263–286). Mahwah, NJ: Lawrence Erlbaum Associates.
Gutiérrez, X. (2013). The construct validity of grammaticality judgment tests as measures of implicit and explicit knowledge. Studies in Second Language Acquisition, 35, 423–449.
Habermas, J. (1979). Communication and the evolution of society. Boston, MA: Beacon Press.
Hample, D. (1984). On the use of self-reports. Journal of the American Forensic Association, 20, 140–153.
Hansebo, G., & Kihlgren, M. (2001). Carers’ reflections about their video-recorded interactions with patients suffering from severe dementia. Journal of Clinical Nursing, 10, 737–747.
Hawkins, B. (1985). Is the appropriate response always so appropriate? In S. Gass & C. Madden (eds.), Input in second language acquisition (pp. 162–178). Rowley, MA: Newbury House.
Henderson, L., & Tallman, J. (2006). Stimulated recall and mental models: Tools for teaching and learning computer information literacy. Lanham, MD: The Scarecrow Press.
Henderson, L., Henderson, M., Grant, S., & Huang, H. (2010). What are users thinking in a virtual world lesson? Using stimulated recall interviews to report student cognition, and its triggers. Virtual Worlds Research, 3, 3–22.
Hillway, T. (1969). Handbook of educational research: A guide to methods and materials. Boston, MA: Houghton Mifflin.
Hoover, M., & Dwivedi, V. (1998). Syntactic processing by skilled bilinguals. Language Learning, 48, 1–29.
Huber, G., & Mandl, H. (eds.). (1982). Verbale Daten [Verbal data]. Weinheim: Beltz.
Huitt, W. (2003). The information processing approach to cognition. Educational Psychology Interactive. Valdosta, GA: Valdosta State University. Retrieved 24 May 2016 from http://www.edpsycinteractive.org/topics/cognition/infoproc.html
Isaacs, T., & Thomson, R. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10, 135–159.
Jenkins, J., & Tuten, J. (1998). On possible parallels between perceiving and remembering events. In R. Hoffman, M. Sherrick, and J. Warm (eds.), Viewing psychology as a whole (pp. 291–314). Washington, DC: American Psychological Association.
Joe, A. (1998). What effects do text-based tasks promoting generation have on incidental vocabulary acquisition? Applied Linguistics, 19, 357–377.
Jourdenais, R. (1996). The limitations of think-alouds. Paper presented at The American Association for Applied Linguistics Conference, Chicago, IL, March.
Kagan, N., Krathwohl, D., & Miller, R. (1963). Stimulated recall in therapy using video tape: A case study. Journal of Counseling Psychology, 10, 237–243.
Kang, S. J. (2005). Dynamic emergence of situational willingness to communicate in a second language. System, 33, 277–292.
Kasper, G., & Blum-Kulka, S. (1993). Interlanguage pragmatics. New York: Oxford University Press.
Kellerman, E. (1979). Transfer and non-transfer: Where we are now. Studies in Second Language Acquisition, 2, 37–57.
Kempe, V., & MacWhinney, B. (1998). The acquisition of case marking by adult learners of Russian and German. Studies in Second Language Acquisition, 20, 543–587.
Khan, S., & Victori, M. (2011). Perceived vs. actual strategy use across three oral communication tasks. International Review of Applied Linguistics, 49, 27–53.
Kressel, K., Henderson, T., Reich, W., & Cohen, C. (2012). Multidimensional analysis of conflict mediator style. Conflict Resolution Quarterly, 30, 135–171.
Krönke, K.-M., Mueller, K., Friederici, A. D., & Obrig, H. (2013). Learning by doing? The effect of gestures on implicit retrieval of newly acquired words. Cortex, 49, 2553–2568.
Labov, W. (1972). Sociolinguistic patterns. Philadelphia, PA: University of Pennsylvania Press.
Lee, J. (1998). The relationship of verb morphology to second language reading comprehension and input processing. The Modern Language Journal, 82, 33–48.
Leeman, J. (1999). Recasts in Spanish as a second language: An empirical study of negative evidence and enhanced salience. Presented at Second Language Research Forum, Minneapolis, MN, September.
Lei, X. (2008). Exploring a sociocultural approach to writing strategy research: Mediated actions in writing activities. Journal of Second Language Writing, 17, 217–236.
Lennon, P. (1989). Introspection and intentionality in advanced second-language acquisition. Language Learning, 39, 375–396.
Leow, R. (1998). The effects of amount and type of exposure on adult learners’ L2 development in SLA. The Modern Language Journal, 82, 49–68.
Leow, R. P., & Morgan-Short, K. (2004). To think aloud or not to think aloud: The issue of reactivity in SLA research methodology. Studies in Second Language Acquisition, 26, 35–57.
Leow, R. P., Grey, S., Marijuan, S., & Moorman, C. (2014). Concurrent data elicitation procedures, processes, and the early stages of L2 learning: A critical overview. Second Language Research, 30, 111–127.
Levelt, W. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levelt, W. (1993). Language use in normal speakers and its disorders. In G. Blanken, J. Dittman, H. Grimm, J. C. Marshall, & C-W. Wallesch (eds.), Linguistic disorders and pathologies (pp. 1–15). Berlin: Mouton de Gruyter.
Levelt, W. (1995). The ability to speak: From intentions to spoken words. European Review, 3, 13–23.
Li, J., & Barnard, R. (2012). Academic tutors’ beliefs about and practices of giving feedback on students’ written assignments: A New Zealand case study. Assessing Writing, 16, 137–148.
Lieberman, D. (1979). Behaviorism and the mind: A limited call for a return to introspection. American Psychologist, 34, 319–333.
Lightbown, P. (1998). The importance of timing in focus on form. In C. Doughty & J. Williams (eds.), Focus on form in classroom second language acquisition (pp. 177–196). Cambridge: Cambridge University Press.
Liimatainen, L., Poskiparta, M., Karhila, P., & Sjogren, A. (2001). The development of reflective learning in the context of health counselling and health promotion during nurse education. Journal of Advanced Nursing, 34, 648–658.
Liimatainen, L., Poskiparta, M., Sjogren, A., Kettunen, T., & Karhila, P. (2001). Investigating student nurses’ constructions of health promotion in nursing education. Health Education Research, 16, 33–48.
Lotto, L., & de Groot, A. (1998). Effects of learning method and word type on acquiring vocabulary in an unfamiliar language. Language Learning, 48, 31–69.
Lyons, W. (1986). The disappearance of introspection. Cambridge, MA: MIT Press.
Lyle, J. (2003). Stimulated recall: A report on its use in naturalistic research. British Educational Research Journal, 29, 861–878.
Lyster, R. (1998). Recasts, repetition, and ambiguity in L2 classroom discourse. Studies in Second Language Acquisition, 20, 51–81.
Ma, J. (2010). Chinese EFL learners’ decision-making while evaluating peers’ texts. International Journal of English Studies, 10, 99–120.
Mackey, A. (2002). Beyond production: Learners’ perceptions about interactional processes. International Journal of Educational Research, 37, 379–394.
Mackey, A., & Gass, S. M. (2015). Second language research: Methodology and design (2nd edn). New York: Routledge.
Mackey, A., & Gum, A. (1997). Working into the workplace: Strategies for coping with transition. Paper presented at the Annual Meeting of the Teachers of English to Speakers of Other Languages, Orlando, FL, March.
Mackey, A., & Marsden, E. (eds.) (2016). Advancing methodology and practice: The IRIS repository of instruments for research into second languages. New York: Routledge.
Mackey, A., & Philp, J. (1998). Conversational interaction and second language development: Recasts, responses, and red herrings. The Modern Language Journal, 82, 338–356.
Mackey, A., Gass, S., & McDonough, K. (2000). How do learners perceive implicit negative feedback? Studies in Second Language Acquisition, 22, 471–497.
Mackey, A., Kanganas, A., & Oliver, R. (2007). Task familiarity and interactional feedback in child ESL classrooms. TESOL Quarterly, 41, 285–312.
Mangubhai, F. (1992). Going beyond the product: How do we get at the processes of second language acquisition? Paper presented at the Pacific Second Language Research Forum, Sydney, July.
Matsumoto, K. (1993). Verbal-report data and introspective methods in second language research: State of the art. RELC Journal: A Journal of Language Teaching and Research in Southeast Asia, 24, 32–60.
Mehnert, U. (1998). The effects of different lengths of time for planning on second language performance. Studies in Second Language Acquisition, 20, 83–108.
Miller, G., Galanter, E., & Pribram, K. (1960). Plans and the structure of behaviour. New York: Holt, Rinehart, and Winston.
Munnich, E., Flynn, S., & Martohardjono, G. (1994). Elicited imitation and grammaticality judgment tasks: What they measure and how they relate to each other. In E. Tarone, S. Gass, & A. Cohen (eds.), Research methodology in second-language acquisition (pp. 227–43). Mahwah, NJ: Lawrence Erlbaum Associates.
Munro, M. (1998). The effects of noise on the intelligibility of foreign-accented speech. Studies in Second Language Acquisition, 20, 139–154.
Naylor, P. (n.d.). Legal testimony and the non-native speaker of English: The problem of linguistic and cultural interferences in interethnic communication (Unpublished manuscript).
Newell, A., & Simon, H. (1956). The logic theory machine: A complex information processing system. I.R.E. Transactions on Information Theory, 2, 61–79.
Nisbett, R., & Wilson, T. (1977a). The halo effect: Evidence for unconscious alteration of judgments. Journal of Personality and Social Psychology, 35, 250–256.
Nisbett, R., & Wilson, T. (1977b). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–259.
Nurmukhamedov, U., & Kim, S. H. (2010). “Would you perhaps consider…”: Hedged comments in ESL writing. ELT Journal, 64, 272–282.
Paribakht, T. S., & Wesche, M. (1999). Reading and “incidental” L2 vocabulary acquisition: An introspective study of lexical inferencing. Studies in Second Language Acquisition, 21, 195–224.
Pausawasdi, N. (2001). Students’ engagement and disengagement when learning with IMM in mass lectures (Unpublished doctoral dissertation). James Cook University, Australia.
Peterson, P., & Clark, C. (1978). Teachers’ reports of their cognitive processes during teaching. American Educational Research Journal, 15, 555–565.
Pinker, S. (1989). Resolving a learnability paradox in the acquisition of the verb lexicon. In M. L. Rice & R. L. Schiefelbusch (eds.), The teachability of language (pp. 13–62). Baltimore, MD: P.H. Brookes.
Plonsky, L. (2011). The effectiveness of second language strategy instruction: A meta-analysis. Language Learning, 61, 993–1038.
Portrat, S., Barrouillet, P., & Camos, V. (2008). Time-related decay or interference-based forgetting in working memory? Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1561–1564.
Posner, M. (1992). Attention as a cognitive and neural system. Current Directions in Psychological Science, 1, 11–14.
Poulisse, N., Bongaerts, T., & Kellerman, E. (1987). The use of retrospective verbal reports in the analysis of compensatory strategies. In C. Færch & G. Kasper (eds.), Introspection in second language research (pp. 213–229). Clevedon: Multilingual Matters.
Pressley, M., & Afflerbach, P. (1995). Verbal protocols of reading: The nature of constructively responsive reading. Hillsdale, NJ: Lawrence Erlbaum Associates.
Rassaei, E. (2014). Scaffolded feedback, recasts, and L2 development: A sociocultural perspective. The Modern Language Journal, 98, 417–431.
Reder, L. (1982). Plausibility judgments versus fact retrieval: Alternative strategies for sentence verification. Psychological Review, 89, 250–280.
Révész, A. (2011). Task complexity, focus on L2 constructions, and individual differences: A classroom-based study. The Modern Language Journal, 95, 162–181.
Révész, A., & Gurzynski-Weiss, L. (2016). Teachers’ perspectives on second language task difficulty: Insights from think-alouds and eye-tracking. The Annual Review of Applied Linguistics.
Riney, T., & Flege, J. (1998). Changes over time in global foreign accent and liquid identifiability and accuracy. Studies in Second Language Acquisition, 20, 213–243.
Rose, M. (1984). Writer’s block. Carbondale, IL: Southern Illinois University Press.
Rumelhart, D., & McClelland, J. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Russo, J., Johnson, E., & Stephens, D. (1989). The validity of verbal protocols. Memory and Cognition, 17, 759–769.
Saito, K. (2013). The acquisitional value of recasts in instructed second language speech learning: Teaching the perception and production of English /ɹ/ to adult Japanese learners. Language Learning, 63, 499–529.
Salvatori, P., Baptiste, S., & Ward, H. (2000). Development of a tool to measure clinical competence in occupational therapy: A pilot study? Canadian Journal of Occupational Therapy, 67, 51–60.
Sanz, C., Lin, H.-J., Lado, B., Wood Bowden, H., & Stafford, C. A. (2009). Concurrent verbalizations, pedagogical conditions, and reactivity: Two CALL studies. Language Learning, 59, 33–71.
Sato, M. (2007). Social relationships in conversational interaction: A comparison of learner and learner-NS dyads. JALT Journal, 29, 183–208.
Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schepens, A., Aelterman, A., & Van Keer, H. (2007). Studying learning processes of student teachers with stimulated recall interviews through changes in interactive cognitions. Teaching and Teacher Education, 23, 457–472.
Schmidt, R. (1993). Awareness and second language acquisition. Annual Review of Applied Linguistics, 13, 206–226.
Schmidt, R., & Frota, S. (1986). Developing basic conversational ability in a second language: A case study of an adult learner of Portuguese. In R. Day (ed.), Talking to learn: Conversation in second language acquisition (pp. 237–326). Rowley, MA: Newbury House.
Schooler, J., & Engstler-Schooler, T. (1990). Verbal overshadowing of visual memories: Some things are better left unsaid. Cognitive Psychology, 22, 36–71.
Schumann, J. (1978). The pidginization process. Rowley, MA: Newbury House.
Schumann, J., & Schumann, F. (1977). Diary of a language learner: An introspective study of second language learning. In H. D. Brown, C. Yorio, & R. Crymes (eds.), On TESOL ’77 teaching and learning English as a second language: Trends in research and practice (pp. 241–249). Washington, DC: TESOL.
Seliger, H. (1983). The language learner as linguist: Of metaphors and realities. Applied Linguistics, 4, 179–191.
Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics, 10, 209–231.
Selinker, L. (1974). Interlanguage. In J. Richards (ed.), Error analysis: Perspective on second language acquisition (pp. 31–54). London: Longman.
Seung, L., & Schallert, D. (2004). Emotions and classroom talk: Toward a model of the role of affect in students’ experiences of classroom discussions. Journal of Educational Psychology, 96, 619–634.
Shavelson, R., Webb, N., & Burstein, L. (1986). Measurement of teaching. In M. Wittrock (ed.), Handbook of research on teaching (pp. 50–91). New York: Macmillan.
Shulman, L. (1986). Paradigms and research programs in the study of teaching: A contemporary perspective. In M. Wittrock (ed.), Handbook of research on teaching (pp. 3–36). New York: Macmillan.
Shulman, L., & Elstein, A. (1975). Studies of problem solving, judgment, and decision making: Implications for educational research. In F. N. Kerlinger (ed.), Review of research in education (vol. 3, pp. 3–42). Itasca, IL: Peacock.
Siegel, L., Siegel, L., Capretta, P., Jones, R., & Berkowitz, H. (1963). Students’ thoughts during class: A criterion for educational research. Journal of Educational Psychology, 54, 45–51.
Sjöholm, K. (1976). A comparison of the test results in grammar and vocabulary between Finnish and Swedish-speaking applicants for English. In H. Ringbom & R. Palmberg (eds.), Errors made by Finns and Swedish-speaking Finns in the learning of English (AFTIL, vol. 5, pp. 54–137). Turku: Åbo Akademi, Publications of the Department of English.
Skinner, B. F. (1953). Science and human behavior. New York: The Free Press.
Skinner, B. F. (1957). Verbal behavior. New York: Appleton-Century-Crofts.
Smagorinsky, P. (ed.) (1994). Speaking about writing: Reflections on research methodology. Thousand Oaks, CA: Sage.
Smith, B. (2012). Eye tracking as a measure of noticing: A study of explicit recasts in SCMC. Language Learning and Technology, 16, 53–81.
Sorace, A. (1988). Linguistic intuitions in interlanguage development: The problems of indeterminacy. In J. Pankhurst, M. Sharwood Smith, & P. Van Buren (eds.), Learnability and second languages: A book of readings (pp. 167–190). Dordrecht: Foris.
Spenser, H. R., & Parikh, S. V. (2000). Continuing medical education. Canadian Journal of Psychiatry, 45, 297–298.
Swain, M., & Lapkin, S. (1998). Interaction and second language learning: Two adolescent French immersion students working together. Modern Language Journal, 82, 320–337.
Takimoto, M. (2007). The effects of input-based tasks on the development of learners’ pragmatic proficiency. Applied Linguistics, 30, 1–25.
Titchener, E. (1908). Lectures on the elementary psychology of feeling and attention. New York: Macmillan.
Tjeerdsma, B. (1997). A comparison of teacher and student perspectives of tasks and feedback. Journal of Teaching in Physical Education, 16, 388–400.
Tode, T. (2012). Schematization and sentence processing by foreign language learners: A reading-time experiment and a stimulated-recall analysis. IRAL, 50, 161–187.
Uggen, M. (2012). Reinvestigating the noticing function of output. Language Learning, 62, 506–540.

Uysal, H. (2008). Tracing the culture behind writing: Rhetorical patterns and bidirectional transfer in L1 and L2 essays of Turkish writers in relation to educational context. Journal of Second Language Writing, 17, 183–207.
Uysal, H. (2012). Cross-cultural pragmatics of reading: The case of American and Turkish students reacting to a Turkish text. The Reading Matrix, 12, 12–29.
Uzum, B. (2010). An investigation of alignment in CMC from a sociocognitive perspective. CALICO Journal, 29, 135–155.
Vafaee, P., Suzuki, Y., & Kachinske, I. (2016). Validating grammaticality judgment tests: Evidence from two new psycholinguistic measures. Studies in Second Language Acquisition. Published online 11 March 2016.
Van Hell, J., & Candia Mahn, A. (1997). Keyword mnemonics versus rote rehearsal: Learning concrete and abstract foreign words by experienced and inexperienced learners. Language Learning, 47, 507–546.
van Someren, M., Barnard, Y., & Sandberg, J. (1994). The think aloud method: A practical guide to modelling cognitive processes. London: Academic Press.
Watanabe, Y. (2008). Peer–peer interaction between L2 learners of different proficiency levels: Their interactions and reflections. Canadian Modern Language Review, 64, 605–635.
Watanabe, Y., & Swain, M. (2007). Effects of proficiency differences and patterns of pair interaction on second language learning: Collaborative dialogue between adult ESL learners. Language Teaching Research, 11, 121–142.
White, P. (1980). Limitations on verbal reports of internal events: A refutation of Nisbett and Wilson and of Bem. Psychological Review, 87, 105–112.
White, L. (1989). Universal grammar and second language acquisition. Amsterdam: John Benjamins.
Winke, P., Gass, S., & Sydorenko, T. (2010). The effects of captioning videos used for foreign language listening activities. Language Learning & Technology, 14, 65–86.
Wittgenstein, L. (1958). Philosophical investigations (3rd edn). New York: Macmillan.
Wundt, W. (1896 [1894]). Lectures on human and animal psychology (J. E. Creighton & E. B. Titchener, Trans.). New York: Swan Sonnenschein.
Yang, Y., & Lyster, R. (2010). Effects of form-focused practice and feedback on Chinese EFL learners’ acquisition of regular and irregular past tense forms. Studies in Second Language Acquisition, 32, 235–263.
Yoshida, R. (2008). Learners’ perceptions of corrective feedback in pair work. Foreign Language Annals, 41, 525–541.
Zhao, H. (2010). Investigating learners’ use and understanding of peer and teacher feedback on writing: A comparative study in a Chinese English writing classroom. Assessing Writing, 15, 3–17.
Ziegler, N., & Mackey, A. (2014). Pre-task planning, performance, and perceptions: A study of L2 text-chat. Paper presented at Second Language Research Forum, Columbia, SC.
Zobl, H. (1980). Developmental and transfer errors: Their common bases and (possibly) differential effects on subsequent learning. TESOL Quarterly, 14, 469–479.

AUTHOR INDEX

Anderson, M. 128 Bard, E. 8 Bardovi-Harlig, K. 106–7 Barkaoui, K. 30, 43 Barnard, R. 43 Barrows, H. 25 Barry, S. 112–13 Basturkmen, H. 28 Biggs, S. 128–9 Bloom, B. 14, 15–16, 47 Bloomfield, L. 7–8 Bongaerts, T. 97 Borg, S. 28 Boritz, J. E. 128 Bosher, S. 36, 97–8 Bowles, M. 13, 14 Burstein, L. 10 Candia-Mahn, A. 119 Canguilhem, G. 4 Chi, M. 68, 96, 97, 98 Chomsky, N. 7 Clark, C. 16 Cohen, A. 10, 28, 46, 130 Corder, S. P. 2, 130 de Groot, A. 118–19 De La Campa, J. 38 Dennett, D. 5 Descartes, R. 3

Dörnyei, Z. 39–40, 106–7 Dwivedi, V. 116–17 Egi, T. 126, 128 Ellis, R. 28, 111–12 Elstein, A. 99 Ericsson, K. 8–9, 13, 14, 26, 45–6, 49–50, 53, 60, 120, 124–5, 129 Færch, C. 18 Feltovich, P. 97 Flege, J. 103–4 Foster, P. 114–15 Fraser, C. 117 Freud, S. 6 Garner, R. 47 Gass, S. 40, 42, 97, 113, 118, 122 Gazzaniga, M. 5 Glaser, R. 97 Godfroid, A. 102, 127 Gum, A. 77–8 Gurzynski-Weiss, L. 102 Habermas, J. 4 Hawkins, B. 32–3, 82, 85–7, 89, 90–3 Henderson, L. 23–4, 44–5, 46, 47 Hoover, M. 116–17 Huitt, W. 24–5 Isaacs, T. 38–9, 43

Joe, A. 117–18 Johnson, E. 100 Kagan, N. 16 Kang, S. J. 31–2 Kasper, G. 18 Kellerman, E. 97 Kempe, V. 108–9 Khan, S. 34–5 Kormos, J. 39–40 Krathwohl, D. 16 Krönke, K.-M. 121 Labov, W. 59 Lapkin, S. 113–14 Lazarte, A. 112–13 Lee, J. 109–10 Leeman, J. 55–9 Lei, X. 43 Lennon, P. 27 Leow, R. 110–11, 126–7 Levelt, W. 39 Li, J. 43 Lieberman, D. 6, 131 Loewen, S. 28 Lotto, L. 118–19 Lyons, W. 3, 7 Lyster, R. 104–5, 115–16 McDonough, K. 80 Mackey, A. 33–4, 42, 48–9, 51, 53–5, 69, 77–8, 80–1, 82, 89, 97–8, 101–2, 104–5, 122 MacWhinney, B. 108–9 Matsumoto, K. 130 Mehnert, U. 105–6 Miller, R. 16 Munro, M. 42–3, 103, 104 Nassaji, H. 38 Naylor, P. 75–6 n1 Nisbett, R. 26, 124–5 Paribakht, T. S. 117 Peterson, P. 16 Philp, J. 104–5 Plonsky, L. 28 Poulisse, N. 97 Pressley, M. 131 Rassaei, E. 121–2 Reder, L. 27

Révész, A. 102, 119–20 Riney, T. 103–4 Robertson, D. 8 Rosman, A. 128–9 Russo, J. 100, 125–6, 127 Saito, K. 43, 104 Sanz, C. 127 Schmidtke, J. 102 Seliger, H. 130 Selinker, L. 2, 129–30 Sergenian, G. 128–9 Shavelson, R. 10, 16 Shulman, L. 8, 99 Siegel, L. 14 Simon, H. 8–9, 13, 14, 26, 45–6, 49–50, 53, 60, 124–5, 129 Skinner, B. F. 7, 20 n3 Smith, B. 102 Sorace, A. 8 Spino, L. 127 Sprafka, S. 99 Stephens, D. 100 Swain, M. 113–14 Sydorenko, T. 113 Tallman, J. 24, 44–5, 46, 47 Thomson, R. 38–9, 43 Tode, T. 29–30 Uggen, M. 43 Uysal, H. 35 Uzum, B. 36–7 Van Hell, J. 119 van Someren, M. 10 Victori, M. 34–5 Vygotsky, L. 121–2 Watson, J. B. 5–6, 7 Webb, N. 10 Wesche, M. 117 Wilson, T. 26, 124–5 Winke, P. 113 Wittgenstein, L. 4 Wundt, W. 6 Yang, Y. 115–16 Yoshida, R. 37–8 Ziegler, N. 101–2

SUBJECT INDEX

Bold page numbers indicate figures, italic numbers indicate tables.

accents, variation over time 103–4 acceptability data 133 n2 acceptability judgments 4, 20 n4, 20 n5, 40, 97, 108 accuracy: and time lapse 46–7; and verbal reporting 13–14, 124–5 ambiguity 104–5 amount and type of exposure 110–11 anagram task 126 assisted performance 121–2 attention, research 37–8 awareness, research 37–8 behaviorism 7–8 C-units 114 case marking 5–7, 108–9 classification, of mental processes 4–5 classification scheme, introspective research 18–19, 18 classroom interaction 104–5 clitics 116 coding: and data layout 82–96; and depiction of data 99–100; final version of sheet 95; interrater reliability 77–82; interrater reliability schedule 88; preliminary version of sheet 94; preparing data 98–9; rater and task schedule 87–8; rater training 82–96; schemes 99; sheet for stimulated recall comments 85, 86–7; sheets 82; sheets for interaction episodes 86; specifications 82; training schedule 88; transcripts from Hawkins’ study 89, 90–3

cognitive load 113 cognitive processes 27 cognitive psychologies 7 collaborative dialogues 113–14 communication problems, research 39–40 composing process, research 36 comprehension 108–9 computer-mediated communication (CMC) 36–7 concurrent protocols 116 consciousness 6–7 consecutive recall 46 considerations, research design 51–2 context 49 corrective feedback 115–16, 126, 128 cross-cultural pragmatics 35 data analysis: analysis and description 99–100; coding see separate heading; combining methods 97; depiction of data 99–100; examples 78, 80; interrater reliability 77–82, 88; overview 77; preparing data for coding 98–9; process 96–9; qualitative analysis 96–7; quantitative analyses 96–7; rater objectivity 77–9; rater training 80, 82–96; researcher insight 79; sample training protocol 83–4; sampling recall data 98; statistical testing 99; steps in 77; summary and conclusions 100

data triangulation 101–2 debriefing 58–9 decision-making task 119–20 declarative knowledge 26 delayed recall 46 description of data 99–100 diagrammatic model of theoretical and methodological framework for stimulated recall 22–3, 24 dialogues, collaborative 113–14 diary studies 19, 96 domain-related knowledge 112–13 dual-processing 113 elicitation 44, 130 equipment 64 examples: data 62–3; data analysis 78, 80; negation 120; negotiation 115; research design 48, 50, 53–5, 56–8, 62–3, 69–70, 71, 72, 73–4; syntactic processing 116; training protocol 83–4 exposure, amount and type 110–11 eye-tracking 102, 113, 127 falsifiability 3–4 feedback 26–7; corrective 104, 115–16, 126, 128; perceptions about 33–4 focus of L2 research 25 forced elicitation data 2, 130 functional magnetic resonance imaging (fMRI) 121 gestures 121 grab bag game 32–3 grammaticality judgments 8, 20 n4, 20 n5 heeded information 8–9, 9 hierarchy of invalidities 128 iBT TOEFL 30–1 immediate retrospection 45–6 incidental vocabulary learning 117–18 individual differences 119–20 inflectional morphology, approaches to learning 108–9

information-processing approach 14 input, and input processing 109–11 input-based instruction 107–8 insight, researchers’ 79 instructions 61 interactions, classroom 104–5 interlanguage phonology, change over time 103–4 interlanguage pragmatics 106–7 interpersonal process recall 16 interrater reliability 77–82; schedule 88; see also coding introspection: and behaviorism 5–7; definition 3; differences between types 16–17, 17; differentiating reports 12–13; difficulty of 59–60; methodology 6–7; types of 13; underlying assumptions 1 introspections, quantification of data 96 introspective methods 3–8; applications of 23, 25–8; classification scheme 18–19, 18 invalidities, hierarchy of 128 judgment data 133 n2 knowledge: determining 2–3; domain-related 112–13; structures 26; types 26 L2 reading comprehension 112–13 L2 writing 111–12 language acquisition, socio-cultural theories 121–2 language, of recall 47–9 learners’ strategies 27–8, 30–1 learning, understanding 2 length of recall support 65–6 lexical retrieval 27 limitations: non-production data 129–31; overview 123; practical considerations 132; reactivity 125–9; recommendations 131–2; reliability 123–5, 129, 131; validity 123–5; verbal reporting 130–1; veridicality 125–9 magic trick analogy 5 maintenance rehearsal 24–5 meaning, negotiation of 114–15 mechanist psychology 7

mediation 25 memory: accuracy 124–5; filling in 46 memory retrieval 27 memory traces 24–5, 27 mental models 112 mental processes, classification of 4–5 mentalist psychology 7 metalinguistic awareness 109 methodology, verification of 40 mind-body dualism 3 motivation, research 31–2 narrative construction 5 negation 120 negotiation, of meaning 114–15 non-production data, limitations 129–31 nonrecent recall 46 nonveridicality 125–6 noticing, research 37–8 objectivity, of raters 77–9 oral interaction 80–1, 113–15 oral production 105–6 overview: empirical research 101; L2 research 22–4 participants: preparation 52–60; training 19, 60–1 perceptions about feedback 33–4 phonology 104 pilot testing 52 planning time, and performance 105–6 plans 26 practical considerations 132 pragmatics: input-based instruction 107–8; interlanguage 106–7 pragmatics and reading 35 preference 49 priming studies 60 problem of privacy 7 procedural knowledge 26 procedural pitfalls 64–75 process-tracing 10 processing, research 29–30 production data 2 proficiency 49 prompts and thoughts, types of 45 protocols: concurrent 116, 126; reporting 47; research design 52–9; retrospective 10; talk-aloud protocol 9; think-aloud protocol 9, 10–12; training 82, 83–4; verbal 68, 127, 128–9; veridicality 125

qualitative analysis 96–7 quantification 96 quantitative analyses 96–7 questions, ways of asking 49–50 range and use of stimulated recall 28 rater training 80, 82–96; see also coding reactivity 14, 125–9 reading 109–10; and pragmatics 35 reading topics, and comprehension 112–13 recall accuracy 15 recall support, length of 65–6 recasts 55–9, 104–5, 115–16, 126, 128 recency effect argument 46 receptive vocabulary learning 102 recommendations: interrater reliability 81; limitations 131–2; research design 51–2; timing 75 reflection 3–8 reliability 123–5, 129, 131; interrater 77–82; verbal reporting 13 repetition 104–5 replicability 4 reporting protocols 47 research design: allocating time for recall procedure 66–7; asking questions 49–50; collection scenario 53–5; considerations 51–2; debriefing 58–9; equipment 64; estimated and actual time 65–6; examples 48, 50, 53–5, 56–8, 62–3, 69–70, 71, 72, 73–4; instructions 53, 61; language of recall 47–9; length of recall support 65–6; participant training 60–1; preparation 52–60, 64; procedural pitfalls 64–75; protocols 52–9; recommendations 51–2; stimuli 44–5, 52; stimulus initiation 51–2; structure 51; summary and conclusions 75; time for setup and equipment 68; time for verbalization 68–75; time lapse 45–7; timing 52, 64–75; see also coding research protocols 52–8 research questions 42–3 research techniques 1 research studies 28–9, 29 researcher insight 79 researchers: importance of training 59; preparation 52–60; requirements 59

resource allocation 43, 113 responses in oral interaction 32–3 retrieval cues 14, 46, 121 retrospective protocols 10 scaffolding 121–2 scripts 26–7 segmentation 99 self-observation 10 self-report 10 self-revelation 10 SLA studies using stimulated recall from 2000-2015 29 socio-cultural theories, of language acquisition 121–2 sociolinguistic interview technique 59 speech production, Levelt’s model 39 split-brain experiments 5 standardization, of instructions 53 stimulated recall: as additional data source 102–3; chapter overview 3; diagrammatic model 24; differentiation 22; overview 14–17; scope of term 1; sequence of events 15; strengths and limitations 19; types of 46 stimuli 44–5, 52 stimulus initiation 51–2 stimulus, strength 52 strategy use 27–8, 30–1, 34–5 structure 51 summary and conclusions: empirical research 121–2; L2 research 41 synchronous computer-mediated communication (SCMC) 101–2 syntactic complexity 112–13 syntactic processing 116–17 talk-aloud protocol 9 teacher cognition 28, 38 teacher training 16 tenses, understanding of 75–6 n1

testing, research studies 38–9 think-aloud protocol 9, 10–12; example 11–12 thinking aloud 126–7 time lapse 45–7 timing 52; allocating time for recall procedure 66–7; estimated and actual time 65–6; length of recall support 65–6; recommendations 75; in research 64–75; setup and equipment 68; time for verbalization 68–75 training, of raters 80, 82–96 training protocols 82; example 83–4 transfer 20 n2 triangulation 101–2; see also empirical research truth, tests of 3–4 understanding source of production, L2 research 23 usage-based approaches 120 validity 123–5; and participant training 60; verbal reporting 14 verbal overshadowing 21 n6 verbal protocols 68, 127, 128–9 verbal reporting 8–12; accuracy 124–5; advantages and disadvantages 13, 130–1; categories of data 10 verbalization, time for in research settings 68–75 veridicality 13–14, 125–9; protocols 125 verification, of methodology 40 vocabulary: empirical research 117–19; incidental learning 117–18; receptive learning 102; word acquisition 118–19 vocal pitch 43 willingness to communicate (WTC) 31–2 writing, L2 111–12