The Routledge Handbook of Experimental Linguistics [1 ed.] 9781003392972, 9781032492872, 9781032492896

English Pages 522 [523] Year 2023

Table of contents:
Cover
Half Title
Series Page
Title Page
Copyright Page
Table of Contents
Acknowledgements
List of Contributors
Introduction: The Origins and Importance of Experimental Linguistics
Part I: Focus on Linguistic Domains
1 Historical Perspectives on the use of Experimental Methods in Linguistics
2 Experimental Phonetics and Phonology
3 Experimental Morphology
4 Experimental Syntax
5 Experimental Semantics
6 Experimental Pragmatics
7 Experimental Sociolinguistics
8 Experimental Studies in Discourse
9 Experimental Studies of Argumentation
10 Experimental Research in Cross-Linguistic Psycholinguistics
11 Experimental Methods to study Distributed Comprehension of Action Language
Part II: Focus on Experimental Methods
12 Eliciting Spontaneous Linguistic Productions
13 Analysing Speech Perception
14 Contrasting Online and Offline Measures: Examples from Experimental Research on Linguistic Relativity
15 Cognitive Processes Involved in text Comprehension: Walking the Fine line Between Passive and Strategic Validating Processes in Reading
16 Analysing Reading with Eye Tracking
17 Analysing Spoken Language Comprehension with Eye Tracking
18 Mobile Eye-Tracking for Multimodal Interaction Analysis
19 Analysing Language using Brain Imaging
20 New Directions in Statistical Analysis for Experimental Linguistics
21 Assessing Adult Linguistic Competence
22 Dealing with Participant Variability in Experimental Linguistics
23 Testing in the Lab and Testing through the Web
Part III: Focus on Specific Populations
24 Experimental Methods to Study Child Language
25 Experimental Methods to Study Atypical Language Development
26 Experimental Methods to Study Disorders of Language Production in Adults
27 Experimental Methods for Studying Second Language Learners
28 Experimental Methods to Study Bilinguals
29 Experimental Methods to Study Cultural Differences in Linguistics
30 Experimental Methods to Study Late-life Language Learning
Index

“Offering impressive breadth and depth of coverage, and surveying classic and cutting-edge approaches and techniques, The Routledge Handbook of Experimental Linguistics uniquely positions itself as an indispensable resource for the novice and expert linguistics researcher alike. Essential reading for any methods course, too. Highly recommended.”
Panos Athanasopoulos, Lancaster University, UK

“The use of experimental methods is no longer something peripheral but lies at the core of language studies. Yet we lack coherent overviews of the large body of methods and knowledge from experiments on language. This book fills a gap and it is essential for the new generation of linguists.”
Valentina Bambini, University School for Advanced Studies IUSS Pavia, Italy

THE ROUTLEDGE HANDBOOK OF EXPERIMENTAL LINGUISTICS

The Routledge Handbook of Experimental Linguistics provides an up-to-date and accessible overview of various ways in which experiments are used across all domains of linguistics and surveys the range of state-of-the-art methods that can be applied to analyse the language of populations with a wide range of linguistic profiles. Each chapter provides a step-by-step introduction to theoretical and methodological challenges and critically presents a wide range of studies in various domains of experimental linguistics.

This handbook:

• Provides a unified perspective on the data, methods and findings stemming from all experimental research in linguistics
• Covers many different subfields of linguistics, including argumentation theory, discourse studies and typology
• Provides an introduction to classical as well as new methods to conduct experiments such as eye tracking and brain imaging
• Features a range of internationally renowned academics
• Shows how experimental research can be used to study populations with various linguistic profiles, including young children, people with linguistic impairments, older adults, language learners and bilingual speakers

Providing readers with a wealth of theoretical and practical information in order to guide them in designing methodologically sound linguistic experiments, this handbook is essential reading for scholars and students researching in all areas of linguistics.

Sandrine Zufferey is a Professor of French linguistics at the University of Bern, Switzerland.

Pascal Gygax is a Senior Lecturer at the University of Fribourg, Switzerland.

ROUTLEDGE HANDBOOKS IN LINGUISTICS

Routledge Handbooks in Linguistics provide overviews of a whole subject area or sub-discipline in linguistics, and survey the state of the discipline including emerging and cutting-edge areas. Edited by leading scholars, these volumes include contributions from key academics from around the world and are essential reading for both advanced undergraduate and postgraduate students.

THE ROUTLEDGE HANDBOOK OF NORTH AMERICAN LANGUAGES
Edited by Daniel Siddiqi, Michael Barrie, Carrie Gillon, Jason D. Haugen and Éric Mathieu

THE ROUTLEDGE HANDBOOK OF LANGUAGE AND SCIENCE
Edited by David R. Gruber and Lynda Walsh

THE ROUTLEDGE HANDBOOK OF LANGUAGE AND EMOTION
Edited by Sonya E. Pritzker, Janina Fenigsen, and James M. Wilce

THE ROUTLEDGE HANDBOOK OF LANGUAGE CONTACT
Edited by Evangelia Adamou and Yaron Matras

THE ROUTLEDGE HANDBOOK OF PIDGIN AND CREOLE LANGUAGES
Edited by Umberto Ansaldo and Miriam Meyerhoff

THE ROUTLEDGE HANDBOOK OF COGNITIVE LINGUISTICS
Edited by Xu Wen and John R. Taylor

THE ROUTLEDGE HANDBOOK OF THEORETICAL AND EXPERIMENTAL SIGN LANGUAGE RESEARCH
Edited by Josep Quer, Roland Pfau, and Annika Herrmann

THE ROUTLEDGE HANDBOOK OF LANGUAGE AND PERSUASION
Edited by Jeanne Fahnestock and Randy Allen Harris

THE ROUTLEDGE HANDBOOK OF SEMIOSIS AND THE BRAIN
Edited by Adolfo M. García and Agustín Ibáñez

THE ROUTLEDGE HANDBOOK OF LINGUISTIC PRESCRIPTIVISM
Edited by Joan C. Beal, Morana Lukač and Robin Straaijer

THE ROUTLEDGE HANDBOOK OF EXPERIMENTAL LINGUISTICS
Edited by Sandrine Zufferey and Pascal Gygax

Further titles in this series can be found online at www.routledge.com/Routledge-Handbooks-in-Linguistics/book-series/RHIL

THE ROUTLEDGE HANDBOOK OF EXPERIMENTAL LINGUISTICS

Edited by Sandrine Zufferey and Pascal Gygax

Designed cover image: © Getty Images | pressureUA

First published 2024
by Routledge
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

and by Routledge
605 Third Avenue, New York, NY 10158

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2024 selection and editorial matter, Sandrine Zufferey and Pascal Gygax; individual chapters, the contributors

The right of Sandrine Zufferey and Pascal Gygax to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-1-032-49287-2 (hbk)
ISBN: 978-1-032-49289-6 (pbk)
ISBN: 978-1-003-39297-2 (ebk)

DOI: 10.4324/9781003392972

Typeset in Times New Roman
by codeMantra

CONTENTS

Acknowledgements

xi

List of contributors

xii

Introduction: the origins and importance of experimental linguistics

1

PART I

Focus on linguistic domains

5

1 Historical perspectives on the use of experimental methods in linguistics Alan Garnham

7

2 Experimental phonetics and phonology Ivana Didirková and Anne Catherine Simon

21

3 Experimental morphology Vera Heyer

38

4 Experimental syntax Yamila Sevilla and María Elina Sánchez

51

5 Experimental semantics Stephanie Solt

71

6 Experimental pragmatics Tomoko Matsui

85

19 Analysing language using brain imaging Trisha Thomas, Francesca Pesciarelli, Clara D. Martin and Sendy Caffarra

299

20 New directions in statistical analysis for experimental linguistics Shravan Vasishth

313

21 Assessing adult linguistic competence Lydia White

330

22 Dealing with participant variability in experimental linguistics Ute Gabriel and Pascal Gygax

345

23 Testing in the lab and testing through the web Jonathan D. Kim

356

PART III

Focus on specific populations

373

24 Experimental methods to study child language Titia Benders, Nan Xu Rattanasone, Rosalind Thornton, Iris Mulders and Loes Koring

375

25 Experimental methods to study atypical language development Phaedra Royle, Émilie Courteau and Marie Pourquié

390

26 Experimental methods to study disorders of language production in adults Andrea Marini

407

27 Experimental methods for studying second language learners Alan Juffs and Shaohua Fang

422

28 Experimental methods to study bilinguals Ramesh Kumar Mishra and Seema Prasad

440

29 Experimental methods to study cultural differences in linguistics Evangelia Adamou

458

30 Experimental methods to study late-life language learning Merel Keijzer, Jelle Brouwer, Floor van den Berg and Mara van der Ploeg

473

Index

486

ACKNOWLEDGEMENTS

First and foremost, our gratitude goes to Robert Avery, who has done a remarkable job helping us with all the nitty-gritty details of editing this handbook. Robert, we couldn’t have done it without you; you’re the best editing assistant ever! We would also like to express our gratitude to the Routledge editors, especially Nadia Seemungal Owen, for having the idea of this handbook in the first place, then for trusting us with the editing job and, most of all, for her great support at the initial stages of the project. Thanks as well to Eleni Steck and Bex Hume for taking over the editing job and helping us with the final stages. A big thank you is also in order to all the wonderful contributors to this handbook, for all the interesting interactions we have had about their chapters and for the important insights and great work they provided for the book. Finally, we also thank the numerous colleagues who suggested names of contributors and, more generally, provided advice and support throughout the project. We hope this new resource will be helpful to you all!

CONTRIBUTORS

Evangelia Adamou is a senior researcher at the French National Centre for Scientific Research. Her research focuses on language contact and bilingualism based on corpus and experimental data from endangered languages. She is the author of several books on the subject and co-edited The Routledge Handbook of Language Contact with Yaron Matras. https://orcid.org/0000-0002-6653-5070

Caroline Andrews holds a PhD in Linguistics from the University of Massachusetts Amherst and is currently a postdoctoral researcher at the University of Zurich. Her research has a special focus on populations and linguistic structures that have historically been underrepresented in psycholinguistic theory, including in the Peruvian Amazon. https://orcid.org/0000-0001-8681-9823

Jennifer Arnold is a Professor in the Department of Psychology and Neuroscience at the University of North Carolina at Chapel Hill. Her research examines language production and comprehension in discourse contexts, in particular pronoun production and comprehension. https://orcid.org/0000-0001-7519-1305

Titia Benders is Assistant Professor at the University of Amsterdam’s Amsterdam Center for Language and Communication. She studies children’s language acquisition at the phonetics-phonology interface, with a focus on infant-directed speech and the relationship between perception and production. https://orcid.org/0000-0003-0143-2182

Geert Brône is professor of linguistics and vice dean of research policy at the University of Leuven. His research focuses on cognitive and interactional approaches to multimodal meaning construction in language, using a variety of empirical methods. He is the founder and co-manager of the KU Leuven Mobile Eye-Tracking Lab. https://orcid.org/0000-0002-4725-7933

Jelle Brouwer is a PhD candidate in the Bilingualism and Aging Lab (BALAB). His research focuses on the effects of late-life language learning on well-being and cognition in seniors with depression and mood problems. https://orcid.org/0000-0001-5440-8608

Sendy Caffarra is an Assistant Professor at the University of Modena and Reggio Emilia (Italy) and a visiting scholar at Stanford University (USA). She is interested in how experience shapes our brain language network. http://orcid.org/0000-0003-3667-5061

Peter J. Collins is a Senior Lecturer at the University of Greenwich. After studying linguistics and psychology, he obtained his PhD in Psychological Sciences. Since then, he has completed a postdoctoral fellowship in Munich and a teaching fellowship at Goldsmiths. His research focuses on the role of natural-language pragmatics in rational behaviour. https://orcid.org/0000-0003-4831-2524

Anne E. Cook is a Professor at the University of Utah. Her current empirical and theoretical research focuses on memory-based processes involved in text comprehension. She has expertise in the use of eye-tracking methodology, which she uses to obtain a finer-grained analysis of the processes that underlie comprehension. https://orcid.org/0000-0002-0733-6370

Émilie Courteau is a postdoctoral researcher at Dalhousie University and a clinical speech-language pathologist. Her current research focuses on the neurocognition of language processing in teenagers with developmental language disorder and on parental support of literacy development in preschool children. https://orcid.org/0000-0001-6264-5080

Ivana Didirková is an Associate Professor of Phonetics and Phonology at the Université Paris 8 Vincennes – Saint-Denis. In her research, she is particularly interested in stuttered speech production, articulatory phonetics and disfluency in speech and discourse. https://orcid.org/0000-0001-8107-2361

William Dupont currently has a teaching and research assistantship at the UFR STAPS of Dijon. His research is conducted within the INSERM U1093 Lab (CAPS; University of Bourgogne) and employs behavioural and neurophysiological methodologies to study the relationship between the motor system, action language and motor imagery. https://orcid.org/0000-0002-6932-2628

Irina Elgort is an Associate Professor in Higher Education at Victoria University of Wellington. Her research interests include vocabulary learning and processing, bilingualism and reading. She uses research paradigms from applied linguistics, cognitive psychology and education to better understand, predict and influence second language learning. https://orcid.org/0000-0002-4568-9951

Shaohua Fang is a PhD candidate in linguistics in the Department of Linguistics, University of Pittsburgh. His research interests lie in second language acquisition and sentence processing. He has published on the L2 acquisition of Chinese temporality and L2 predictive processing. https://orcid.org/0000-0002-3342-5764

Ute Gabriel is a Professor of Social Psychology in the Department of Psychology at the Norwegian University of Science and Technology (NTNU), Trondheim, Norway. Her research interests include stereotypes and gender representation in language. https://orcid.org/0000-0001-6360-4969

Alan Garnham is Professor of Experimental Psychology in the School of Psychology at the University of Sussex. His research interests are in psycholinguistics and include mental models theory, anaphor, inference, and stereotypes and emotion, both as they impact on language processing. https://orcid.org/0000-0002-0058-403X

Nathalie Giroud is a neuropsychologist and an Assistant Professor at the University of Zurich. She leads the Computational Neuroscience of Speech & Hearing research group, which studies the neural and cognitive underpinnings of speech and language, focusing on clinical populations who have difficulty processing and understanding spoken language. https://orcid.org/0000-0002-9632-5795

Pascal Gygax is a Senior Lecturer of Psycholinguistics at the University of Fribourg, Switzerland. His interests include the way readers go beyond text to form coherent mental models. With Ute Gabriel, he has published papers on the social and cognitive mechanisms that force us to see the world through androcentric lenses. https://orcid.org/0000-0003-4151-8255

Ulrike Hahn is a professor at Birkbeck College, University of London, and director of the Centre for Cognition, Computation and Modelling. Her focus has been human rationality. Recently, she has focused on the role of perceived source reliability for human beliefs, especially as parts of larger communicative social networks. https://orcid.org/0000-0002-7744-8589

Vera Heyer is an Associate Professor at TU Braunschweig, Germany. Her current research focuses on the processing of morphologically complex words in native and non-native speakers as well as the morphology-orthography interface. https://orcid.org/0000-0002-0982-2591

Jet Hoek is an Assistant Professor at Radboud University Nijmegen. Her research focuses on discourse coherence, studying both relational and referential coherence from a cognitive perspective, using both offline and online methodologies. She is especially interested in how discourse expectations influence processing and shape the linguistic realisation of a discourse. https://orcid.org/0000-0002-4430-0430

Alan Juffs is a Professor in the Department of Linguistics, University of Pittsburgh. He was the Director of the English Language Institute at the University of Pittsburgh from 1998 to 2020. He has published books on the lexicon, sentence processing and language development in Intensive English Programs. https://orcid.org/0000-0002-4741-6412

Merel Keijzer is a Professor of English Linguistics and English as a second language at the University of Groningen and the head of the Bilingualism and Aging Lab (BALAB). Her research interests centre on (the effects of) bilingual experiences across the lifespan, with a focus on older adulthood. https://orcid.org/0000-0001-9041-8563

Jonathan D. Kim is a postgraduate research fellow in Psychology at the Norwegian University of Science and Technology. Their current research focus is on social perception, especially as it relates to gender ratios and gender stereotypical beliefs, and on testing long-standing methodological assumptions found in linguistic and psycholinguistic research. https://orcid.org/0000-0002-7926-6834

Loes Koring is a Lecturer at Macquarie University. She focuses on the processing and acquisition of human language syntax and semantics. Current projects include the syntax and semantics of posture verbs, children’s acquisition of argument structure, children’s production of negative questions, scope assignment and children’s understanding of sentences with quantifiers. https://orcid.org/0000-0001-8564-7441

Erez Levon is a Professor of Sociolinguistics and Director of the Center for the Study of Language and Society at the University of Bern. Using quantitative, qualitative and experimental methods, his work focuses on how people produce and perceive socially meaningful patterns of variation in language. https://orcid.org/0000-0003-1060-7060

Carol Madden-Lombardi is currently working at the University of Bourgogne (INSERM U1093 Lab). She employs behavioural and neuroscience methodologies to investigate the embodied and modality-specific nature of language representations, showing how we use language cues to activate appropriate meanings and how these resulting representations mirror our real perceptual-motor experience. https://orcid.org/0000-0003-1856-5416

Andrea Marini is Associate Professor at the University of Udine, where he leads the Language Lab. His research focuses on the neuropsychology of language in both adults and children, the analysis of the relationship between language and cognitive functioning, the cognitive neuroscience of bilingualism, and the phylogenetic evolution and ontogenetic development of language. https://orcid.org/0000-0002-6058-3864

Clara D. Martin is an Ikerbasque Research Professor and leader of the ‘Speech & Bilingualism’ research group at the Basque Center on Cognition, Brain and Language, San Sebastian, Spain. Her research interests lie in speech and language perception and production, with a specific focus on bilingualism. http://orcid.org/0000-0003-2701-5045

Tomoko Matsui is a professor at the Graduate School of Letters, Chuo University in Tokyo, Japan. She focuses on psychological mechanisms involved in utterance interpretation. Recently, she has been researching developmental pragmatics, exploring interactions between language, social cognition and culture. Her publications include Bridging and Relevance (John Benjamins, 2000; Ichikawa Prize winner). https://orcid.org/0000-0001-9638-0674

Ramesh Kumar Mishra teaches at the University of Hyderabad (India). His areas of expertise are attention, visual processing, bilingualism and language processing, and literacy and its influence on cognition. His book Bilingualism and Cognitive Control was published in 2018, and he is the Editor-in-Chief of the International Journal of Cultural Cognitive Science. https://orcid.org/0000-0001-8862-7745

Iris Mulders is the Head of the Institute for Language Sciences Labs and an Assistant Professor at the Department of Languages, Literature and Communication, Utrecht University. In her research, she uses eye tracking to answer questions about sentence processing and the syntax-semantics interface. https://orcid.org/0000-0002-3695-9059

Elisabeth Norcliffe is a Marie Skłodowska-Curie senior research fellow at the University of Oxford. Her research lies at the intersection of psycholinguistics and linguistic typology. https://orcid.org/0000-0001-8646-6474

Bert Oben is professor at the University of Leuven and currently head of the MIDI research group. He teaches in the Linguistics and Business Communication programmes and his research is focused on multimodal analyses of interactional phenomena such as conversational humour, copying behaviour or foreigner talk. https://orcid.org/0000-0002-7022-9367

Edward J. O’Brien is a Professor Emeritus at the University of New Hampshire. His research focuses on the development and testing of models that capture memory-based processes during reading comprehension. Such processes involve the (re)activation of information in memory, necessary and elaborative inferences, and the interaction of context and general world knowledge. https://orcid.org/0000-0002-2502-4418

Francesca Pesciarelli is an Associate Professor in Cognitive Psychology at the University of Modena and Reggio Emilia, Italy. Her research focuses on the implicit processing of language comprehension. http://orcid.org/0000-0003-3007-4775

Marie Pourquié is a researcher in Applied Psycholinguistics at the IKER-CNRS research centre in Baiona, Basque Country (France). Her current research focuses on developing language assessment tools in Basque and on characterizing (a)typical language processing in Basque-French and Basque-Spanish bilingual speakers with and without language disorders. https://orcid.org/0000-0002-6159-6106

Seema Prasad is an Alexander von Humboldt research fellow at TU Dresden examining the role of motivational factors (e.g., curiosity) on attention control. Her previous work focused on unconscious attention and the extent to which it can be controlled through goal-driven factors. Her recent interests include bilingualism and language-vision interaction. https://orcid.org/0000-0001-7438-1051

Nan Xu Rattanasone is a senior research fellow, Deputy Director of the Child Language Lab, Co-Director of the Centre for Language Sciences, and member of the Multilingual Research Centre at Macquarie University. Her research focuses on language acquisition and early literacy skills in diverse populations, especially bilinguals and the deaf and hard of hearing. https://orcid.org/0000-0002-2916-8435

Phaedra Royle is a Full Professor and the Head of the Speech-language pathology program at the School of Speech-language pathology and audiology at the University of Montreal. Her current research focuses on the neurocognition of language in first and second language learners of French with and without developmental language disorder. https://orcid.org/0000-0001-8657-2162

María Elina Sánchez obtained her PhD in Linguistics at the Universidad de Buenos Aires, where she also teaches neurolinguistics. She is a researcher at the Instituto de Lingüística, Consejo Nacional de Investigaciones Científicas y Técnicas (Argentina). Her research focuses on morphosyntactic processing in healthy adults and people with aphasia. https://orcid.org/0000-0001-6159-8366

Ted J. M. Sanders is a Full Professor at Utrecht University. He has published widely on discourse processing, representation and the (cross-)linguistics of coherence, striving for converging evidence by combining empirical methods, from corpus annotation to eye-tracking experiments. He has a strong interest in improving communication through comprehensible texts. https://orcid.org/0000-0001-8212-7336

Sayaka Sato is a postdoctoral researcher in Psychology at the University of Fribourg, Switzerland. Her research focuses on bilingual language processing and linguistic relativity, examining topics relating to grammatical and conceptual gender. https://orcid.org/0000-0002-1406-0560

Sebastian Sauppe is a postdoctoral researcher in the Department of Comparative Language Science at the University of Zurich. His research focuses on how grammatical diversity and language processing interact. https://orcid.org/0000-0001-8670-8197

Merel C. J. Scholman is an Assistant Professor at Utrecht University and a researcher at the Department of Language Science and Technology at Saarland University. She is interested in cognitive models of language understanding, focusing on discourse coherence. Using a combination of online and offline methodologies, she investigates the interpretation and processing of discourse. https://orcid.org/0000-0002-0223-8464

Sandra Schwab is a phonetician and Maître d’enseignement et de recherche at the University of Bern. Her scientific interests are at the juncture of phonetics and experimental psycholinguistics. Her primary research focus is prosody in L1 and L2, with a specialization in lexical stress perception and speech rate. https://orcid.org/0000-0003-4485-8335

Yamila Sevilla is a Professor of Psycholinguistics and Neurolinguistics at the Universidad de Buenos Aires and an independent researcher at the Instituto de Lingüística, Consejo Nacional de Investigaciones Científicas y Técnicas (Argentina). She studies structure-building processes in sentence comprehension and production across the lifespan in healthy people and in aphasia. https://orcid.org/0000-0002-4544-6212

Anne Catherine Simon is Full Professor of French Linguistics at the University of Louvain. Her research interests cover prosody and spoken syntax within the frame of corpus and experimental linguistics, with a special emphasis on discourse units, regional prosody and speaking styles. https://orcid.org/0000-0002-7202-842X

Anna Siyanova-Chanturia is a Chair Professor in Linguistics and Applied Linguistics at Ocean University of China, China, and Associate Professor in Applied Linguistics at Te Herenga Waka – Victoria University of Wellington, New Zealand. https://orcid.org/0000-0002-8336-8569

Stephanie Solt is a Senior Researcher at the Leibniz-Zentrum Allgemeine Sprachwissenschaft (ZAS). Solt’s research focuses on how natural languages encode scalar meaning, and how speakers choose to communicate concepts of quantity and degree. She has an ongoing interest in the application of experimental techniques to questions in theoretical semantics. https://orcid.org/0000-0002-3269-8461

Michael K. Tanenhaus is the Beverly Petterson Bishop and Charles W. Bishop Professor of Brain and Cognitive Sciences (Emeritus) at the University of Rochester. His research fields include psycholinguistics, cognitive psychology and cognitive science. His lab pioneered the development and use of the visual world paradigm to explore real-time spoken language comprehension.

Trisha Thomas is a PhD student at the Basque Center on Cognition, Brain and Language and the University of the Basque Country, approaching the degree of Doctor of Philosophy in Cognitive Neuroscience. Her research focuses on how interlocutor identity affects information processing and memory. https://orcid.org/0000-0001-8323-1012

Rosalind Thornton is an Honorary Professor at Macquarie University. She has invested in the methodologies of elicited production and the truth value judgment task as reliable methods to study children’s acquisition of syntax. Her investigations include wh-questions, sentential constraints on pronouns and names, VP-ellipsis, negation, morphosyntax, early focus and quantification. https://orcid.org/0000-0003-2854-3720

Floor van den Berg is a PhD candidate in the Bilingualism and Aging Lab (BALAB). Her research investigates how late-life language learning affects cognitive functioning and well-being in healthy older adults and older adults with subjective cognitive impairment and mild cognitive impairment. https://orcid.org/0000-0002-9626-1293

Mara van der Ploeg is a PhD candidate in the Bilingualism and Aging Lab (BALAB). Her work focuses on late-life language learning and includes classroom interaction, the identification of language learning needs and the potential for cognitive benefits arising from language learning in older adulthood. https://orcid.org/0000-0001-6303-3449

Norbert Vanek is a Senior Lecturer at the School of Cultures, Languages and Linguistics, University of Auckland, New Zealand. His main research interests are bilingual cognition, linguistic relativity, event processing and open science. https://orcid.org/0000-0002-7805-184X

Shravan Vasishth is a Professor of linguistics at the University of Potsdam, Germany. His research focuses on computational models of sentence comprehension processes. https://orcid.org/0000-0003-2027-1994

Yipu Wei is an Assistant Professor at Peking University, School of Chinese as a Second Language. Her current research focuses on discourse processing, perspective-taking and language comprehension in the visual world. https://orcid.org/0000-0002-0128-4098

Lydia White (Ph.D. McGill) is James McGill Professor Emeritus of Linguistics at McGill University and a Fellow of the Royal Society of Canada. She pioneered the application of the generative linguistic framework to L2 acquisition. Her research investigates the effects of universal principles and L1 transfer on L2 linguistic competence. https://orcid.org/0000-0003-4694-5950

Sandrine Zufferey is a Professor of French linguistics at the University of Bern (Switzerland). Her work focuses on psycholinguistic approaches to study the acquisition and processing of discourse phenomena, especially discourse connectives. She is interested in analysing the impact of cross-linguistic differences in linguistic encoding for cognitive processes. https://orcid.org/0000-0002-5403-6709

INTRODUCTION

The origins and importance of experimental linguistics

Over the past decades, the use of experimental methods has become widespread in all domains of linguistics. Historically, this trend has been driven by the growing need to examine causal relationships and by the idea that language users are an inexhaustible source of knowledge for understanding how languages are produced, understood and used during social interactions, and, more generally, for evaluating linguistic hypotheses. This idea, as discussed in the first chapter of this handbook, differs from that of more traditional linguistics, for example, in the tradition of generative grammar, where linguistic hypotheses were essentially documented, or fed, by individual linguists or other language specialists whose aim was to assess speakers’ internal language (see e.g., Smith & Allott, 2016). Once multiple “naïve” informants were recognised as reliable sources for informing linguistic theories, systematic and rigorous methods of investigation became necessary. These methods evolved along two paths. On the one hand, linguists in domains of the study of language such as language acquisition, sociolinguistics and language typology have always had to integrate outside sources of evidence, because they could neither rely on their own intuitions to study languages or varieties other than their own, nor rely on their own memories to analyse the processes of language acquisition. Researchers in these disciplines have, therefore, been pioneers in the use of empirical methods in linguistics. Yet these empirical methods have not always been, and are not always, experimental. Some of them, like corpus linguistics, rely on the observation of data rather than on the manipulation of variables to evaluate their effects on other variables, as in experimental linguistics.
While these observational methods represent valuable sources of evidence, they rely on different paradigms from those of experimental research, and the underlying methods are dealt with in dedicated handbooks (for example, O’Keeffe & McCarthy, 2022 for corpus linguistics). On the other hand, experimental methods have a long tradition in the psychological sciences – perhaps more specifically in the cognitive sciences – and they have been used in psycholinguistics since the second half of the twentieth century. While psycholinguistics focuses specifically on the cognitive processes underlying language production and comprehension as well as first and second language acquisition (Altmann, 2001), linguists from other disciplines have increasingly integrated experimental methods into their research to investigate a broader range of research questions, some of which are even rooted in the psychological sciences. Examples include determining what attitudes people hold towards different accents in their mother tongue in sociolinguistics; how people understand implicit (or indirect) uses of language, such as metaphors, irony and scalar implicatures, in pragmatics; or the impact of gestures on linguistic representations in cognitive linguistics. In a nutshell, experimental methods currently represent an important foundation for research in all areas of linguistics and are used to assess linguistic representations across a broad range of populations. The incorporation of experimental methods across the fields of linguistics has come to be known as experimental linguistics. These experimental methods have themselves developed rapidly over the past decades, ranging from offline studies analysing speakers’ judgements of linguistic materials to online studies measuring reading comprehension, or even recording brain activity during language processing. Using any of these methods, even the seemingly simplest ones in terms of materials and procedures, requires adequate knowledge, and these skills have become an integral part of the linguist’s toolkit. These methods have been introduced in recent textbooks (e.g., Gillioz & Zufferey, 2021), but accessible, more in-depth resources remain scarce, although they are strongly needed. Our aim with this handbook is to help fill this gap by providing advanced students and researchers with a wide-ranging resource, containing chapters written by authoritative scholars in their fields, to help them design, conduct and analyse experiments whose data will reliably nourish and develop linguistic theories. To achieve this goal, the 30 chapters of the handbook are organised into three parts.

DOI: 10.4324/9781003392972-1

The first part (Chapters 1–11) contains chapters presenting the use of experimental methods from the perspective of different fields of linguistics. It covers a wide array of linguistic domains where experimental approaches have proven particularly well suited, including both those that have a long(er) tradition of using experimental methods, such as phonetics and discourse comprehension, and those in which the use of experiments is still emerging and quickly evolving, such as syntax, semantics and pragmatics. This overview takes a broad perspective, covering domains that are not systematically included in linguistic research, such as argumentation and studies focusing on embodiment phenomena. This unified presentation encourages advanced students and researchers to form an in-depth and comprehensive view of the variety of methods and findings in experimental linguistics. Although the materials presented in this part of the handbook do not offer an exhaustive view of experimental linguistics, they nonetheless go beyond the habitual boundaries imposed by focusing on one or two specific domains, thus enabling cross-references between related domains. For example, readers interested in experimental typology might also benefit from reading about the experimental methods that can be used to study speakers of minority languages, as well as the specific practical and ethical concerns related to the inclusion of these populations. The second part of the handbook (Chapters 12–23) provides a detailed presentation of the leading methods used in experimental linguistics, ranging from classical methods examining natural language production and comprehension, such as elicitation and judgment tasks, to more innovative ones such as eye tracking or brain imaging.
These methods are presented alongside a wide range of linguistic questions, such as those pertaining to language perception and comprehension, reading, and social interactions and categorisations. In these chapters, the main methods in experimental linguistics are critically discussed, and a selection of studies in which they have been used is presented, with the aim of illustrating their major advantages and challenges. Across the chapters in this second part of the handbook, the authors also discuss important methodological considerations for experimental designs, such as the need to control for individual and social factors when recruiting participants for linguistic studies, as well as the implications of testing participants in a laboratory or, as is becoming increasingly common, online via the internet. Special attention is also devoted to identifying the statistical tools best suited to the different types of data gathered in experimental linguistics, and to embracing the transparency needed for replication and reproducibility (open science, data and materials sharing, etc.). These chapters aim to encourage readers to think critically about important methodological and statistical choices that can affect study design, data collection and data analyses. Importantly, though, most chapters implicitly or explicitly stress a fundamental principle of experimental linguistics: although a large range of experimental methods is available, the choice of method(s) depends on the questions asked. At times, even if some methods seem very attractive – for example, because they are associated with modern technology – they may not be suitable for the questions at hand. And of course, not all research questions can suitably be addressed with a single experimental approach. For example, reading times during reading comprehension can be adequately measured using the simple method of self-paced reading; the more complex eye-tracking paradigm becomes necessary only if researchers have hypotheses that cannot be tested using these global reading times, and if some of the data gathered with the eye tracker (regressions, number of fixations, etc.) are truly informative about the cognitive mechanisms under investigation. In other words, a simpler method should always be preferred when it can adequately answer a research question. Methodological complexity for its own sake does not serve the purpose of experimental research, and researchers who use overly complex designs and methods often face difficult decisions that they (unfortunately) did not foresee.
As extensively discussed across the different chapters, no method is suitable for all linguistic questions, and special care is always needed to interpret correctly what the data tell us (and what they don’t tell us), and to identify noise factors correctly. Most chapters offer hands-on recommendations, and lists of further reading are provided at the end of the chapters for advanced students and researchers who want to develop practical skills with a given method. In some cases (e.g., Chapter 20 on new trends in statistical analysis for experimental linguistics), links are even given to free online tutorials. The third part of the handbook (Chapters 24–30) is dedicated to the use of experimental methods across a variety of populations and language contexts. The chapters in this part of the handbook emphasise the requirements and methodological challenges raised when testing particular populations such as young children or older learners, participants who speak more than one language, or people from cultural contexts that are rarely included in experimental research. Each chapter includes a detailed presentation of the experimental methods that are most appropriate for each population, as well as concrete advice on meeting their specific needs. As is the case for the choice of an appropriate experimental method, choosing the right population for one’s research questions is extremely important, as no population is suited to all questions. Note that in many of the chapters, authors also warn about the now well-known WEIRD sample bias (i.e., the over-reliance on participants from Western, educated, industrialised, rich and democratic societies) and its impact on the generalisability of the data.
Taken together, the chapters of the Routledge Handbook of Experimental Linguistics provide theoretical grounding as well as methodological and practical guidance for advanced students and researchers who are involved in experimental linguistics and eager to contribute new empirical data to linguistic theories. By presenting examples and applications across many linguistic domains, the handbook aims to lead them to think critically about design, data collection and data analyses. More generally, we hope that with the Routledge Handbook of Experimental Linguistics, advanced students and researchers will be led to carefully choose the methods best suited to their research questions, keeping in mind that new methods are also likely to emerge in the coming decades in this rapidly expanding and exciting field of study.

References
Altmann, G. (2001). The language machine: Psycholinguistics in review. British Journal of Psychology, 92, 129–170.
Gillioz, C., & Zufferey, S. (2021). Introduction to Experimental Linguistics. ISTE-Wiley.
O’Keeffe, A., & McCarthy, M. (2022). The Routledge Handbook of Corpus Linguistics (2nd ed.). Routledge.
Smith, N., & Allott, N. (2016). Chomsky: Ideas and Ideals (3rd ed.). Cambridge University Press.


PART I

Focus on linguistic domains

1 HISTORICAL PERSPECTIVES ON THE USE OF EXPERIMENTAL METHODS IN LINGUISTICS

Alan Garnham

1.1 Introduction

According to most modern definitions, linguistics is the scientific study of language. It must, therefore, use empirical methods, with observations that provide data that can be used to test the predictions of linguistic hypotheses. To what extent do these methods include experimental methods, and what is the history of the use of such methods as applied to language? Relatedly, to what extent can the study of language be differentiated from the study of how it is processed (psycholinguistics), how the brain is involved in such processing (neurolinguistics), and how it is acquired (child language)? Experimental methods have been extensively used in these fields, and they all involve, at least in a broad sense – the one adopted in this Handbook – the study of language. Historically, an interest in describing languages can be traced back several thousand years. Work on Sumerian cuneiform in Mesopotamia was carried out in the 3rd and 2nd millennia BCE. As the reference to cuneiform indicates, this interest was partly inspired and partly made possible by the invention of writing systems. Attempts to standardise writing systems led to consideration of what the symbols of those systems represented. Interest in these matters was driven by different goals in different contexts. In India, for example, the main concern was the correct recitation of sacred texts; in ancient Greece, by contrast, it was the study of persuasive argument. The methods were empirical, insofar as observations could be used to challenge their conclusions. However, they were largely observational and seem to have depended, though in a less systematic way than in 20th-century generative linguistics, on the responses of individual scholars to the languages they were studying (what generativists would call “linguistic intuitions”). Quantification was generally absent.
Of course, at various times, as in many domains of investigation, supposedly legitimate empirical methods have been used to support entirely erroneous ideas about language and languages, for example, that some languages, and by implication their users, are more primitive than others. But such misuse of scientific methods does not invalidate them. Most of the history of the study of language predates the development of the modern notion of scientific investigation, and, in that sense, the notion does not apply to it. In the Middle Ages, grammar, along with logic and rhetoric, formed the trivium, the lower division of the seven liberal arts. It was only in the 19th century that scientific developments in medicine and physics were applied to the analysis of languages, in the first instance to answer questions in phonetics. The acoustic properties of speech sounds and their articulation proved amenable to experimental investigation, but work in the third branch of phonetics, auditory phonetics, resisted instrumentation, as the human ear was regarded as the best detector of the auditory properties of speech sounds. It was in the parallel discipline of speech perception, which developed in psychology and became established much more recently, that experimental approaches to the auditory properties of speech sounds eventually flourished. More generally, the use of experimental methods was introduced into Western culture by Francis Bacon in the early 17th century. However, many of his arguments for experimentation, and more generally for inductive over deductive methods of scientific investigation, were foreshadowed in the 11th century in the Book of Optics by the Arabic scholar Ibn al-Haytham (also known as Alhazen). Following Alhazen, experimental methodology was first applied in the physical sciences (physics and chemistry), before being imported into biology and medicine. The study of the physics of light and sound led to an interest in related aspects of language. Spoken language is the primary form of language, and questions about the nature of speech sounds, how they are articulated and how they are perceived, naturally arose. Written language is derivative, but one set of questions that soon came to be asked concerned the nature of eye movements in reading. Many of these studies made use of newly developed scientific instruments that made quantitative measurement possible.

DOI: 10.4324/9781003392972-3

1.2 Early work on speech sounds

Experimental studies of the physiology of speech production – what is now known as articulatory phonetics – began in Germany, though (self-)observational studies of the position of the tongue in articulating vowels date back to at least Robert Robinson’s 1617 “The Art of Pronunciation” (see Dobson, 1947). In the 19th century, Ernst Wilhelm Brücke (1856) used an instrument called the labioscope to measure movements of the lips during speaking. A further development was the laryngoscope for investigating the movement of the glottis, first indirectly, using mirrors (e.g., Garcia, 1855, and later in Germany by Czermak, Merkel and others), and then directly by insertion of the instrument into the mouth (by Alfred Kirstein in 1895, see Hirsch, Smith, and Hirsch, 1986). Garcia discovered that the vocal cords are the primary source of speech sound, and the work in Germany influenced British phoneticians, in particular Henry Sweet, who was the (partial) inspiration for Henry Higgins in George Bernard Shaw’s play “Pygmalion”. The study of the acoustic properties of speech sounds – acoustic phonetics – was pioneered by Helmholtz and others (e.g., von Helmholtz, 1870). In this domain, an important early piece of apparatus that could record speech sounds was the phonautograph. The phonautograph was a development of the kymograph of Carl Ludwig (1847), which was originally used to monitor blood pressure. The phonautograph itself was designed by Édouard-Léon Scott de Martinville (1857) and constructed in a version with a cylinder for recording sound by Rudolph Koenig. It could provide a visual record of a speech sound on smoke-blackened paper or glass, but it did not allow playback (though a method has recently been developed to play sounds from a phonautograph recording – see firstsounds.org). The phonautograph used an elastic membrane to mimic the tympanic membrane in the ear and transferred the sounds impinging on the membrane to the paper via levers and a stylus.
Sounds were spoken into a horn that directed the vibrations in the air onto the membrane. Although some aspects of the sounds could be identified from the visual trace, the process proved more difficult than Scott had anticipated, and, indeed, reading sounds by eye from visual records remains difficult to this day. To continue the history of recorded sound: in 1877, Thomas Edison produced the tinfoil cylinder phonograph, in which the stylus engraved a trace of the recording onto the cylinder, which could then be played back by running a stylus along the trace. Edison’s invention was quickly followed by the more enduring wax cylinder versions first produced by Alexander Graham Bell and other members of his Volta Laboratory (Chichester Bell and Charles Sumner Tainter). Such devices were soon used for the capture and playback of speech, music and other sounds, but they did not provide a visual record of the kind that lent itself to the analysis of sounds. In France, Charles Cros also realised that playback of a phonautograph trace would be possible if it were made more “substantial”. In 1877, he described a device similar to Edison’s phonograph, the paleophone, to the French Academy of Sciences, but he does not appear to have had a working model. This line of work fed primarily into the entertainment industry, and by the late 1880s Emile Berliner had developed flat discs that were the forerunners of the gramophone records used for many years in the music industry, but which were of limited importance in experimental linguistics. A more important development from the phonautograph for that discipline was the Rousselot cylinder (Rousselot, 1897), which, like the phonautograph, produced a visual record of sounds on smoked paper. A crucial difference was that, as well as a speech signal, the Rousselot cylinder could produce a record of tongue movements by means of a rubber bulb inserted between the tongue and the roof of the speaker’s mouth. The importance of the Rousselot cylinder and of Rousselot’s work with it is reflected in the fact that Rousselot is regarded as one of the prime candidates for the title of founder of experimental phonetics. However, perhaps the most important technical development for the scientific study of spoken language, the sound spectrograph, came much later, in the 1940s at Bell Labs (Koenig et al., 1946). This machine produces a detailed record of the distribution of energy in a sound wave in both frequency and time.
In recent years, the original part-mechanical machines of the 1940s and 1950s have largely been replaced by computerised analysis. As interesting as the early work in Germany and other countries was, it was subject to several limitations. Many of these derived from the fact that it was mainly carried out by physiologists and physicists, not linguists. These scientists often had a limited understanding of the niceties of language. They described their results mainly in physical or anatomical terms and missed interesting linguistic conclusions that could be drawn from the work they had carried out (see, e.g., Sweet, 1877). Sweet, who had many connections with Germany but also a sound background in language and linguistics, is another candidate for the title of founder of experimental phonetics.

1.3 Early work on brain damage and language

Another line of work in the empirical study of language that developed within 19th-century medical science was the study of language impairments resulting from brain damage of various kinds, primarily strokes and head injuries. The thrust of this work was to locate areas of brain damage, via autopsy, in patients who had been identified, while alive, as having language problems. This work was taken to support ideas about the localisation of brain function for language. Paul Broca (e.g., 1861) made this idea popular, but it had been presented decades earlier, using the same kind of evidence, by Marc Dax (1863, 1865), who died shortly after presenting his ideas, and without proper publication. Despite efforts by his son Gustave (1863) to persuade the French Academy of Sciences, Dax’s contribution was sidelined and was largely overlooked until recently (e.g., Cubelli & Montagna, 1994; Finger & Roe, 1996). Broca provided evidence for the left lateralisation and localisation of language in most adults, but it was left to Karl Wernicke (1874) to distinguish, anatomically, between the kind of productive aphasia studied by Broca and receptive aphasia of the kind named after Wernicke himself. In both cases, the damage lay in the dominant (usually left) hemisphere: in the first, primarily in the pars opercularis and pars triangularis of the inferior frontal gyrus (Brodmann areas 44 and 45), and in the second, in the superior temporal gyrus (Brodmann area 22). In the aphasia literature, these areas are known as Broca’s area and Wernicke’s area, respectively. Until the advent of modern brain-imaging techniques, the post-mortem anatomical techniques of Broca, Wernicke and others were widely used in aphasia research.

1.4 Early work on eye movements in reading

The history of eye movement research, including eye movements in reading, also took off in the 19th century and has been well documented elsewhere (e.g., Wade, 2010). Empirically derived knowledge of eye function and eye anatomy goes back at least as far as Aristotle (4th century BCE) and was taken forward considerably by Galen (2nd–3rd century CE). However, the non-smooth nature of most eye movements did not become a subject of interest until the 18th century, when nystagmus (movement of the eyes that is involuntary, rapid and repetitive, and can be horizontal, vertical or rotational) associated with vertigo was first studied in detail by François Boissier de Sauvages de Lacroix, Wells and later Purkinje. Eventually, it was recognised (e.g., Crum Brown, 1895, but based on earlier work) that “jerky” eye movements (“saccades”) were the norm in looking around the world. Further support for this idea came from a related line of research (e.g., Hering, 1879) in which sounds detected at the eye during reading and other activities, and associated with muscular activity, were found to be discontinuous and related to the changing position of the eye in its socket. Lamare (1892), working in Javal’s lab, used a similar technique and demonstrated the basic pattern of eye movements in reading English (and other left-to-right) texts: mainly left-to-right along a single line of text, and then sweeping back to near the beginning of the next line. Wade (2010) suggests that Javal’s own role in initiating the study of eye movements in reading has been exaggerated, perhaps on the basis of a statement by Huey in his “Psychology and Pedagogy of Reading” (1908), which fails to mention that the work was carried out by Lamare and does not mention the earlier work of Hering and others. Wade (2010) also suggests that other methods of recording eye movements tried in Javal’s lab and often mentioned in the literature, for example, fitting a suction cup with a pointer to the eye, did not work.
Javal did, however, introduce the now standard term “saccade” for the jerky eye movements referred to above. Huey and others later had success with suction cups attached directly to the eyeball, but these techniques were not only dangerous but also potentially distorted eye movements, for example, because the eye had been anaesthetised and because of the additional weight of the suction cup to be moved by the eye muscles. Non-contact methods were quickly sought and devised, using mirrors (e.g., Orschansky, 1899) or photographic techniques developed by Dodge (see e.g., Dodge & Cline, 1901; Erdmann & Dodge, 1898). Such techniques made eye movement research much safer, and it was soon discovered that the pattern of saccades and fixations (the periods when the eye is still between saccades) is typical of how we look around the world, though not, of course, the specific horizontal and vertical pattern found in reading.

1.5 Early psychological research on language

While scientific approaches to speech sounds developed throughout the 19th century (and possibly earlier), it was with the emergence of psychology as a science in the late 19th century, with language as an obvious focus of psychological research, that experimental approaches to many aspects of language first took hold. However, to some extent psychologists ask different questions from linguists – roughly, about language use rather than about language itself – and, indeed, a debate has raged, at times more violently than at others, about whether psychological research can reveal anything about language itself. In the early days of psychology, de Saussure’s distinction between “langue” and “parole” was seen as relevant to the (restricted) scope of psychology, and the later, related distinction in generative grammar between competence and performance was (eventually) used by Chomsky to exclude experimental psychological data from informing linguistic theory. In other traditions, the distinction between psychological and linguistic approaches has been less clear-cut. Karl Bühler (1934), one of the founders of experimental work in psychology in the German laboratories, including the psychology of language, also carried out significant work on linguistic theory. More recently, cognitive or performance grammars, often presented in the Anglo-Saxon tradition as an alternative to Chomskyan approaches, blur the distinction between competence and performance. However, they are not necessarily associated with experimental approaches to questions about grammar, and rarely, if at all, with standard psychological approaches. Finally, work on language acquisition, even when strongly influenced by generative linguistics, is often conducted via experiments or related empirical techniques, as children’s linguistic intuitions are unavailable or unreliable. To return to early experimental work on language, an early development, in the 1880s in Wundt’s Leipzig psychology laboratory, was James Cattell’s “gravity chronometer” for the brief presentation of visual stimuli. It was used to study, among other things, aspects of reading. Wundt himself referred to the gravity chronometer as a “Fall-Tachistoskop”. This name indicated its relationship to an instrument first introduced in physiology by Volkmann (1859) and known in the English-speaking world as the tachistoscope.
The tachistoscope is used to study vision under conditions of very brief stimulus presentation. Volkmann’s tachistoscope was intended to replace the earlier practice of illuminating the stimulus with a brief electric spark, which was unsatisfactory for various reasons. Early tachistoscopes differed considerably in appearance from later ones, in which lights and mirrors played an important role. As the name gravity chronometer suggests, the brief presentation in Cattell’s device was produced by a slit in a board falling under gravity. This kind of shuttering mechanism continued to be used in some later tachistoscopes. In others, particularly where more accurate timing was wanted, flashtubes with very precisely known characteristics (very short rise and fall times) were used in a manner somewhat reminiscent of the electric sparks that tachistoscopes had replaced. The early use of tachistoscopes in psychological research provided a link to experimental techniques used in a more established science, physiology, and was welcomed as bolstering the scientific status of the psychological work carried out in the Leipzig laboratory. The tachistoscope was one of several types of apparatus introduced into psychological laboratories in the late 19th and early 20th centuries (see, e.g., Evans, 2000, for more information). However, the advent of behaviourism, and the increasing focus on research on animals in psychology, gave rise to a period, at least in the Anglo-Saxon tradition, in which language was relatively neglected. Bloomfield’s (1933) embrace of behaviourism within linguistics had little impact in psychology, given the divorce between the two disciplines heralded by, for example, Delbrück (1901).

1.6 The cognitive revolution

In the Anglo-Saxon world, interest in the psychology of language revived with the cognitive revolution of the 1950s. The approach within psychology was largely experimental, though partly observational in the study of child language. This approach was complemented by computer modelling in the neighbouring discipline that came to be known as artificial intelligence (AI). AI, along with psychology and linguistics, was one of the cognitive sciences of the 1970s and 1980s. The AI of this period, sometimes referred to as GOFAI (Good Old-Fashioned Artificial Intelligence), should not be confused with its successor, the artificial intelligence of today, based on artificial neural networks and machine-learning algorithms. In the study of language, some of the main achievements of GOFAI include Terry Winograd’s (1972) program for understanding natural language (also known as SHRDLU) and Ross Quillian’s (1968) Semantic Networks. Quillian’s work demonstrates the strong interplay between the various cognitive sciences, as it led to experimental tests of his ideas in collaboration with the psychologist Allan Collins (Collins & Quillian, 1969; 1970). However, as already noted, despite the obvious connection between linguistic theory and the psychology of language, the predominant linguistic approach of the cognitive period, generative grammar, with its strong focus on syntax and, to a lesser extent, semantics, became cautious, to say the least, about allowing psychological data to influence linguistic theory. Psychology was about the use of language (“performance”), not language itself or the knowledge of it (“competence”).

1.7

“New” experimental methods

In the 1950s and 1960s, much psychological work, though clearly experimental in nature, used simple techniques, such as pencil-and-paper tasks. However, it was clear that psychological theories should be able to make predictions about how long certain processes take, at least relative to one another. There was therefore a need for studies that examined what happens as language is being used, which came to be known as “on-line” studies. It is possible to study both language production and language comprehension online, but in many ways it is easier to study comprehension, and hence most psychological studies have focused on comprehension.

1.7.1

Online processing

In the 1960s, techniques to study online processing were developed, and from the 1970s on, experiments in which processing times were reported and analysed began to proliferate. Two techniques from the 1960s were “click location” and “phoneme monitoring” (or monitoring for other linguistic units). In click location (e.g., Fodor & Bever, 1965), a click is heard while a sentence is being played to participants. Afterwards, participants must indicate where in the sentence the click occurred. The perceived position of a click may be displaced towards a boundary between units posited in a linguistic analysis, and that displacement provides evidence for the psychological reality of those units. The phoneme-monitoring task was introduced by Don Foss (Foss & Lynch, 1969) and was considered more versatile than click location. In this task, participants respond to the occurrence of a specified phoneme, usually at the beginning of a word, in the speech stream they are listening to. Response time is taken as an indication of the difficulty of processing at that point in the sentence. It has already been noted that processing times could be measured in the late 19th century, in particular using tachistoscopic techniques. Such techniques were reintroduced into the psychology of language after the cognitive revolution. For example, the famous Haviland and Clark (1974) study on inference-making and the given-new distinction (“We checked the picnic supplies. The beer was warm”) used a “modified tachistoscope” (Haviland & Clark, 1974, p. 515). Other researchers started to use computers for presenting stimuli and timing responses. For example, Rubenstein, Lewis and Rubenstein (1971), in a study famously re-analysed by Clark (1973) in his paper on the language-as-fixed-effect fallacy, presented stimuli on “the cathode ray tube of an IBM 1800 computer” (Rubenstein et al., 1971, p. 646). Another well-known experiment controlled by a (relatively large) so-called minicomputer was Garrod and Sanford’s (1977) study of anaphoric reference. In this study, passages were printed by a teletype attached to a NOVA 2/10 computer and made visible to participants through a slit; each press of the space bar advanced the paper roll on the teletype. Response time and reaction time techniques have also been widely used in the study of lexical and sub-lexical processing. Haviland and Clark’s (1974) study used pairs of sentences, with the first presented for a fixed time in the tachistoscope and the second displayed until the participant pressed a button. In Garrod and Sanford’s (1977) study, the time for which each of two sentences and a following question was visible through the slit was controlled by the participant’s button presses. That is the basis of the self-paced reading technique, first described in detail (in French) by Joel Pynte (1974, 1975). The technique was introduced into the English-speaking literature by Aaronson and Scarborough (1976) and Mitchell and Green (1978), as well as by Garrod and Sanford (1977). Aaronson and Scarborough presented sentences one word at a time on a “computer scope”, with participants pressing a “response key” (1976, p. 57). Self-paced reading is not limited to the case where text is divided into sentences. It also comes in several variants, depending on what happens to previous and upcoming text when the participant presses a button or a key (see, e.g., Mitchell, 2004). The advent of microcomputers, and related software, revolutionised the running of reaction, response and reading time (including self-paced reading) experiments. Our own lab acquired an SWTPC microcomputer with a Motorola 6800 microprocessor in the late 1970s. The experiment reported in Garnham (1981) was run on this machine using a custom program written by the author in RT basic.
Within a year we had switched to similar 6809-based systems running programs written in C, cross-compiled on and downloaded from a DEC PDP-11 running Unix (Norris, 1984). Experiments using this system are reported in Garnham (1984) and several later papers from our lab. Once microcomputers had morphed into everyday desktop machines, software began to appear for common types of machine. For example, Schneider (1988) produced Micro Experimental Laboratory (MEL), commercial software for PC-type computers, which was the precursor of the much better-known E-Prime (Schneider, Eschman, & Zuccolotto, 2012), and MacWhinney and colleagues produced PsyScope, free software for Macs (Cohen et al., 1993). Another important piece of free software, in this case for PCs, was Ken Forster’s DMASTR/DMDX, in the development of which he was later helped by his son, Jonathan (Forster & Forster, 2003). More recently, software for running experiments with millisecond timing online has become available, including the free systems PsychoPy (Peirce et al., 2019) and PsyToolkit (Stoet, 2017).
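The display variants of self-paced reading mentioned above can be illustrated with a minimal Python sketch. It assumes word-by-word presentation with masked words shown as dashes (one common convention). It contrasts a moving-window display, where each key press unmasks the next word and re-masks the previous one, with a cumulative display, where earlier words stay visible, and derives per-word reading times as the intervals between successive key presses. The function names and the press times in the usage lines are hypothetical; real experiment software, such as the packages named above, must also control display timing and response collection, which this sketch ignores.

```python
from typing import List


def mask(word: str) -> str:
    """Replace every character of a word with a dash (an assumed masking convention)."""
    return "-" * len(word)


def spr_display(words: List[str], current: int, cumulative: bool = False) -> str:
    """Return the line shown while region `current` (0-based) is visible.

    Moving window (cumulative=False): only the current word is unmasked;
    earlier words are masked again.
    Cumulative (cumulative=True): every word up to and including the
    current one stays visible.
    """
    shown = []
    for i, word in enumerate(words):
        visible = i == current or (cumulative and i < current)
        shown.append(word if visible else mask(word))
    return " ".join(shown)


def reading_times(press_times_ms: List[float]) -> List[float]:
    """Per-region reading times: the interval between successive key presses.

    press_times_ms[0] is the press that reveals the first word; each later
    press reveals the next word, so region k's time is t[k + 1] - t[k].
    """
    return [later - earlier for earlier, later in zip(press_times_ms, press_times_ms[1:])]


if __name__ == "__main__":
    sentence = "The beer was warm".split()
    for k in range(len(sentence)):
        print(spr_display(sentence, k))  # moving-window frames
    print(spr_display(sentence, 2, cumulative=True))
    print(reading_times([0.0, 412.0, 801.0, 1350.0, 1712.0]))  # hypothetical press times
```

In practice the reading time for each region is then analysed as the dependent variable, for example by comparing a critical word across experimental conditions.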

1.7.2

Eye movements

Microcomputers also helped to revolutionise the use of eye-movement techniques in psycholinguistic research. We have already seen that there was extensive research on the nature of eye movements in reading in the late 19th and early 20th centuries. More important nowadays is the use of eye-tracking measures as dependent variables to investigate hypotheses about issues such as parsing (e.g., Ferreira & Clifton, 1986) or referential processing (e.g., Altmann & Kamide, 1999). Two papers published in the Journal of the Optical Society of America in 1960 (Rashbass, 1960; Smith & Warter, 1960) described techniques for measuring (particularly horizontal) eye movements by reflecting light from different parts of the eye surface, such as the iris and the sclera, which have different reflective properties. Typically, infrared light is used, as light in the visible range could interfere with what is seen. In the 1960s and early 1970s, such systems had to be interfaced with cumbersome and relatively expensive minicomputers. Nevertheless, an important development (Rayner, 1975; Reder, 1973) was the introduction of gaze-contingent techniques, in which the presented stimulus could be (very rapidly) changed depending on the measured position of the eye during a fixation. Ideally, the change took place during the following saccade, when vision is much reduced, so that the participant would not notice it. However, the fact that the change had occurred might still affect subsequent processing, compared with a “no change” condition. By the 1980s, regular desktop microcomputers could be used to control and record data from eye-tracking devices. Video-based techniques, successors to the photographic methods originally used by Dodge (e.g., Dodge & Cline, 1901; Erdmann & Dodge, 1898), were developed via the use of motion picture cameras (Fitts et al., 1950). In the 1970s and 1980s, they were used in some language studies, though their temporal resolution was limited by the number of frames per second recorded by the video device. Other methods had to be devised to produce more accurate spatial information about movements of the eyes in the head, and better temporal resolution. These characteristics were essential when people were reading substantial portions of text in relatively small fonts. Methods based on Purkinje images were developed at the Stanford Research Institute (Cornsweet & Crane, 1973; Crane & Steele, 1985). A commercial eye tracker developed by a spinoff company, Fourward Technologies, was the most commonly used Purkinje tracker in language research. Purkinje images are formed when light is reflected from boundaries between different layers of the transparent part of the eye, which includes the lens. When the eye moves, the differential movement of the Purkinje images can be used to compute the changing eye position.
Dual Purkinje trackers, such as those made by Fourward Technologies, use two of the Purkinje images. They were popular in language research for about 20 years, but they are relatively difficult to maintain, and the Fourward Technologies model required the head to be fixed (e.g., with a relatively uncomfortable bite bar) to determine the direction of gaze in space accurately. More recently, techniques using infrared reflection from the surface of the eye have seen a major revival, with EyeLink machines manufactured by SR Research being the most popular in language research. These systems track the pupil (and measure its size, which can be used in pupillometry) by exploiting the different reflective properties of the pupil and the surrounding iris. Initially, these systems were combined with head-mounted trackers, first introduced by Hartridge and Thompson (1948). Head mounting means that the direction of gaze relative to the head is measured directly, but not its direction in space. Hence, such systems typically require the head to be fixed so that the direction of gaze in space can be computed. More recently, simultaneous measurement of eye and head movements has become accurate enough to allow the direction of gaze in space to be computed without fixing the head (though large or rapid head movements are not easily compensated for). Because they are more comfortable for participants, free-head desk-mounted systems are now the norm, except when very accurate information about eye movements is required. A further development is that miniaturisation allows portable eye tracking, for monitoring eye movements when people are moving around carrying out everyday activities. That said, pioneering portable eye trackers were introduced in the 1990s, though their deployment was difficult and data analysis time-consuming (see, e.g., Land & Tatler, 2009).
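The gaze-contingent technique described earlier in this section (Rayner, 1975; Reder, 1973) can be sketched as a simple rule applied to a stream of eye-tracker samples. The Python below assumes a version of what is often called the boundary paradigm: a preview word occupies a slot in the sentence until the horizontal gaze position first crosses an invisible boundary, after which the target word is displayed for the rest of the trial. The pixel values, sentence, and function name are hypothetical. The sketch also ignores the real-time constraint that the change must be completed during the saccade, which in practice depends on the tracker's sampling rate and the display's refresh rate.

```python
from typing import Iterable, List


def boundary_frames(gaze_x_px: Iterable[float], boundary_x_px: float,
                    frame_text: str, preview: str, target: str) -> List[str]:
    """Return the text that would be on screen after each gaze sample.

    Assumes left-to-right reading and a boundary placed just before the
    slot marked "{}" in frame_text. Once the gaze first reaches the
    boundary, the target replaces the preview and is never switched back,
    even if the eyes later regress to the left of the boundary.
    """
    changed = False
    frames: List[str] = []
    for x in gaze_x_px:
        if not changed and x >= boundary_x_px:
            changed = True  # display change triggered by this sample
        frames.append(frame_text.format(target if changed else preview))
    return frames


if __name__ == "__main__":
    samples = [100.0, 180.0, 240.0, 320.0, 300.0]  # hypothetical horizontal positions (px)
    for line in boundary_frames(samples, 300.0, "The captain opened the {} slowly", "chart", "chest"):
        print(line)
```

Comparing reading times on the target word after valid and invalid previews is what allows such designs to reveal how much information is picked up from a word before it is fixated.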
Another important development in eye-tracking research on language has been the introduction of the so-called visual world paradigm, in which people view a set of pictures (or occasionally a set of real-world objects) as they listen to speech. Shifts of gaze from one picture to another, which reflect shifts of attention, indicate how quickly listeners are processing what they hear. The technique is based on one introduced by Cooper (1974). It was revived in psycholinguistic research, again using microcomputers and modern software, by Tanenhaus et al. (1995).
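A common first step in analysing visual-world data is to compute, at successive time points after the onset of a critical word, the proportion of trials on which the listener is fixating each picture. The minimal Python sketch below assumes a hypothetical data format in which each trial is a list of fixations, each given as start time, end time (in ms from word onset) and the label of the fixated picture. It samples at fixed time points rather than making the binning and aggregation choices that a real analysis would need to justify, and the example trials are invented for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical format: (start_ms, end_ms, label of the fixated picture)
Fixation = Tuple[int, int, str]


def fixated_at(trial: List[Fixation], t_ms: int) -> Optional[str]:
    """Return the picture fixated at time t (None during saccades or off-picture looks)."""
    for start, end, label in trial:
        if start <= t_ms < end:
            return label
    return None


def target_proportions(trials: List[List[Fixation]], target: str,
                       t_max: int, step: int = 50) -> List[Tuple[int, float]]:
    """Proportion of trials fixating `target` at each sampled time point in [0, t_max)."""
    curve = []
    for t in range(0, t_max, step):
        hits = sum(1 for trial in trials if fixated_at(trial, t) == target)
        curve.append((t, hits / len(trials)))
    return curve


if __name__ == "__main__":
    example_trials = [  # invented data: four trials, time 0 = onset of the critical word
        [(0, 180, "distractor"), (220, 600, "target")],
        [(0, 300, "distractor"), (340, 600, "target")],
        [(0, 250, "competitor"), (290, 600, "target")],
        [(0, 600, "distractor")],
    ]
    for t, p in target_proportions(example_trials, "target", 600, step=100):
        print(t, p)
```

Plotting such curves for each picture type and condition shows when looks to the target begin to diverge from looks to the other pictures.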


1.7.3

Brain imaging and related techniques

Further advances in the experimental study of language came from the development of techniques for registering brain activity, including so-called brain-imaging techniques. Electroencephalography (EEG), the recording of electrical activity on the scalp, is not an imaging technique, but the generation and analysis of event-related potentials within electroencephalograms has become an important technique in the study of language. Electrical activity on the surface of exposed animal brains was observed in the 19th century, but it was not until the 1920s that an electroencephalogram was recorded from the human scalp by Hans Berger (1929), who coined the term “electroencephalogram”. Berger also noted that the waves in an electroencephalogram were affected by external events, which led to the notion of an event-related potential (ERP). ERPs are often weak and only detectable by averaging electrical activity over many trials of the same kind. The first ERPs in an awake human were reported by Pauline Davis (1939), with similar effects during sleep reported by Davis and colleagues in the same year (H. Davis et al., 1939). In the 1960s, Grey Walter reported the first cognitive ERPs (Walter et al., 1964). There followed an era of the discovery and description of ERP components, positive- or negative-going, and labelled either by their ordinal position (e.g., P3, the third positive-going component) or by their approximate peak time in milliseconds after the external event (e.g., N400, a negative component peaking around 400 ms after the event). Microcomputers and comparatively easy-to-use software led to a dramatic growth of ERP work on language from the 1980s. A key event was the publication of Kutas and Hillyard’s (1980) paper on the effect of semantic anomaly on the N400 component, in which the N400 was considerably larger at “socks” than at “butter” in “She spread the bread with socks/butter”.
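Why averaging over many trials is needed can be shown with a small simulation. The Python sketch below assumes an idealised situation: every trial contains the same time-locked component (a hypothetical negative deflection peaking at 400 ms, loosely modelled on the N400) buried in independent Gaussian background noise that is much larger than the component itself. Averaging epochs point by point leaves the component intact, while the noise shrinks roughly in proportion to 1/√N for N trials. All amplitudes, the sampling rate and the function names are illustrative assumptions, not values from any particular study; real EEG noise is neither Gaussian nor independent from sample to sample.

```python
import math
import random
from typing import List


def n400_like(t_ms: float, amp_uv: float = -5.0, peak_ms: float = 400.0,
              width_ms: float = 50.0) -> float:
    """Hypothetical ERP component: a negative Gaussian-shaped deflection near 400 ms."""
    return amp_uv * math.exp(-((t_ms - peak_ms) ** 2) / (2 * width_ms ** 2))


def simulate_trial(times: List[float], noise_sd: float, rng: random.Random) -> List[float]:
    """One epoch: the same time-locked component plus independent background noise."""
    return [n400_like(t) + rng.gauss(0.0, noise_sd) for t in times]


def average_epochs(epochs: List[List[float]]) -> List[float]:
    """Point-by-point mean across trials, all time-locked to stimulus onset."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def rms_error(estimate: List[float], times: List[float]) -> float:
    """Root-mean-square difference between an average and the true component."""
    squared = [(e - n400_like(t)) ** 2 for e, t in zip(estimate, times)]
    return math.sqrt(sum(squared) / len(times))


if __name__ == "__main__":
    rng = random.Random(1)
    times = [float(t) for t in range(0, 800, 4)]  # 0 to 796 ms, i.e. 250 Hz sampling
    for n in (1, 16, 64):
        epochs = [simulate_trial(times, noise_sd=20.0, rng=rng) for _ in range(n)]
        print(n, "trials: residual noise", round(rms_error(average_epochs(epochs), times), 2), "µV")
```

With a noise level four times the component's peak amplitude, a single trial shows no recognisable deflection, whereas the average of a few dozen trials recovers it clearly.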
Related to EEG is MEG (magnetoencephalography), which analyses the magnetic component of the same electromagnetic activity that EEG analyses. MEG typically uses detectors that operate at very low temperatures (superconducting quantum interference devices – SQUIDs) and is therefore considerably more expensive than EEG. Both EEG and MEG are said to have high temporal resolution, meaning that aspects of ERP components (e.g., peaks) can be related precisely in time to external events. This linking can then shed light on the nature of those events and their relations to other types of events, on the assumption that similarities in ERPs reflect similarity in (the processes of responding to) the events themselves. Of the imaging techniques, the most important in language research is functional Magnetic Resonance Imaging (fMRI). Other methodologies include Positron Emission Tomography (PET), NIRS (Near-Infrared Spectroscopy, including functional NIRS – fNIRS), and SPECT (Single-Photon Emission Computed Tomography). Nuclear Magnetic Resonance (NMR) has been studied in physics since the 1930s. Its use in medical imaging was developed in the early 1970s by Paul Lauterbur (1973) and Peter Mansfield (e.g., Mansfield & Grannell, 1975), who were awarded the Nobel Prize in Physiology or Medicine for this work. In its initial medical applications, MRI produced detailed structural images of internal body parts to help with diagnosis. fMRI combines NMR with what has been known since the late 19th century about blood flow and blood oxygenation during mental activity. The most commonly used measure is the BOLD (blood-oxygen-level-dependent) contrast (Ogawa et al., 1990). Oxygenated blood flows to active neurons to supply the oxygen that supports their activity. Because the oxygenated and deoxygenated forms of haemoglobin have different magnetic properties, they produce different patterns in the image, which gives rise to the BOLD contrast.
Brain-imaging techniques provide relatively accurate information about where something is happening in the brain, but the changes detected by the main techniques – including fMRI – are fairly sluggish. They cannot provide detailed information about millisecond-to-millisecond processing, which is often important in linguistic research.

1.7.4

Dialogue and discourse

Dialogue and discourse are somewhat neglected aspects of language, at least as far as their experimental study is concerned. There have been recent attempts to compare and relate brain activity in different participants in a discourse, but these remain in their infancy. Earlier work used various experimental “games” to study coordination, such as the tangram game (Clark & Wilkes-Gibbs, 1986) and the maze game (Garrod & Anderson, 1987). In a different line of research, studies of syntactic priming and related research led to the notion of alignment between participants in dialogue at many linguistic levels (e.g., speech rate, vocabulary, and syntactic structure; Pickering & Garrod, 2004). Other, broader questions about contextual effects on language, not necessarily in dialogue, can be addressed in experimental sociolinguistics (see, e.g., Drager, 2018), whose techniques are mainly psychological ones that make relatively few technical demands.

1.8

‘Experimental linguistics’

Syntax and semantics are sometimes regarded, perhaps unfairly, as the core domains of linguistics. In these domains, experimental approaches have only recently received much attention, and then in a rather narrow sense (see, e.g., Cowart, 1997; Cummins & Katsos, 2019; Goodall, 2021; Schütze, 1996). Furthermore, such approaches have been a matter of some controversy (e.g., Jacobson, 2018). Traditionally, grammar writers relied on their own knowledge of a language. However, sometimes, particularly when studying unfamiliar languages from different language groups, they had to elicit judgements from native informants. In the generative tradition that dominated academic linguistics for most of the second half of the 20th century, Chomsky’s notion of linguistic intuition was used to justify the production and testing of syntactic hypotheses by individual linguists. Initially, Chomsky appeared to show interest in results from the emerging field of psycholinguistics and in ideas such as the derivational theory of complexity. However, he soon began using his distinction between competence and performance to exclude psychological data from informing linguistic theory. The basic premise of more recent claims about the need for experimental syntax and semantics is that multiple judgements (“intuitions”) are likely to be more informative than the judgements of a single person, and that such judgements should be collected systematically, in a way that can be called experimental. Without additional manipulations or participant groups, it is not entirely clear that the term “experimental” is appropriate for this method of data collection. Furthermore, as Jacobson (2018) and others have argued, it is no more experimental than “collecting” carefully elicited judgements from a single participant (often oneself) or a small group of participants.
It has advantages – and may even be essential – where judgements are dubious or variable (and it may not always be obvious to a single speaker when this is the case). However, there have been many areas of science in which large numbers of data points and associated statistical analyses are not required. To return to syntax and semantics, a native speaker of English can clearly see that “Colourless green ideas sleep furiously” has no sensible literal meaning, but it can also clearly be assigned a structure that parallels that of sensible sentences such as “Tiny white mice run quickly”. Judgements from 10 or 100 or 1,000 people are not necessarily useful in this case. In relation to these considerations, the study of context-dependent aspects of meaning (pragmatics) is of some interest. As Levinson (1983, p. 9) points out, “pragmatics” is a rag-bag term that covers some context-dependent aspects of meaning that are closely tied to linguistic structure and some that are not. So, for example, the analysis of indexicals (“I”, “you”, “here”, “there”, “now”, “then”, etc.) often uses methods similar to those of formal semantics, and is often, but not always, able to rely on robust judgements about the use of such terms. In other areas of pragmatics, such as presupposition and implicature, judgements (and developmental trajectories) are less obvious, and a subdiscipline of experimental pragmatics (Meibauer & Steinbach, 2011; Noveck, 2018a; Noveck & Sperber, 2004) has grown up with methods similar to those used in psycholinguistics, even if the relation between the two disciplines is sometimes fraught. Core areas of study in experimental pragmatics include scalar implicature, speaker perspective and reference-making, and irony (see Noveck, 2018b).

1.9

Conclusion

Language comprises a complex and varied set of phenomena. Its empirical, and indeed scientific, study therefore has many facets. Some aspects of language, in particular its sounds, lend themselves to scientific measurement and experimentation of a kind familiar from the natural sciences. In addition, aspects of language acquisition and language use repay study using the techniques of psychology and related social and cognitive sciences. Even syntax and semantics, which generative linguists in particular regard as the core areas of language study, and which often seem less susceptible to experimentation, are now attracting growing interest in the potential use of experiments for their study.

Further reading

Evans, R. B. (2000). Psychological Instruments at the Turn of the Century, American Psychologist, 55(3), 322–325.
Mitchell, D. C. (2004). On-line Methods in Language Processing: Introduction and Historical Review. In M. Carreiras & C. E. Clifton (Eds.), The On-Line Study of Sentence Comprehension: Eyetracking, ERP and Beyond (pp. 15–32). New York: Routledge.

References

Aaronson, D., & Scarborough, H. S. (1976). Performance Theories for Sentence Coding: Some Quantitative Evidence, Journal of Experimental Psychology: Human Perception and Performance, 2, 56–70.
Altmann, G. T. M., & Kamide, Y. (1999). Incremental Interpretation at Verbs: Restricting the Domain of Subsequent Reference, Cognition, 73(3), 247–264.
Berger, H. (1929). Über das Elektroenkephalogramm des Menschen, Archiv für Psychiatrie und Nervenkrankheiten, 87(1), 527–570.
Bloomfield, L. (1933). Language. New York: Holt, Rinehart and Winston.
Broca, P. (1861). Remarques sur le Siège de la Faculté du Langage Articulé, Suivies d’une Observation D’Aphémie, Bulletin de la Société Anatomique de Paris, 36, 330–357.
Brücke, E. W. (1856). Grundzüge der Physiologie und Systematik der Sprachlaute für Linguisten und Taubstummenlehrer. C. Gerold und Sohn.
Bühler, K. (1934). Sprachtheorie. Fischer.
Clark, H. H. (1973). The Language-as-Fixed-Effect Fallacy: A Critique of Language Statistics in Psychological Research, Journal of Verbal Learning and Verbal Behavior, 12(4), 335–359.
Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a Collaborative Process, Cognition, 22(1), 1–39.
Cohen, J., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A New Graphic Interactive Environment for Designing Psychology Experiments, Behavior Research Methods, Instruments, and Computers, 25(2), 257–271.
Collins, A. M., & Quillian, M. R. (1969). Retrieval Time from Semantic Memory, Journal of Verbal Learning and Verbal Behavior, 8(2), 240–247.
Collins, A. M., & Quillian, M. R. (1970). Does Category Size Affect Categorization Time?, Journal of Verbal Learning and Verbal Behavior, 9(4), 432–438.
Cooper, R. M. (1974). The Control of Eye Fixation by the Meaning of Spoken Language: A New Methodology for the Real-Time Investigation of Speech Perception, Memory, and Language Processing, Cognitive Psychology, 6, 84–107.
Cornsweet, T. N., & Crane, H. D. (1973). Accurate Two-Dimensional Eye Tracker using First and Fourth Purkinje Images, Journal of the Optical Society of America, 63(8), 921–928.
Cowart, W. (1997). Experimental Syntax: Applying Objective Methods to Sentence Judgments. Sage.
Crane, H. D., & Steele, C. M. (1985). Generation-V Dual-Purkinje-Image Eyetracker, Applied Optics, 24(4), 527–537.
Crum Brown, A. (1895). The Relation between the Movements of the Eyes and the Movements of the Head. Henry Frowde.
Cubelli, R., & Montagna, C. G. (1994). A Reappraisal of the Controversy of Dax and Broca, Journal of the History of the Neurosciences, 3(4), 215–226.
Cummins, C., & Katsos, N. (Eds.). (2019). The Oxford Handbook of Experimental Semantics and Pragmatics. Oxford University Press.
Davis, P. A. (1939). Effects of Acoustic Stimuli on the Waking Human Brain, Journal of Neurophysiology, 2, 494–499.
Davis, H., Davis, P. A., Loomis, A. L., Harvey, E. N., & Hobart, G. (1939). Electrical Reactions of the Human Brain to Auditory Stimulation During Sleep, Journal of Neurophysiology, 2, 500–514.
Dax, M. (1865, lu à Montpellier en 1836). Lésions de la Moitié Gauche de L’Encéphale Coïncident avec L’Oubli des Signes de la Pensée, Bulletin hebdomadaire de médecine et de chirurgie, 2(2), 259–262.
Dax, G. (1863). M. Dax Soumet au Jugement de l’Académie un Mémoire Intitulé: “Observations Tendant à Prouver la Coïncidence Constante des Dérangements de la Parole avec une Lésion de L’Hémisphère Gauche du Cerveau”, Comptes rendus hebdomadaires des séances de l’Académie des sciences, 56, 536.
Delbrück, B. (1901). Grundfragen der Sprachforschung; mit Rücksicht auf W. Wundt’s Sprachpsychologie. Strassburg: Trübner.
Dobson, E. J. (1947). Robert Robinson and his Phonetic Transcripts of Early Seventeenth-century Pronunciation, Transactions of the Philological Society, 58, 25–63.
Dodge, R., & Cline, T. S. (1901). The Angle Velocity of Eye Movements, Psychological Review, 8, 145–157.
Drager, K. (2018). Experimental Research Methods in Sociolinguistics. Bloomsbury.
Erdmann, B., & Dodge, R. (1898). Psychologische Untersuchungen über das Lesen auf experimenteller Grundlage. Niemeyer.
Evans, R. B. (2000). Psychological Instruments at the Turn of the Century, American Psychologist, 55(3), 322–325.
Ferreira, F., & Clifton, C. (1986). The Independence of Syntactic Processing, Journal of Memory and Language, 25(3), 348–368.
Finger, S., & Roe, D. (1996). Gustave Dax and the Early History of Cerebral Dominance, Archives of Neurology, 53(8), 806–813.
Fitts, P. M., Jones, R. E., & Milton, J. L. (1950). Eye Movements of Aircraft Pilots during Instrument-Landing Approaches, Aeronautical Engineering Review, 9(2), 24–29.
Fodor, J. A., & Bever, T. G. (1965). The Psychological Reality of Linguistic Segments, Journal of Verbal Learning and Verbal Behavior, 4, 414–420.
Forster, K. I., & Forster, J. (2003). DMDX: A Windows Display Program with Millisecond Accuracy, Behavior Research Methods, Instruments, and Computers, 35, 116–124.
Foss, D. J., & Lynch, R. H. (1969). Decision Processes during Sentence Comprehension: Effects of Surface Structure on Decision Times, Perception and Psychophysics, 5, 145–148.
Garcia, M. (1855). Observations of the Human Voice, Proceedings of the Royal Society of London, 7(60), 399–410.
Garnham, A. (1981). Anaphoric Reference to Instances, Instantiated and Non-instantiated Categories: A Reading-Time Study, British Journal of Psychology, 72, 377–384.
Garnham, A. (1984). Effects of Specificity on the Interpretation of Anaphoric Noun Phrases, Quarterly Journal of Experimental Psychology, 36A, 1–12.
Garrod, S., & Anderson, A. (1987). Saying What You Mean in Dialogue: A Study in Conceptual and Semantic Co-ordination, Cognition, 27, 181–218.
Garrod, S., & Sanford, A. (1977). Interpreting Anaphoric Relations: The Integration of Semantic Information While Reading, Journal of Verbal Learning & Verbal Behavior, 16, 77–90.
Goodall, G. (2021). The Cambridge Handbook of Experimental Syntax. Cambridge: Cambridge University Press.
Hartridge, H., & Thompson, L. C. (1948). Methods of Investigating Eye Movements, British Journal of Ophthalmology, 32, 581–559.
Haviland, S. E., & Clark, H. H. (1974). What’s New? Acquiring New Information as a Process in Comprehension, Journal of Verbal Learning and Verbal Behavior, 13(5), 512–521.
Helmholtz, H. von. (1870). Handbuch der physiologischen Optik. Leipzig: Voss.
Hering, E. (1879). Über Muskelgeräusche des Auges, Sitzungsberichte der kaiserlichen Akademie der Wissenschaften in Wien. Mathematisch-naturwissenschaftliche Klasse, 79, 137–154.
Hirsch, N. P., Smith, G. B., & Hirsch, P. O. (1986). Alfred Kirstein: Pioneer of Direct Laryngoscopy, Anaesthesia, 41(1), 42–45.
Huey, E. B. (1908). The Psychology and Pedagogy of Reading. New York: Macmillan.
Jacobson, P. (2018). What is—or, for that Matter, isn’t—“Experimental” Semantics? In D. Ball & B. Rabern (Eds.), The Science of Meaning: Essays on the Metatheory of Natural Language Semantics (pp. 46–72). Oxford Scholarship Online.
Koenig, W., Dunn, H. K., & Lacy, L. Y. (1946). The Sound Spectrograph, Journal of the Acoustical Society of America, 18, 19–49.
Kutas, M., & Hillyard, S. A. (1980). Reading Senseless Sentences: Brain Potentials Reflect Semantic Incongruity, Science, 207, 203–205.
Lamare, M. (1892). Des Mouvements des Yeux dans la Lecture, Bulletins et Mémoires de la Société Française d’Ophthalmologie, 10, 354–364.
Land, M. F., & Tatler, B. W. (2009). Looking and Acting: Vision and Eye Movements in Natural Behaviour. Oxford University Press.
Lauterbur, P. C. (1973). Image Formation by Induced Local Interactions: Examples Employing Nuclear Magnetic Resonance, Nature, 242, 190–191.
Levinson, S. C. (1983). Pragmatics. Cambridge University Press.
Ludwig, C. F. W. (1847). Beiträge zur Kenntniss des Einflusses der Respirationsbewegungen auf den Blutlauf im Aortensysteme, Archiv für Anatomie, Physiologie und wissenschaftliche Medizin, 242–302.
Mansfield, P., & Grannell, P. (1975). Diffraction and Microscopy in Solids and Liquids by NMR, Physical Review B, 12(9), 3618–3634.
Meibauer, J., & Steinbach, M. (2011). Experimental Pragmatics/Semantics. John Benjamins Pub. Co.
Mitchell, D. C. (2004). On-line Methods in Language Processing: Introduction and Historical Review. In M. Carreiras & C. E. Clifton (Eds.), The On-line Study of Sentence Comprehension: Eyetracking, ERP and Beyond (pp. 15–32). Psychology Press.
Mitchell, D. C., & Green, D. W. (1978). The Effects of Context and Content on Immediate Processing in Reading, Quarterly Journal of Experimental Psychology, 30, 609–636.
Norris, D. (1984). A Computer-Based Programmable Tachistoscope for Nonprogrammers, Behavior Research Methods, Instruments, and Computers, 16, 25–27.
Noveck, I. (2018a). Experimental Pragmatics: The Making of a Cognitive Science. Cambridge University Press.
Noveck, I. (2018b). Experimental Pragmatics. In S.-A. Rueschemeyer & M. G. Gaskell (Eds.), The Oxford Handbook of Psycholinguistics (pp. 622–643). Oxford University Press.
Noveck, I. A., & Sperber, D. (2004). Experimental Pragmatics. Palgrave Macmillan.
Ogawa, S., Lee, T. M., Kay, A. R., & Tank, D. W. (1990). Brain Magnetic Resonance Imaging with Contrast Dependent on Blood Oxygenation, Proceedings of the National Academy of Sciences U.S.A., 87(24), 9868–9872.
Orschansky, J. (1899). Eine Methode die Augenbewegungen Direct zu Untersuchen, Zentralblatt für Physiologie, 12, 785–785.
Peirce, J. W., Gray, J. R., Simpson, S., MacAskill, M. R., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. (2019). PsychoPy2: Experiments in Behavior Made Easy, Behavior Research Methods, 51(1), 195–203.
Pickering, M. J., & Garrod, S. (2004). Toward a Mechanistic Psychology of Dialogue, Behavioral and Brain Sciences, 27(2), 169–225.
Pynte, J. (1974). Une Expérience Automatisée en Psycholinguistique. Communication au G.A.I.N., Paris, Décembre, 1974, Informatique et Sciences Humaines, 23, 45–46.
Pynte, J. (1975). Programmation d’une Expérience de Psycholinguistique. Communication aux Journées de la S.F.P. sur L’expérimentation Pilotée par Ordinateur, Aix, Décembre, 1975, Cahiers de Psychologie, 18, 65–74.
Quillian, M. R. (1968). Semantic Memory. In M. Minsky (Ed.), Semantic Information Processing (pp. 227–270). MIT Press.
Rashbass, C. (1960). New Method for Recording Eye-Movements, Journal of the Optical Society of America, 50, 642–644.
Rayner, K. (1975). The Perceptual Span and Peripheral Cues in Reading, Cognitive Psychology, 7, 65–81.
Reder, S. M. (1973). On-Line Monitoring of Eye Position Signals in Contingent and Non-contingent Paradigms, Behavior Research Methods and Instrumentation, 5, 218–228.
Rousselot, J.-P. (1897). Principes de Phonétique Expérimentale (2 volumes). H. Welter.
Rubenstein, H., Lewis, S. S., & Rubenstein, M. A. (1971). Evidence for Phonemic Recoding in Visual Word Recognition, Journal of Verbal Learning and Verbal Behavior, 10(6), 645–657.
Scott de Martinville, É.-L. (1857). Fixation Graphique de la Voix par Édouard-Léon Scott de Martinville. J. Claye.
Schneider, W. (1988). Micro Experimental Laboratory: An Integrated System for IBM PC Compatibles, Behavior Research Methods, Instruments, and Computers, 20, 206–217.
Schneider, W., Eschman, A., & Zuccolotto, A. (2012). E-Prime User’s Guide. Pittsburgh: Psychology Software Tools, Inc.
Schütze, C. T. (1996). The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press.
Smith, W. M., & Warter, P. J., Jr. (1960). Eye-Movement and Stimulus Movement: New Photoelectric Electromechanical System for Recording and Measuring Tracking Motions of the Eye, Journal of the Optical Society of America, 50, 245–250.
Stoet, G. (2017). PsyToolkit: A Novel Web-Based Method for Running Online Questionnaires and Reaction-Time Experiments, Teaching of Psychology, 44, 24–31.
Sweet, H. (1877). A Handbook of Phonetics: Including a Popular Exposition of the Principles of Spelling Reform. Clarendon Press.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of Visual and Linguistic Information in Spoken Language Comprehension, Science, 268, 1632–1634.
Volkmann, A. W. (1859). “Das Tachistoskop....” Berichte über die Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig, Mathematisch-Physische Classe, 11, 90–98.
Wade, N. J. (2010). Pioneers of Eye Movement Research, i-Perception, 1(2), 33–68.
Walter, W. G., Cooper, R., Aldridge, V. J., McCallum, W. C., & Winter, A. L. (1964). Contingent Negative Variation: An Electric Sign of Sensorimotor Association and Expectancy in the Human Brain, Nature, 203(4943), 380–384.
Wernicke, C. (1874). Der Aphasische Symptomencomplex. Eine Psychologische Studie auf Anatomischer Basis. Cohn and Weigert.
Winograd, T. (1972). Understanding Natural Language, Cognitive Psychology, 3(1), 1–191.

20

2 EXPERIMENTAL PHONETICS AND PHONOLOGY
Ivana Didirková and Anne Catherine Simon

2.1 Introduction and definitions

Experimental phonetics and phonology aim to understand the processes of speech production and perception. As stated by Ohala, experimentation is an attitude:

first, a keen awareness that the world is not necessarily as it may seem, i.e., that our sense impressions and therefore the opinions and beliefs based on them may be faulty, and, second, the willingness to actively do something to compensate for or correct these potential errors by making observations under carefully controlled conditions. (Ohala, 1987, p. 207)

The experimental approach is characterised by the effort to control for factors whose effects could confound the phenomenon one seeks to explain, model, or predict (Xu, 2011, p. 88). An experimental study is based on a hypothesis (e.g., the higher-pitched a voice is, the more female it is perceived to be) that can be tested by manipulating variables (e.g., the mean pitch of the voice, but also its intonation contours or timbre). Whether a hypothesis can be tested depends on identifying the variables to be observed and on measuring them reliably. Reliable measurement often requires instrumental acoustic analysis (Hayward, 2000, p. 5). However, accurate acoustic analysis is not sufficient if the hypotheses and variables are not properly identified. This is why experimental phonology cannot be reduced to the biophysical aspects of speech sounds and their computerised analysis. Depending on the research question, hypothesis testing crucially relies on related fields of knowledge. Explaining the phonological phenomena of contrast and categorisation requires a cognitive dimension (Demolin, 2012). Psychological models are needed when investigating language acquisition and development. Sociolinguistics makes it possible to study how we categorise people based on voice features, and so on.
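The pitch example above can be made concrete. A perception experiment on perceived gender would typically present listeners with a continuum of resynthesised stimuli whose mean f0 differs in equal steps. The sketch below is a minimal illustration. It assumes a base f0 of 120 Hz and steps of 2 semitones; these values are purely illustrative and do not come from any cited study. It only computes the target f0 of each step. The resynthesis itself would be done in a tool such as Praat.

```python
def f0_continuum(base_hz: float, step_semitones: float, n_steps: int) -> list[float]:
    """Return target mean f0 values (Hz) spaced in equal semitone steps.

    A shift of s semitones multiplies a frequency by 2 ** (s / 12).
    """
    return [base_hz * 2 ** (i * step_semitones / 12) for i in range(n_steps)]


if __name__ == "__main__":
    # Illustrative values only: a 120 Hz voice raised in 2-semitone steps.
    for i, f0 in enumerate(f0_continuum(120.0, 2.0, 7)):
        print(f"step {i}: {f0:.1f} Hz")
```

Semitone steps are used instead of equal steps in hertz because pitch perception is roughly logarithmic in frequency. Six 2-semitone steps therefore span exactly one octave, here from 120 Hz to 240 Hz.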


DOI: 10.4324/9781003392972-4


2.2 Historical perspectives

As in other disciplines, the development of new techniques in the late 19th and early 20th centuries opened new horizons in the speech sciences. The appearance of, and access to, new recording machines made it possible to collect and study larger amounts of data. They were also expected to bring more objectivity, since a recording can be played back, unlike the direct observations made before. Experimental phonetics is traditionally considered to have started with Rousselot (1846–1924), who introduced the term in 1889 (Boë & Vilain, 2011). His Principes de phonétique expérimentale (1924) laid the foundations of the discipline, explaining the need for a 'control' over the subjective observations made by early descriptive phoneticians. Rousselot's work describes acoustic and physiological parameters of human speech and provides valuable information on 'natural' and 'artificial' (Rousselot, 1924, p. 4) means of observation and experimentation. The author describes techniques such as sound recording machines (chronographs, phonautographs, phonograph, gramophone) and devices for observing articulatory movements (artificial palate, kymograph, myograph, pneumograph, and labial, lingual, velar, and laryngeal 'exploration'). Importantly, Rousselot's conception of phonetics was multidisciplinary, encompassing linguistics, acoustics, and physiology. However, while Rousselot is considered the father of experimental phonetics (Hayward, 2000, p. 35), questions about speech had been raised long before the 20th century. For instance, in the 18th century, Erasmus Darwin (1731–1802) conducted the first instrumental phonetic study on a live speaker to describe the place of articulation of vowels (Ohala, 2010). Another seminal text was published thirteen years after Rousselot introduced the term experimental phonetics.
Scripture's Elements of Experimental Phonetics (1902) introduces what would later be considered the three main branches of phonetics: speech acoustics (called 'curves of speech'), speech perception, and speech production. It also provides essential information on each of them. Experimental phonology is often seen as having emerged later than experimental phonetics: Demolin (2012) mentions a delay of nearly a century between Rousselot and the emergence of experimental or laboratory phonology. Nevertheless, Ohala (1987) notes that linguists and anthropologists such as Sapir (1884–1939), Greenberg (1915–2001), or Brown (1925–1997) used experimental methods in some of their works, although they probably did not think of themselves as 'experimental phonologists'. The same point was made above about experimental phonetics, which had precursors before Rousselot. However, it was Ohala and Jaeger who contributed to establishing the discipline. In their Experimental phonology (Ohala & Jaeger, 1986, p. 2), they advocate the use of experimental methods in phonology not to create new knowledge but to refine existing knowledge. They also recall that experimental methods do not necessarily require heavy equipment. The authors stress two points. First, many other sciences have adopted experimental methods which, albeit imperfect, have enabled new developments in those disciplines. Second, theory and experiment, although often seen as opposed, benefit from each other. An experiment is usually based on a theory (or model), without which it would make little sense. Conversely, theories need experiments to be confirmed (or not) and refined where necessary.

2.3 Critical issues and topics

2.3.1 Variability and complexity

For several decades, experimental phonetics and phonology pursued the elaboration of a feature theory. They sought general, straightforward ways to describe phonemic invariants in languages through articulatory or acoustic theories (Ladefoged, 2004). Invariance remained a central concern until Labov's (1972) analysis of sociolinguistic issues highlighted the social variability of sound production and perception.

Today, researchers treat the intrinsic and extrinsic variability of speech as an object of study, including in experimental settings where variables must be controlled. Intrinsic, or structural, variability stems from parameters (e.g., phoneme frequency, syllabic or prosodic structure) that influence how phonemes are realised. Sources of extrinsic variation, on the other hand, are numerous (e.g., speaker characteristics, situational context, dialect) and must be considered in both production and perception experiments (Thomas, 2002). Modelling intrinsic and extrinsic variation together is a complex, multifactorial undertaking that has become central to many experimental studies (Kim & Tilsen, 2022; Lawson & Stuart-Smith, 2021).
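One widely used way of factoring out part of the extrinsic, speaker-related variability before comparing vowels across speakers is Lobanov normalisation, which converts each speaker's formant values to z-scores. The authors cited above do not discuss it; it is given here only as an illustration. The input format is an assumption: a speaker label plus F1 and F2 values for each vowel token.

```python
from collections import defaultdict
from statistics import mean, stdev


def lobanov(tokens):
    """Z-score F1 and F2 within each speaker (Lobanov normalisation).

    tokens: list of dicts with keys 'speaker', 'f1', 'f2' (Hz); this input
    format is hypothetical. Each speaker needs at least two tokens with
    differing values, otherwise the standard deviation is undefined or zero.
    Returns copies of the tokens, in the original order, with 'z1' and 'z2'.
    """
    # Collect each speaker's formant values.
    values = defaultdict(lambda: {"f1": [], "f2": []})
    for t in tokens:
        values[t["speaker"]]["f1"].append(t["f1"])
        values[t["speaker"]]["f2"].append(t["f2"])

    # Per-speaker mean and sample standard deviation for each formant.
    stats = {
        spk: {k: (mean(v), stdev(v)) for k, v in formants.items()}
        for spk, formants in values.items()
    }

    normalised = []
    for t in tokens:
        s = stats[t["speaker"]]
        normalised.append({
            **t,
            "z1": (t["f1"] - s["f1"][0]) / s["f1"][1],
            "z2": (t["f2"] - s["f2"][0]) / s["f2"][1],
        })
    return normalised
```

After normalisation, tokens at the same relative position in two speakers' vowel spaces receive similar z-values. This holds even when their raw formant frequencies differ, for example because of different vocal tract lengths.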

2.3.2 Corpora and stylistic diversity

Corpus studies and experimental studies sometimes seem irreconcilable: the term 'laboratory phonology' evokes elicited speech that can be far removed from everyday speech. However, researchers seek to produce explanatory models that apply to everyday speech, whether for automatic speech recognition, speech synthesis, speech therapy, or teaching. Authentic speech corpora (continuous speech, dialogue) are necessary for the results to have ecological validity. Corpus data come from natural contexts and make it possible to study issues related to stylistic diversity (Wagner et al., 2015). Using such data in experimental studies combines the advantages of corpus and experimental methods, which are complementary. In some cases, corpus and experiment are on an equal footing, and the study is both observational and experimental. Most of the time, however, corpora serve as databases from which suitable examples are retrieved (Gilquin & Gries, 2009, p. 14).
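When a corpus serves as a database from which suitable examples are retrieved, the retrieval step usually amounts to filtering annotated tokens on the variables the experiment needs to hold constant. The sketch assumes a hypothetical token table: the `Token` class and its fields are invented for illustration and do not come from any corpus tool. It selects candidate stimuli from one speaking style within a duration range, keeping at most a fixed number of tokens per speaker.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One annotated token from a hypothetical speech corpus."""
    speaker: str
    style: str          # e.g. "read", "spontaneous"
    word: str
    duration_ms: float


def select_stimuli(tokens, style, min_ms, max_ms, per_speaker=1):
    """Return up to `per_speaker` matching tokens per speaker, in corpus order.

    Restricting style and duration keeps these variables roughly constant
    across stimuli; capping tokens per speaker avoids over-representing
    any single voice.
    """
    counts = {}
    selected = []
    for tok in tokens:
        if tok.style != style or not (min_ms <= tok.duration_ms <= max_ms):
            continue
        if counts.get(tok.speaker, 0) >= per_speaker:
            continue
        counts[tok.speaker] = counts.get(tok.speaker, 0) + 1
        selected.append(tok)
    return selected
```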

2.3.3 Multimodality

A final critical topic, on which we dwell briefly, is the need to adopt a multimodal perspective. Multimodality relates first to the mode of production (spoken, written, or gestural) and to the interaction of auditory, visual, and gestural modes when processing speech. Scholars study the influence of written stimuli on phoneme categorisation (Dufour et al., 2022). They also examine the interplay between gesture and prosody in speech production and interpretation (de Ruiter et al., 2012; Prieto et al., 2015), and the phonology and prosody of sign languages (Brentari, 2019). A second, more abstract meaning relates to the fact that speech can be conceived as multimodal in itself. Speech rate, breathing, register, and pausing, together with accentuation, rhythm, and intonation patterns, form a complex system in which the different prosodic 'modes' are partly autonomous and interact with one another (Perlman & Benitez, 2008; Simon & Auchlin, 2001).

2.4 Current contributions and research

2.4.1 From typology to variation

The field of experimental phonetics and phonology is very rich and covers a wide range of topics. Initially, the descriptive methods used by the two disciplines were meant to elucidate the basic functioning of the vocal tract. Examples include articulatory descriptions of speech sounds and of laryngeal behaviour, in both typical and impaired speech. The description of phonological systems is still part of the field today, but research has since broadened to other questions.

2.4.1.1 Language typology and description

Despite decades of research on speech sound description, some languages still contain phonetic or phonological structures that are yet to be understood. A group of researchers recently published a book on laryngeal features in Native American languages (Avelino et al., 2016). It contains phonetic and phonological studies of parameters such as tone, vowel laryngealisation, the temporal coordination of glottalic gestures, and consonant-tone interactions. The authors use a wide range of phonetic analyses, including acoustic and electroglottographic studies. Other researchers study African languages, investigating their rhythmic patterns (Sedhu, 2015), intonation (Zerbian & Barnard, 2008), and articulatory gestures (Demolin & Chabiron, 2013), among other features. South American languages are the subject of a volume (Campbell & Grondona, 2012) that covers, among other topics, their phonetics and phonology. Finally, the specificities of Asian languages are also widely studied, often with regard to their tone inventories (Brunelle & Kirby, 2016; Michaud, 2012). Interestingly, besides these less-studied languages, a more detailed understanding of some phonetic and phonological features of Western languages is still needed. For instance, Signorello et al. (2017) investigated the aerodynamic characteristics of Belgian French fricatives by measuring subglottal and intraoral air pressure. Rhoticity and its articulatory and acoustic parameters are studied in several languages, including English (King & Ferragne, 2020), Greek (Nicolaidis & Baltazani, 2014), and Slovak (Pouplier & Benus, 2009). They are also studied in specific contexts such as articulation disorders (Van Lieshout et al., 2008).

2.4.1.2 Variation

Detailed information is available on the phonetic properties of many speech sounds, syllables, and coordination patterns, and on the phonological systems of languages. Even so, variation in these parameters is far from rare. Many studies therefore try to disentangle the essential phonetic and phonological features of a sound or structure from features that can vary without preventing correct identification. As mentioned in the previous section, variation in speech and language is caused by many factors and is considered normal. Current research on variation in speech production and perception focuses on a number of these factors. One active field is interspeaker and intraspeaker variation, that is, how speech parameters vary not only between speakers but also within a single individual, whether or not in relation to other variables. Several studies examine how articulatory movements vary over time, depending on the speaker's age and other factors (Grigos, 2009; Jacewicz et al., 2010; Tomaschek et al., 2021). Geographic variation is another such factor, and an important question that is still being addressed. In English, for instance, the wide variety of accents has given rise to studies on the production and perception of different varieties of English. Some of these studies compare them with similar speech structures in other languages (Cebrian, 2021; Przewozny et al., 2020). Intraspeaker variation can also result from differences in the communication setting. It has long been known that formal situations call for more careful speech than casual conversation. However, more detailed analyses of this topic are still being carried out, in both speech perception and speech production. They study the relationship between prepared and spontaneous speech, media and non-media speech, degree of interaction, and other factors (Brognaux & Drugman, 2014; Goldman et al., 2014).

2.4.1.3 Pathological speech

Clinical phonetics and phonology, another important field, use both perception and production paradigms. They assess the specific characteristics of speech and language disorders, hearing impairments, and other disorders affecting speech and language. This subfield involves collaboration with speech and language pathologists, neurologists, otorhinolaryngologists, and other specialists. When focusing on speech and language production, clinical studies often use various vocal tract visualisation tools. One of their most important aims is to identify similarities and differences between non-pathological and pathological speech behaviour. This helps to understand not only the disorder but also the typical functioning of speech, and improves speech- and language-related models and theories. Perception studies, on the other hand, focus on how disordered speech and voice can affect communication and how disorders are perceived. Studies in this field address disorders such as apraxia of speech (Overby & Caspari, 2015), dysarthria (Lehner & Ziegler, 2022), stuttering (Didirková et al., 2021), various voice disorders (Awan, 2008), and prosody impairments (Wells & Whiteside, 2008).

2.4.2 From segments to discourse

While the studies mentioned above mainly focus on phonetic and phonological parameters, researchers also use experimental phonetics and phonology to link the speech level to the higher discourse level. The aim is to understand which voice and speech features contribute to the overall meaning of the message delivered by the speaker.

2.4.2.1 Prosody

Prosodic (suprasegmental) features are a key component of spoken language and its processing. They convey information at the lexical, syntactic, and discourse levels. They also carry information about the speaker, their emotional state, the language variety, and the speech style, among other things. The literature on this subject is rich and uses many experimental methods to investigate the prosodic features of various spoken language structures. At the syntactic level, studies have explored the syntax-prosody interface in many languages (Dogil et al., 2002; Georgiafentis & Sfakianaki, 2004; Gras & Elvira-García, 2021; Hsiao, 2020; Ma & Zhuang, 2022; Michelas & D'Imperio, 2012). The same is true of the discourse-prosody interface. Studies investigate the role of prosody in conveying discourse relations such as consequence, specification, or topic shift (Didirková et al., 2019; Gonen et al., 2015; Van Praet & O'Grady, 2018), and its role in structuring discourse (Tyler, 2013). Prosody is also investigated in children, in ageing, and in various speech- and hearing-related impairments. Examples include the inference of discourse prominence by cochlear-implant users (Huang et al., 2017), the syntax-prosody interface in children with developmental language disorder (Caccia & Lorusso, 2019), and prosody in ageing (Steinhauer et al., 2010).

2.4.2.2 Disfluency

Speech production is a highly demanding process requiring the completion of several steps before speech sounds are actually released (see, for example, Guenther, 2016; Levelt, 1989; Shattuck-Hufnagel, 2019). This complexity often leads to hesitation phenomena, especially in unprepared speech, where speakers have to plan and produce their speech in real time. Speakers experience this as being lost for words, hesitating over a formulation, or taking time to think while producing 'uh', 'er', or 'well'. These phenomena are called disfluencies. They have given rise to a large number of studies in linguistics and discourse analysis, but also in the speech sciences. Disfluencies in typical speech are investigated in terms of prosody (Wu et al., 2022), speaking style (Moniz et al., 2014), speech recognition (Stouten et al., 2006), discourse (Merlo & Mansur, 2004), speech production (Anansiripinyo & Onsuwan, 2019), and speech perception (Warner et al., 2022).
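A simple descriptive measure used in disfluency research is the rate of filled pauses per 100 words. The sketch below assumes a plain orthographic transcript and a hypothetical, language-specific inventory of filled-pause forms. Actual studies rely on richer annotation schemes that also cover repetitions, repairs, and prolongations. Ambiguous forms such as 'well' are deliberately left out, because they cannot be classified without context.

```python
import re

# Hypothetical inventory of filled-pause forms; adapt to the language studied.
FILLED_PAUSES = {"uh", "um", "er", "erm", "euh"}


def filled_pause_rate(transcript: str) -> float:
    """Return filled pauses per 100 words in an orthographic transcript.

    Words are runs of word characters (optionally with one apostrophe);
    filled pauses are counted as words, so they are in the denominator too.
    """
    words = re.findall(r"\w+(?:'\w+)?", transcript.lower())
    if not words:
        return 0.0
    n_filled = sum(1 for w in words if w in FILLED_PAUSES)
    return 100.0 * n_filled / len(words)
```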

2.5 Main research methods

Experimental phonetics and phonology rely mainly on instruments that allow more or less direct observation of the parameters considered. This section lists some of the tools used in present-day studies.

2.5.1 Speech perception studies

Perception is an essential part of speech, since speaking generally serves a communicative purpose. Perception concerns not only the transmission of the message per se but also cues to the speaker's age, emotional state, speaking style, and other important aspects of communication. Several experimental tasks are used in the perception literature (Table 2.1).

2.5.1.1 Listening studies

The simplest method for a perception study consists of having participants listen to a set of stimuli and answer related questions (see Table 2.1). Many platforms are available for such studies (e.g., PsychoPy, Google Forms, LimeSurvey). Some of them allow for measuring reaction times or for tasks such as sensorimotor synchronisation, in which participants tap their finger along with looped spoken phrases (Rathcke et al., 2021). Another essential advantage is the possibility of recruiting participants online, which makes larger samples possible. More often, however, perception studies involve other instruments. The gating paradigm (Grosjean, 1980) assesses the hearer's anticipation of the upcoming sequence. The hearer is repeatedly presented with a spoken stimulus whose duration (measured from its onset) is increased at each successive presentation. The presentation duration, called the gate, varies according to the stimulus type (syllable, word, sentence). Hearers are asked to indicate what they expect to come next, typically by choosing among several options, and how confident they are (Moradi et al., 2014).

Table 2.1 Main experimental tasks in listening studies

Categorisation and judgement tasks (see Thomas, 2002, p. 135; Gilquin & Gries, 2014, p. 5):
(1) judging whether two stimuli sound the same or different;
(2) judging which two stimuli out of three or more are alike (ABX paradigm);
(3) sorting stimuli, freely or into categories;
(4) gauging the acceptability, grammaticality, naturalness … of speech stimuli (categorically or on a Likert scale).
Identification or transcription tasks:
(1) identifying whether or not a stimulus contains a typical feature;
(2) transcribing sound features (prominence, focus, prosodic boundary) using, e.g., the rapid prosodic transcription (RPT) protocol (Cole et al., 2017).
Anticipation: gating.
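In a gating experiment, the successive gates are fragments of increasing duration measured from stimulus onset. The sketch assumes a fixed increment; the 50 ms in the usage example is only illustrative, since the appropriate gate size depends on the stimulus type. It computes the list of gate durations, and the last gate always presents the full stimulus.

```python
def gate_durations(stimulus_ms: float, first_gate_ms: float, increment_ms: float) -> list[float]:
    """Return the durations (ms from stimulus onset) of successive gates.

    Gates grow by a fixed increment; the final gate presents the whole
    stimulus, even if that last step is shorter than the increment.
    """
    if first_gate_ms <= 0 or increment_ms <= 0:
        raise ValueError("gate sizes must be positive")
    gates = []
    current = first_gate_ms
    while current < stimulus_ms:
        gates.append(current)
        current += increment_ms
    gates.append(stimulus_ms)  # last presentation = full stimulus
    return gates
```

For a 230 ms word with 50 ms gates, `gate_durations(230, 50, 50)` yields five presentations of 50, 100, 150, 200, and 230 ms. Each duration would then be used to truncate the audio from its onset.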


2.5.1.2 Stimulus preparation

Speech synthesis is widely used in speech perception studies to modify natural stimuli. It makes it possible to test one or several parameters (e.g., sound duration, formants) while controlling for others (see, for example, Didirková et al., 2019). The prosody-transplantation paradigm transplants the prosody of one speaker onto a sentence produced by another speaker. It thereby combines segmental parameters from one variety with suprasegmental parameters from another (Boula & Vieru-Dimulescu, 2006). To study the perception of prosody, researchers frequently produce delexicalised stimuli. In such stimuli, segmental information is altered so that hearers concentrate on prosodic parameters without being influenced by linguistic content. Such content-masked speech can be obtained by using meaningless words or sentences (Bänziger & Scherer, 2005) or by low-pass or stop-band filtering (Goldman et al., 2014). Various kinds of resynthesis are also used, which try to keep the resulting stimuli as natural as possible and to avoid humming (phoneme substitution, Simon & Christodoulides, 2016, p. 87; inverse-filtered glottal flow, Vainio et al., 2009). Magnetoencephalography (MEG), electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and eye tracking are sometimes used to assess perception from a physiological viewpoint. More precisely, they describe listeners' neural (or, for eye tracking, oculomotor) responses to speech stimuli (Alday, 2019; Redcay et al., 2008; Wang et al., 2020).

In speech production studies, multiple tools and measurements help to objectify speech production parameters. In general, phoneticians and phonologists are interested in breath, voice, and speech characteristics (Table 2.2).
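Low-pass filtering, one of the delexicalisation techniques listed above, removes most of the higher-frequency spectral detail that carries segmental information. It largely preserves the f0 contour and the intensity envelope. The following is a deliberately simplified sketch, assuming NumPy and an illustrative cut-off of 400 Hz. It uses an FFT "brick-wall" filter, which is easy to read but can introduce ringing. A real study would more likely use a smoother filter, for example in Praat, and set the cut-off according to the speaker's f0 range.

```python
import numpy as np


def lowpass_fft(signal: np.ndarray, fs: int, cutoff_hz: float = 400.0) -> np.ndarray:
    """Zero all spectral components above `cutoff_hz` (brick-wall low-pass).

    Simplified illustration only: abrupt spectral cut-offs can cause ringing,
    so smoother filters are preferable for real stimuli.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


if __name__ == "__main__":
    fs = 16_000
    t = np.arange(fs) / fs  # 1 s of signal
    # Toy signal: a 150 Hz component (f0 region) plus a 2500 Hz component
    # standing in for higher-frequency segmental energy.
    x = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)
    y = lowpass_fft(x, fs)
    # Only the 150 Hz component survives, so this difference is close to 0.
    print(np.max(np.abs(y - np.sin(2 * np.pi * 150 * t))))
```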

Table 2.2 Measurement techniques for speech production in experimental phonetics and phonology, and their invasiveness (Yes or No)

Research question: method (invasive?)
Acoustic analysis: voice recording (No)
Voice quality: voice recording (No); electroglottography (No)
Aerodynamics, subglottal air pressure: punctures (Yes); airway interruption (No); intraoral air pressure (No); oesophageal balloon (No); plethysmography (No)
Aerodynamics, airflow: pneumotachography (No)
Laryngeal activity: voice recording (No); electroglottography (No); ultrasound (No); laryngoscopy (Yes)
Supraglottic activity: voice recording (No); ultrasound (No); electromagnetic articulography (Yes); MRI (Yes)

2.5.1.3 Acoustic analyses

One of the most obvious instruments is a simple voice recorder, or any other device (computer, smartphone) equipped with a sound card, a microphone, and voice-recording software.

Audio data from such recordings are then processed according to the research question. Most of the time, they are transcribed orthographically and phonemically or phonetically. Orthographic and phonemic transcription can be semi-automated with speech-to-text tools and dictionaries. These dictionaries convert an orthographic transcription into phonemic symbols, mainly in the International Phonetic Alphabet (IPA). Transcriptions can also be done manually and should, in any case, be double-checked. When the recording is controlled and consists of specific sounds, transcription is sometimes unnecessary before the next step. Acoustic analyses can be carried out in software such as Praat (Boersma & Weenink, 2022). This program allows for studying several acoustic parameters:
- pitch, through fundamental frequency, the acoustic correlate of tone and intonation;
- intensity and loudness;
- duration;
- formant frequencies, i.e., high-energy frequency peaks in the spectrum, corresponding to resonances of the vocal tract;
- voice quality (jitter, shimmer, harmonics-to-noise ratio);
- cepstral information, and others.
For more information, see https://www.fon.hum.uva.nl/praat/. These measurements can also be automated for large corpus studies.
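To illustrate the principle behind automatic pitch measurement, the sketch below estimates the f0 of a single voiced frame from the peak of its autocorrelation. It searches an assumed range of 75–500 Hz. It is a bare-bones version of the autocorrelation idea, not Praat's algorithm, which adds windowing, peak interpolation, voicing decisions, and path finding across frames. Automating measurements for a large corpus amounts to running such an analysis (or a Praat script) over every file and frame.

```python
import numpy as np


def estimate_f0(frame: np.ndarray, fs: int, fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Estimate the f0 (Hz) of one voiced frame from its autocorrelation peak.

    The frame should cover at least two periods of the lowest expected f0.
    Simplifications: no windowing, no voicing decision (unvoiced frames still
    return a value), and resolution limited to whole-sample lags.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags: index k holds lag k.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)                    # shortest period considered
    lag_max = min(int(fs / fmin), len(ac) - 1)  # longest period considered
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / best_lag
```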

2.5.1.4 Voice quality measures

Voice quality also involves parameters that may require more advanced techniques. While vocal fold vibration can be assessed indirectly in Praat (Boersma & Weenink, 2022), other instruments allow more direct observation. The electroglottograph (EGG) provides information on vocal fold abduction and adduction. It does so by measuring variations in the contact area between the vocal folds as they vibrate during voice production (Rothenberg, 1992). It is a non-invasive method. Electrodes are placed around the neck (one or two on each side of the larynx) and held in place by a collar, and a low-amperage current passes between the electrodes on either side of the thyroid cartilage. EGG is used to study vocal fold vibration patterns free of artefacts from vocal tract resonances or aerodynamic noise. More precisely, it allows one to obtain values for:
- fundamental frequency (f0) and f0 variation;
- jitter, a measure of f0 instability reflecting the control of vocal fold vibration;
- shimmer, amplitude variation reflecting glottal resistance;
- relative average perturbation, the mean variation in consecutive periods;
- open quotient (OQ), the percentage of each glottal cycle during which the glottis is open;
- closed quotient (CQ), the ratio between the closed phase and the complete glottal cycle, which allows for studying glottal hypo- or hyperadduction.
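The perturbation measures listed above can be computed from a sequence of glottal cycle durations (periods) and peak amplitudes, whether these come from the EGG or the acoustic signal. The sketch below uses the 'local' definitions found in Praat's voice report: the mean absolute difference between consecutive values, divided by the mean value. RAP compares each period with the mean of itself and its two neighbours. The open-quotient function assumes that the open-phase duration of each cycle has already been detected; that detection is the difficult part in practice and is not shown.

```python
from statistics import mean


def jitter_local(periods):
    """Mean absolute difference between consecutive periods / mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def shimmer_local(amplitudes):
    """Mean absolute difference between consecutive peak amplitudes / mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


def rap(periods):
    """Relative average perturbation: each period is compared with the mean of
    itself and its two neighbours; the average deviation is divided by the
    mean period."""
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, len(periods) - 1)
    ]
    return mean(devs) / mean(periods)


def open_quotient(open_phase, periods):
    """Mean proportion of each cycle during which the glottis is open.

    A quotient defined as time closed / whole cycle is then 1 - OQ.
    """
    return mean(o / p for o, p in zip(open_phase, periods))
```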

2.5.1.5 Aerodynamics

Voice and voice quality studies can be complemented by aerodynamic data on air pressure and airflow. Voice can only be produced during exhalation, when subglottal air pressure is sufficient to overcome the resistance of the adducted vocal folds. Air pressure therefore plays an essential role in speech, since appropriate pressure is necessary for voice production. In phonetics, it also allows the study of phenomena such as stop consonant production. Subglottal air pressure is measured in hectopascals (hPa) or centimetres of water (cm H2O), and several techniques can be used to estimate it. The most direct procedures are invasive. They consist of percutaneous, translaryngeal, or intraesophageal punctures of the cricothyroid membrane (Gross et al., 2006; Plant & Hillel, 1998), or of pressure sensors inserted into the trachea (Mehta et al., 2021). Other techniques are less direct but also less invasive. They include interpolation from intraoral air pressure measurements (Demolin et al., 1997) and the airway interruption method, in which the subject produces specific sound sequences while holding a tubular device that measures airflow and subglottal pressure (Smitheran & Hixon, 1981). The oesophageal balloon (Lieberman, 1968) and respiratory inductance plethysmography bands (Salomoni et al., 2016) can also be used, among others. Airflow measurements are mostly non-invasive and help assess neuromuscular disorders of the larynx and velum as well as voice disorders (Jiang et al., 2016). Airflow measurement techniques primarily include pneumotachographs (devices measuring the rate of airflow).
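Two computations recur in aerodynamic work. The first is converting pressures between the units mentioned above, using the conventional value 1 cm H2O = 98.0665 Pa, i.e. about 0.98 hPa. The second, for the intraoral-pressure method, is estimating subglottal pressure during a vowel from the intraoral pressure peaks of the flanking voiceless stops. The sketch below assumes the logic usually applied to repeated /pi/ syllables, namely linear interpolation between two adjacent peaks at the vowel midpoint. The function names and inputs are illustrative.

```python
CMH2O_TO_PA = 98.0665  # 1 cm H2O in pascals (conventional value)


def cmh2o_to_hpa(p_cmh2o: float) -> float:
    """Convert a pressure from cm H2O to hectopascals (1 hPa = 100 Pa)."""
    return p_cmh2o * CMH2O_TO_PA / 100.0


def interpolate_subglottal(t1: float, p1: float, t2: float, p2: float, t_vowel: float) -> float:
    """Linearly interpolate between two intraoral pressure peaks.

    (t1, p1) and (t2, p2): times (s) and peak intraoral pressures of two
    consecutive voiceless stop closures; with the lips closed and the glottis
    open, intraoral pressure is taken as an estimate of subglottal pressure.
    t_vowel: the time (e.g. the vowel midpoint) at which to estimate it.
    """
    if not t1 <= t_vowel <= t2:
        raise ValueError("t_vowel must lie between the two peaks")
    w = (t_vowel - t1) / (t2 - t1)
    return p1 + w * (p2 - p1)
```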

2.5.1.6 Articulation

In other cases, the object of study requires analysing articulatory behaviour rather than purely vocal parameters. Two main vocal tract areas can be assessed: laryngeal and supraglottic articulatory behaviour.

2.5.1.6.1 LARYNGEAL LEVEL

Studying articulation at the laryngeal level often involves focusing on vocal fold adduction and abduction. When direct observation is needed, it is carried out using EGG, as mentioned earlier, or laryngeal ultrasound, which allows for studying laryngeal height (Moisik et al., 2014). Another tool for observing glottal activity is laryngoscopy, which captures videos of vocal fold movement. It is widely used to assess laryngeal behaviour in singers (Lam Tang et al., 2008), in voice disorders (Bonilha et al., 2012), and in human beatboxing (Dehais Underdown et al., 2019), among others. Unlike EGG, laryngoscopy is an invasive method, used primarily for medical purposes, that provides direct insight into vocal fold activity.

2.5.1.6.2 SUPRA-LARYNGEAL LEVEL

At the supraglottic (or supra-laryngeal) level, articulatory studies are interested in the activity of the tongue, lips, velum, and jaw. As in aerodynamic studies, several tools can be used, depending on the research question. First, a number of articulatory parameters can be obtained indirectly with acoustic analysis software, from fundamental frequency, formant frequencies, and other (spectral and cepstral) measures. When direct observation is needed, several tools are available:
- ultrasound (for the tongue);
- Velotrace (velum);
- video (lips, jaw);
- Optotrak (visible articulators);
- electromagnetic articulography (lips, jaw, tongue);
- magnetic resonance imaging (MRI, the most complete observation).
Analysed with appropriate software, these techniques make it possible to investigate the velocity of supraglottic articulators, gesture duration, gesture anticipation, and other parameters.
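Articulatory velocity and gesture duration are usually derived from position traces, for instance from an electromagnetic articulography sensor on the tongue tip. The sketch below is minimal and rests on two assumptions. The position signal is one-dimensional and sampled at a fixed rate. Gesture onset and offset are marked where speed drops below 20% of peak speed, a frequently used convention, though the threshold is a choice rather than a standard.

```python
def velocity(positions, fs):
    """Velocity (units/s) of a 1-D position trace sampled at fs Hz.

    Central differences inside the trace, one-sided differences at the edges.
    """
    n = len(positions)
    v = [0.0] * n
    for i in range(1, n - 1):
        v[i] = (positions[i + 1] - positions[i - 1]) * fs / 2.0
    if n > 1:
        v[0] = (positions[1] - positions[0]) * fs
        v[-1] = (positions[-1] - positions[-2]) * fs
    return v


def gesture_bounds(positions, fs, threshold=0.2):
    """Return (onset_s, offset_s, peak_speed) for the movement around the speed peak.

    Onset and offset are the first and last samples of the uninterrupted
    stretch around the peak where speed stays at or above threshold x peak.
    """
    speed = [abs(x) for x in velocity(positions, fs)]
    peak = max(range(len(speed)), key=speed.__getitem__)
    cutoff = threshold * speed[peak]
    onset = peak
    while onset > 0 and speed[onset - 1] >= cutoff:
        onset -= 1
    offset = peak
    while offset < len(speed) - 1 and speed[offset + 1] >= cutoff:
        offset += 1
    return onset / fs, offset / fs, speed[peak]
```

Gesture duration is then `offset_s - onset_s`. With real data, the trace would normally be low-pass filtered before differentiation, because differentiation amplifies measurement noise.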

2.6 Recommendations for practice

Protocol design is the most challenging part of an experimental study. The difficulty increases when the stimuli are speech samples that must be recorded, synthesised, or manipulated, rather than only written material. In this section, we provide guidance on critical steps in experimental research; for the choice of experimental tasks, which is a crucial issue, see the previous section. An experimental procedure combining production and perception tasks offers a comprehensive view, since 'an operable function must, by definition, be contrastively encoded through production, and reliably decoded through perception' (Xu, 2011, p. 90).

2.6.1 Testable hypotheses

Experimentation is based on a phonological or phonetic model to be tested. A model stems from thorough observation and meaningful analysis of the phenomenon under scrutiny (Barbosa, 2012, p. 1). Experimental control 'is about how to guarantee systematic identification and separation of the factors that may significantly contribute to what is under observation' (Xu, 2011, p. 88). Hypotheses are linked to the functions of phonetic and phonological units.

A first example, at the phonemic level, comes from Thai, a language in which vowel length is contrastive (Abramson & Ren, 1990). An experimental procedure is needed to determine what underlies the contrast. It may be controlled by the relative duration of the contrasting vowels, or short and long counterparts may also differ somewhat in vowel quality (timbre). The independent variable is vowel duration, artificially lengthened or shortened without modifying vowel quality. The dependent variable is phoneme identification by naïve listeners.

At the prosodic level, a second example examines the relative effect of silent pauses and syntactic structure on the perception of boundaries in natural speech (Simon & Christodoulides, 2016). In the perceptual part of the experiment, naïve listeners hear samples of normal versus delexicalised speech. They are instructed to press a key whenever they perceive the end of 'a group of words'. The controlled, independent variable is syntactic structure, and the dependent variable is the boundary score assigned to each word-final syllable. Further acoustic variables are pause duration and the relative length of the word-final syllable. Statistical analyses make it possible to assess the relative importance of acoustic and syntactic cues for boundary perception.

In any case, the study design requires a clear delimitation of independent, dependent, and to-be-controlled variables, in a form that permits statistical analysis (Xu, 2011).
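For an identification experiment such as the Thai vowel-length example, the perceptual category boundary is often summarised as a single point on the duration continuum. At that point, responses are split 50/50 between the two categories. The sketch below finds that crossover by linear interpolation between adjacent continuum steps. Fitting a logistic function, or a mixed-effects logistic regression, is the more usual statistical treatment. The proportions in the usage example are toy values, not results from any study.

```python
def category_boundary(steps, prop_long):
    """Return the continuum value where the proportion of 'long' responses reaches 0.5.

    steps: increasing stimulus values (e.g. vowel duration in ms).
    prop_long: proportion of 'long' identifications at each step.
    Interpolates linearly within the first pair of adjacent steps that
    brackets 0.5; returns None if the responses never reach 0.5.
    """
    for i in range(len(steps) - 1):
        x0, y0 = steps[i], prop_long[i]
        x1, y1 = steps[i + 1], prop_long[i + 1]
        if y0 == 0.5:
            return float(x0)
        if (y0 - 0.5) * (y1 - 0.5) < 0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    if prop_long and prop_long[-1] == 0.5:
        return float(steps[-1])
    return None
```

With toy proportions of 0.0, 0.1, 0.3, 0.7, and 1.0 at 100–180 ms in 20 ms steps, the boundary falls at 150 ms.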

2.6.2

Speakers’ selection

Careful selection of the speakers recorded for stimulus preparation is crucial to avoid bias in the experiment. Selection criteria depend on the study’s objectives: native or non-native speakers, with standard or regional accents, naïve or professional, etc. It is difficult to control which aspects of the recordings listeners will associate with the speaker’s gender, dialectal origin or age, and which verbal or phonetic parameters will prove decisive in the identification task (Simon et al., 2012).

2.6.3

Recording equipment

A digital recorder and a unidirectional microphone are usually used, and the choice of equipment can influence the acoustic analysis (Hansen & Pharao, 2006). It is essential to make these recordings in a soundproof studio or in a quiet environment free of background noise. The distance between the speaker and the microphone influences certain acoustic parameters, such as intensity. During the COVID-19 pandemic, face-to-face recordings became impossible, and remote recordings were made either interactively or as self-recordings by participants who had been sent instructions beforehand. Magistro (2021) tested four devices: a professional recorder with a headset microphone, a tablet, a laptop, and an Android phone. The study concluded that the Zencastr software running on a laptop is the best option when professional equipment cannot be used, although the other devices can also be used for the study of intonation.
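The effect of microphone distance on intensity can be estimated, under idealised conditions, with the inverse-square law: for a point source in a free field, the sound pressure level changes by 20 × log10(d_ref / d) dB when the distance changes from d_ref to d. The Python sketch below applies this textbook formula; the function name is hypothetical, and real recording rooms, with reflections and near-field effects close to the mouth, will deviate from the free-field assumption.

```python
import math


def level_change_db(ref_distance_cm, new_distance_cm):
    """Change in sound pressure level (dB) when the mouth-to-microphone distance
    moves from ref_distance_cm to new_distance_cm.

    Assumes an idealised point source in a free field (inverse-square law);
    negative values mean the recorded level drops.
    """
    if ref_distance_cm <= 0 or new_distance_cm <= 0:
        raise ValueError("distances must be positive")
    return 20 * math.log10(ref_distance_cm / new_distance_cm)


# Doubling the distance (10 cm -> 20 cm) lowers the level by roughly 6 dB.
print(round(level_change_db(10, 20), 1))
```

Under these assumptions, moving the microphone from 10 cm to 20 cm lowers the level by about 6 dB, one reason to keep the mouth-to-microphone distance constant across speakers and sessions.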

2.6.4

Recruiting the participants

Experimental phonetics and phonology

In a perception experiment, it is important to check several characteristics of the participants that could bias the task results. Participants should not have a hearing impairment; data from participants who do are usually excluded from the results (Thomas, 2002, p. 134). The level of expertise (lay listeners or experts) and the linguistic background (L1 or L2 speakers) should also be checked (Shen & Watt, 2015). For some simple annotation tasks, recruiting non-native participants may be advantageous (Hasegawa-Johnson et al., 2015). In speech prosody studies, participants are often asked about their musical skills (Cason et al., 2015), since these may correlate with their ability to perceive fundamental frequency, accentuation or rhythm. It is also important to instruct participants to use good-quality listening equipment (headphones) and to complete the experiment in a quiet room without interruption. The number of participants to recruit depends on the statistical analyses to be carried out.
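One common way to link the number of participants to the planned analysis is a prospective power analysis. The Python sketch below assumes the simplest case, a comparison of two independent groups with a two-sided test, and uses the normal-approximation formula n = 2 × ((z(1 − α/2) + z(power)) / d)² per group, where d is the expected effect size (Cohen's d). The effect size must be justified from previous studies or pilot data; designs with several trials per participant, analysed with mixed-effects models, usually call for simulation-based power analysis instead.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two
    independent groups, using the normal approximation
    n = 2 * ((z(1 - alpha/2) + z(power)) / d) ** 2, rounded up."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


for d in (0.2, 0.5, 0.8):  # conventional "small", "medium" and "large" effects
    print(d, participants_per_group(d))
```

With d = 0.5, alpha = 0.05 and power = 0.80, the function returns 63 participants per group; an exact calculation based on the t distribution gives a slightly larger number.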

2.6.5

Presentation of stimuli

The number and duration of the stimuli should be controlled. Stimuli should be presented in random order to control for habituation or learning effects. Samples often include a carrier sentence (i.e., a sentence in which the stimulus of interest is embedded, such as ‘Say baba again. Say papa again. Say tata again.’). For identification tasks (speaker’s origin, genre, speaking style), it has been shown that sample length has little influence on the results (Vieru et al., 2011) and that excerpts of 6–20 seconds are sufficient in most cases for listeners to make their judgement. A session should not last longer than 10 or 15 minutes; otherwise, participants may abandon the task, especially if it is done remotely or without compensation.
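The randomisation and carrier-sentence advice can be sketched in a few lines of Python. The code below is a hypothetical helper, not part of any experiment software: it embeds each target in the carrier frame ‘Say ... again.’, shuffles the trials with a random generator seeded by the participant identifier (so that each participant gets a different but reproducible order), and estimates session length so that it can be checked against the 10 to 15 minute guideline.

```python
import random

CARRIER = "Say {} again."  # carrier frame taken from the example in the text


def build_trial_list(targets, participant_id, repetitions=1):
    """Embed each target in the carrier sentence and return the trials in a
    random order that differs across participants but is reproducible, because
    the random generator is seeded with the participant identifier."""
    trials = [CARRIER.format(target) for target in targets for _ in range(repetitions)]
    rng = random.Random(participant_id)
    rng.shuffle(trials)
    return trials


def session_minutes(n_trials, seconds_per_trial):
    """Rough session length, to compare with a 10-15 minute limit."""
    return n_trials * seconds_per_trial / 60


print(build_trial_list(["baba", "papa", "tata"], participant_id=7, repetitions=2))
print(session_minutes(120, 6))
```

For instance, 120 trials at about 6 seconds each amount to 12 minutes, which stays within the recommended limit.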

2.6.6

Statistical analysis

The statistical tests to be applied to the dataset should be determined at the same time as the experimental protocol. It should be clear which variables will be studied, which will be treated as dependent and which as independent, and what their nature is (qualitative, quantitative, ordinal). If interactions between variables are to be studied, they should also be defined beforehand. The number of participants and the number of trials are essential, since they also determine whether a parametric test can be used. Finally, it should be established whether the same participants will be asked to repeat a specific task after a given time interval (paired samples, e.g., in longitudinal studies) or whether a single participation is required (independent samples, typically in group comparisons).
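The distinction between paired and independent samples matters because the same numbers can yield very different test statistics. The standard-library Python sketch below computes the paired t statistic (the same participants measured twice) and Welch's t statistic (two independent groups). It stops at the t values; in practice, p-values would come from a statistics package, for example SciPy's `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` with `equal_var=False`. The function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic for paired samples: each participant contributes one value to
    each condition, so the test works on within-participant differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (variances may differ)."""
    standard_error = math.sqrt(
        stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)
    )
    return (mean(group_b) - mean(group_a)) / standard_error
```

With before = [1, 2, 3, 4] and after = [2, 3, 4, 6], `paired_t` returns 5.0, whereas `welch_t` on the same values, treated as two independent groups, returns about 1.17: pairing removes between-participant variability from the error term.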

2.7

Future perspectives

In the experimental approach, phonetics and phonology are not to be conceived of as two separate disciplines (Ohala, 1990). Whether in production or perception, the analysis of observed acoustic or articulatory phenomena always requires considering their realisation within a domain (syllable, word, utterance, turn of speech) and their function in each linguistic system (Cangemi & Baumann, 2020). Experimental phonetics and phonology have developed a wide range of scientific methods that allow theories about speech production and perception to be tested. Interdisciplinarity remains crucial, since the theories to be tested derive from the psychology of language, cognitive linguistics, articulatory phonology, sociolinguistics, discourse analysis, etc. Interdisciplinarity is also fruitful at the methodological level, for instance in combining experimental linguistics with corpus linguistics (Gilquin & Gries, 2009) or with computer modelling (Xu, 2011, p. 95). Statistical analysis is an indispensable step in any experimental research. New methods of speech data analysis are emerging and are being discussed:

Advances in computational power have made new analytical approaches possible, and the use of open access software such as R […] increases the speed with which new statistical methods are shared both within our field and across disciplines. As accessibility to these methods increases, more and more people within linguistics employ increasingly complex analytical techniques. (Roettger et al., 2019, p. 1)

These developments require that the scripts used to perform the analyses be made available and open to critical discussion. Finally, the movement towards open science requires researchers to make primary data (sound or audiovisual recordings) and secondary data (acoustic measurements, transcripts, annotations, etc.) accessible, so that research results are cumulative and reproducible. ‘Data sharing is a crucial and necessary part of responsible conduct in research’ (Garellek et al., 2020, p. 3). This time-consuming requirement is difficult to guarantee, as the laws protecting personal data have recently been strengthened and academic institutions sometimes apply them very restrictively. Nevertheless, phonetic experimenters need to be prepared to make their data available to the scientific community and usable by other researchers. This activity of curating and archiving data, long established in corpus phonology (Durand et al., 2014), is becoming an integral part of publication in experimental phonetics and phonology itself (as evidenced by the new editorial policy of the Journal of Phonetics; Cho, 2021).

Further readings

Cohn, A. C., Fougeron, C., & Huffman, M. K. (2011). The Oxford Handbook of Laboratory Phonology. Oxford University Press.
Durand, J., Gut, U., & Kristoffersen, G. (2014). The Oxford Handbook of Corpus Phonology. Oxford University Press.
Katz, W. F., & Assmann, P. F. (2019). The Routledge Handbook of Phonetics. Routledge.
Knight, R. A., & Setter, J. (2022). The Cambridge Handbook of Phonetics. Cambridge University Press.

Related topics

Experimental sociolinguistics; experimental research in cross-linguistic psycholinguistics; analysing speech perception; new directions in statistical analysis for experimental linguistics

References

Abramson, A. S., & Reo, N. (1990). Distinctive vowel length: Duration vs. spectrum in Thai, Journal of Phonetics, 18(2), 79–92. https://doi.org/10.1016/S0095-4470(19)30395-X.
Alday, P. M. (2019). M/EEG analysis of naturalistic stories: A review from speech to language processing, Language, Cognition and Neuroscience, 34(4), 457–473. https://doi.org/10.1080/23273798.2018.1546882.
Anansiripinyo, T., & Onsuwan, C. (2019). Acoustic-Phonetic Characteristics of Thai Filled Pauses in Monologues, in The 9th Workshop on Disfluency in Spontaneous Speech (DiSS 2019). Budapest, 51–54. https://doi.org/10.21862/diss-09-014-anan-onsu.
Avelino, H., Coler, M., & Wetzels, L. (Eds.). (2016). The Phonetics and Phonology of Laryngeal Features in Native American Languages. Brill. Available at: https://brill.com/view/title/32268
Awan, S. N. (2008). Instrumental Analysis of Phonation. In M. J. Ball, M. R. Perkins, N. Müller, & S. Howard (Eds.), The Handbook of Clinical Linguistics (pp. 344–359). Blackwell Publishing Ltd. https://doi.org/10.1002/9781444301007.ch21.
Bänziger, T., & Scherer, K. R. (2005). The role of intonation in emotional expressions, Speech Communication, 46(3–4), 252–267. https://doi.org/10.1016/j.specom.2005.02.016.
Boë, L.-J., & Vilain, C. (Eds.). (2011). Un siècle de phonétique expérimentale : Fondation et éléments de développement, Hommage à Théodore Rosset et John Ohala. ENS Editions. Lyon (Langages). Available at: http://catalogue-editions.ens-lyon.fr/fr/livre/?GCOI=29021100097800 (Accessed: 11 October 2021).
Boersma, P., & Weenink, D. (2022). Praat: Doing Phonetics by Computer (Version 6.2.12). Available at: http://www.praat.org/
Bonilha, H. S., White, L., Kuckhahn, K., Gerlach, T. T., & Deliyski, D. D. (2012). Vocal fold mucus aggregation in persons with voice disorders, Journal of Communication Disorders, 45(4), 304–311. https://doi.org/10.1016/j.jcomdis.2012.03.001.
Boula de Mareüil, Ph., & Vieru-Dimulescu. (2006). The contribution of prosody to the perception of foreign accent, Phonetica, 63, 247–267.
Brentari, D. (2019). Sign Language Phonology (1st ed.). Cambridge University Press. https://doi.org/10.1017/9781316286401.
Brognaux, S., & Drugman, T. (2014). Phonetic Variations: Impact of the Communicative Situation, in Proceedings of the 7th International Conference on Speech Prosody 2014. Speech Prosody 7, 428–432. https://doi.org/10.21437/SpeechProsody.2014-72.
Brunelle, M., & Kirby, J. (2016). Tone and phonation in Southeast Asian Languages, Language and Linguistics Compass, 10(4), 191–207.
Caccia, M., & Lorusso, M. L. (2019). When prosody meets syntax: The processing of the syntax-prosody interface in children with developmental dyslexia and developmental language disorder, Lingua, 224, 16–33. https://doi.org/10.1016/j.lingua.2019.03.008.
Campbell, L., & Grondona, V. (2012). The Indigenous Languages of South America: A Comprehensive Guide. De Gruyter Mouton. https://doi.org/10.1515/9783110258035.
Cangemi, F., & Baumann, S. (2020). Integrating phonetics and phonology in the study of linguistic prominence, Journal of Phonetics, 81, 100993. https://doi.org/10.1016/j.wocn.2020.100993.
Cason, N., Astésano, C., & Schön, D. (2015). Bridging music and speech rhythm: Rhythmic priming and audio–motor training affect speech perception, Acta Psychologica, 155, 43–50. https://doi.org/10.1016/j.actpsy.2014.12.002.
Cebrian, J. (2021). Perception of English and Catalan vowels by English and Catalan listeners: A study of reciprocal cross-linguistic similarity, The Journal of the Acoustical Society of America, 149(4), 2671–2685. https://doi.org/10.1121/10.0004257.
Cho, T. (2021). Where we are at: Impact, special collections, open science and registered report at the Journal of Phonetics, Journal of Phonetics, 89. https://doi.org/10.1016/j.wocn.2021.101113.
Cole, J., Mahrt, T., & Roy, J. (2017). Crowd-sourcing prosodic annotation, Computer Speech & Language, 45, 300–325. https://doi.org/10.1016/j.csl.2017.02.008.
Dehais Underdown, A., Buchman, L., & Demolin, D. (2019). Acoustico-Physiological coordination in the Human Beatbox: A pilot study on the beatboxed Classic Kick Drum, in 19th International Congress of Phonetic Sciences. Melbourne, Australia. Available at: https://hal.archives-ouvertes.fr/hal-02284132
Demolin, D., Giovanni, A., Hassid, S., Heim, C., Lecuit, V., & Soquet, A. (1997). Direct and Indirect Measurements of Subglottic Pressure, in LARYNX 1997. Marseille, 69–72. Available at: https://www.isca-speech.org/archive_open/larynx_97/lar7_069.html
Demolin, D. (2012). Experimental Methods in Phonology, TIPA. Travaux interdisciplinaires sur la parole et le langage [Preprint], (28). Available at: http://tipa.revues.org/162
Demolin, D., & Chabiron, C. (2013). Clicks, Stop Bursts, Vocoids and the Timing of Articulatory Gestures in Rwanda, in Phonetics and Phonology of Sub-Saharan Languages. Johannesburg, South Africa. Available at: https://hal.archives-ouvertes.fr/hal-00834241
Didirková, I., Crible, L., & Simon, A. C. (2019). Impact of prosody on the perception and interpretation of discourse relations: Studies on “Et” and “Alors” in spoken French, Discourse Processes, 56(8), 619–642. https://doi.org/10.1080/0163853X.2018.1528963.
Didirková, I., Le Maguer, S., & Hirsch, F. (2021). An articulatory study of differences and similarities between stuttered disfluencies and non-pathological disfluencies, Clinical Linguistics & Phonetics, 35(3), 201–221. https://doi.org/10.1080/02699206.2020.1752803.
Dogil, G., Ackermann, H., Grodd, W., Haider, H., Kamp, H., Mayer, J., Riecker, A., & Wildgruber, D. (2002). The speaking brain: A tutorial introduction to fMRI experiments in the production of speech, prosody and syntax, Journal of Neurolinguistics, 15(1), 59–90. https://doi.org/10.1016/S0911-6044(00)00021-X.

Dufour, S., Mirault, J., & Grainger, J. (2022). Investigating the locus of transposed-phoneme effects using cross-modal priming, Acta Psychologica, 226, 103578. https://doi.org/10.1016/j.actpsy.2022.103578.
Durand, J., Laks, B., & Lyche, C. (2014). French phonology from a corpus perspective: The PFC programme. In J. Durand, U. Gut & G. Kristoffersen (Eds.), The Oxford Handbook of Corpus Phonology (pp. 86–497). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.015.
Garellek, M., Gordon, M., Kirby, J., Lee, W.-S., Michaud, A., Mooshammer, C., Niebuhr, O., Recasens, D., Roettger, T. B., Simpson, A., & Yu, K. M. (2020). Toward open data policies in phonetics: What we can gain and how we can avoid pitfalls, Journal of Speech Science, 9(1), 3.
Georgiafentis, M., & Sfakianaki, A. (2004). Syntax interacts with prosody: The VOS order in Greek, Lingua, 114(7), 935–961. https://doi.org/10.1016/S0024-3841(03)00099-8.
Gilquin, G., & Gries, S. T. (2009). Corpora and experimental methods: A state-of-the-art review, Corpus Linguistics and Linguistic Theory, 5(1), 1–26.
Goldman, J. P., Pršir, T., Christodoulides, G., Simon, A. C., & Auchlin, A. (2014). Phonogenre identification: A perceptual experiment with 8 delexicalised speaking styles, Nouveaux Cahiers de Linguistique Française, 31, 51–62.
Gonen, E., Livnat, Z., & Amir, N. (2015). The discourse marker Axshav (“now”) in spontaneous spoken Hebrew: Discursive and prosodic features, Journal of Pragmatics, 89, 69–84. https://doi.org/10.1016/j.pragma.2015.09.005.
Gras, P., & Elvira-García, W. (2021). The role of intonation in construction grammar: On prosodic constructions, Journal of Pragmatics, 180, 232–247. https://doi.org/10.1016/j.pragma.2021.05.010.
Grigos, M. I. (2009). Changes in articulator movement variability during phonemic development: A longitudinal study, Journal of Speech, Language, and Hearing Research: JSLHR, 52(1), 164–177. https://doi.org/10.1044/1092-4388(2008/07-0220).
Grosjean, F. (1980). Spoken word recognition processes and the gating paradigm, Perception & Psychophysics, 28(4), 267–283. https://doi.org/10.3758/bf03204386.
Gross, R. D., Steinhauer, K. M., Zajac, D. J., & Weissler, M. C. (2006). Direct measurement of subglottic air pressure while swallowing, The Laryngoscope, 116(5), 753–761. https://doi.org/10.1097/01.mlg.0000205168.39446.12.
Guenther, F. H. (2016). Neural Control of Speech. MIT Press.
Hansen, G. F., & Pharao, N. (2006). Microphones and Measurements. In G. Ambrazaitis & S. Schötz (Eds.), Working Papers. Proceedings from Fonetik 2006 (pp. 49–52). Lund University, Centre for Languages and Literature. Available at: https://www.academia.edu/864240/Microphones_and_measurements
Hasegawa-Johnson, M., Cole, J., Jyothi, P., & Varshney, L. R. (2015). Models of dataset size, question design, and cross-language speech perception for speech crowdsourcing applications, Laboratory Phonology, 6(3–4), 381–431. https://doi.org/10.1515/lp-2015-0012.
Hayward, K. (2000). Experimental Phonetics: An Introduction. Routledge.
Hsiao, Y. E. (2020). The syntax-prosody competition: Evidence from adjunct prosodic parsing in iGeneration Taiwanese, Lingua, 237, 102805. https://doi.org/10.1016/j.lingua.2020.102805.
Huang, Y. T., Newman, R. S., Catalano, A., & Goupell, M. J. (2017). Using prosody to infer discourse prominence in cochlear-implant users and normal-hearing listeners, Cognition, 166, 184–200. https://doi.org/10.1016/j.cognition.2017.05.029.
Jacewicz, E., Fox, R. A., & Wei, L. (2010). Between-speaker and within-speaker variation in speech tempo of American English, The Journal of the Acoustical Society of America, 128(2), 839–850. https://doi.org/10.1121/1.3459842.
Jiang, J. J., Hanna, R. B., Willey, M. V., & Rieves, A. (2016). The measurement of airflow using singing helmet that allows free movement of the jaw, Journal of Voice, 30(6), 641–648. https://doi.org/10.1016/j.jvoice.2015.07.018.
Kim, S.-E., & Tilsen, S. (2022). An investigation of functional relations between speech rate and phonetic variables, Journal of Phonetics, 93, 101152. https://doi.org/10.1016/j.wocn.2022.101152.
King, H., & Ferragne, E. (2020). Loose lips and tongue tips: The central role of the /r/-typical labial gesture in Anglo-English, Journal of Phonetics, 80, 100978. https://doi.org/10.1016/j.wocn.2020.100978.
Labov, W. (1972). Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Ladefoged, P. (2004). Phonetics and phonology in the last 50 years, in From Sound to Sense. MIT.
Lam Tang, J. A., Boliek, C. A., & Rieger, J. M. (2008). Laryngeal and respiratory behavior during pitch change in professional singers, Journal of Voice: Official Journal of the Voice Foundation, 22(6), 622–633. https://doi.org/10.1016/j.jvoice.2007.04.002.

Lawson, E., & Stuart-Smith, J. (2021). Lenition and fortition of /r/ in utterance-final position, an ultrasound tongue imaging study of lingual gesture timing in spontaneous speech, Journal of Phonetics, 86, 101053. https://doi.org/10.1016/j.wocn.2021.101053.
Lehner, K., & Ziegler, W. (2022). Indicators of communication limitation in dysarthria and their relation to auditory-perceptual speech symptoms: Construct validity of the KommPaS web app, Journal of Speech, Language, and Hearing Research, 65(1), 22–42. https://doi.org/10.1044/2021_JSLHR-21-00215.
Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. MIT Press.
Lieberman, P. (1968). Direct comparison of subglottal and esophageal pressure during speech, The Journal of the Acoustical Society of America, 43(5), 1157–1164. https://doi.org/10.1121/1.1910950.
van Lieshout, P., Merrick, G., & Goldstein, L. (2008). An articulatory phonology perspective on rhotic articulation problems: A descriptive case study, Asia Pacific Journal of Speech, Language and Hearing, 11(4), 283–303. https://doi.org/10.1179/136132808805335572.
Ma, B., & Zhuang, H. (2022). Prosody and sentencehood in Chinese: A case study with reference to Chinese-English comparison, Lingua, 270, 103217. https://doi.org/10.1016/j.lingua.2021.103217.
Magistro, G. (2021). Speech prosody and remote experiments: A technical report, arXiv:2106.10915 [cs, eess] [Preprint]. Available at: http://arxiv.org/abs/2106.10915
Mehta, D. D., Kobler, J. B., Zeitels, S. M., Zañartu, M., Ibarra, E. J., Alzamendi, G. A., Manriquez, R., Erath, B. D., Peterson, S. D., Petrillo, R. H., & Hillman, R. E. (2021). Direct measurement and modeling of intraglottal, subglottal, and vocal fold collision pressures during phonation in an individual with a hemilaryngectomy, Applied Sciences, 11(16), 7256. https://doi.org/10.3390/app11167256.
Merlo, S., & Mansur, L. L. (2004). Descriptive discourse: Topic familiarity and disfluencies, Journal of Communication Disorders, 37(6), 489–503. https://doi.org/10.1016/j.jcomdis.2004.03.002.
Michaud, A. (2012). The complex tones of East/Southeast Asian languages: Current challenges for typology and modelling, in Third International Symposium on Tonal Aspects of Languages (TAL 2012), Nanjing, China (2012), 1–7.
Michelas, A., & D’Imperio, M. (2012). When syntax meets prosody: Tonal and duration variability in French Accentual Phrases, Journal of Phonetics, 40(6), 816–829. https://doi.org/10.1016/j.wocn.2012.08.004.
Moisik, S. R., Lin, H., & Esling, J. H. (2014). A study of laryngeal gestures in Mandarin citation tones using simultaneous laryngoscopy and laryngeal ultrasound (SLLUS), Journal of the International Phonetic Association, 44(1), 21–58. https://doi.org/10.1017/S0025100313000327.
Moniz, H., Batista, F., Mata, A. I., & Trancoso, I. (2014). Speaking style effects in the production of disfluencies, Speech Communication, 65, 20–35. https://doi.org/10.1016/j.specom.2014.05.004.
Moradi, S., Lidestam, B., Saremi, A., & Rönnberg, J. (2014). Gated auditory speech perception: Effects of listening conditions and cognitive capacity, Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00531.
Nicolaidis, K., & Baltazani, M. (2014). The Greek Rhotic in /rC/ sequences: An acoustic and electropalatographic study. In N. Lavidas, T. Alexiou & A. M. Sougari (Eds.), Major Trends in Theoretical and Applied Linguistics 1. Selected Papers from the 20th ISTAL (pp. 157–176). De Gruyter Open Poland. https://doi.org/10.2478/9788376560762.p22.
Ohala, J. J. (1987). Experimental phonology, in Proceedings of the Annual Meeting, Berkeley Linguistic Society. Berkeley, California, 207–222.
Ohala, J. J. (1990). There is no interface between phonology and phonetics: A personal view, Journal of Phonetics, 18(2), 153–171. https://doi.org/10.1016/S0095-4470(19)30399-7.
Ohala, J. J. (2010). The Relation between Phonetics and Phonology. In W. Hardcastle, J. Laver & F. E. Gibbon (Eds.), The Handbook of Phonetic Sciences (2nd ed., pp. 653–677). John Wiley & Sons, Ltd.
Ohala, J. J., & Jaeger, J. J. (1986). Experimental Phonology. Academic Press.
Overby, M., & Caspari, S. S. (2015). Volubility, consonant, and syllable characteristics in infants and toddlers later diagnosed with childhood apraxia of speech: A pilot study, Journal of Communication Disorders, 55, 44–62. https://doi.org/10.1016/j.jcomdis.2015.04.001.
Perlman, M., & Benitez, N. J. (2008). Talking fast: The use of speech rate as iconic gesture, in 9th Conference on Conceptual Structure, Discourse, & Language (CSDL9), Social Science Research Network. Available at: https://ssrn.com/abstract=1299284
Plant, R. L., & Hillel, A. D. (1998). Direct measurement of subglottic pressure and laryngeal resistance in normal subjects and in spasmodic dysphonia, Journal of Voice: Official Journal of the Voice Foundation, 12(3), 300–314. https://doi.org/10.1016/s0892-1997(98)80020-9.

Pouplier, M., & Beňuš, Š. (2009). Phonetic properties of syllabic consonants in Slovak, The Journal of the Acoustical Society of America, 125(4), 2569. https://doi.org/10.1121/1.4783740.
Prieto, P., Puglesi, C., Borràs-Comes, J., Arroyo, E., & Blat, J. (2015). Exploring the contribution of prosody and gesture to the perception of focus using an animated agent, Journal of Phonetics, 49, 41–54. https://doi.org/10.1016/j.wocn.2014.10.005.
Przewozny, A., Viollain, C., & Navarro, S. (Eds.). (2020). The Corpus Phonology of English: Multifocal Analyses of Variation. Edinburgh: Edinburgh University Press. Available at: https://www.jstor.org/stable/10.3366/j.ctv177th25.
Rathcke, T., Lin, C.-Y., Falk, S., & Bella, S. D. (2021). Tapping into linguistic rhythm, Laboratory Phonology: Journal of the Association for Laboratory Phonology, 12(1), 11. https://doi.org/10.5334/labphon.248.
Redcay, E., Haist, F., & Courchesne, E. (2008). Functional neuroimaging of speech perception during a pivotal period in language acquisition, Developmental Science, 11(2), 237–252. https://doi.org/10.1111/j.1467-7687.2008.00674.x.
Roettger, T. B., Winter, B., & Baayen, H. (2019). Emergent data analysis in phonetic sciences: Towards pluralism and reproducibility, Journal of Phonetics, 73, 1–7. https://doi.org/10.1016/j.wocn.2018.12.001.
Rothenberg, M. (1992). A multichannel electroglottograph, Journal of Voice, 6(1), 36–43. https://doi.org/10.1016/S0892-1997(05)80007-4.
Rousselot, P.-J. (1924). Principes de phonétique expérimentale. Paris: Didier.
de Ruiter, J. P., Bangerter, A., & Dings, P. (2012). The interplay between gesture and speech in the production of referring expressions: Investigating the tradeoff hypothesis, Topics in Cognitive Science, 4(2), 232–248. https://doi.org/10.1111/j.1756-8765.2012.01183.x.
Salomoni, S., van den Hoorn, W., & Hodges, P. (2016). Breathing and singing: Objective characterization of breathing patterns in classical singers, PLoS ONE, 11(5), e0155084. https://doi.org/10.1371/journal.pone.0155084.
Scripture, E. W. (1902). The Elements of Experimental Phonetics. Cambridge University Press.
Sedhu, A. (2015). A comparative study of the rhythm of Fulfulde and Hausa, Journal of the Linguistic Association of Nigeria, 20, 200–215.
Shattuck-Hufnagel, S. (2019). Toward an (even) more comprehensive model of speech production planning, Language, Cognition and Neuroscience, 34(9), 1202–1213. https://doi.org/10.1080/23273798.2019.1650944.
Shen, C., & Watt, D. (2015). Accent categorisation by lay listeners: Which type of “Native Ear” works better? York Papers in Linguistics (YPL2), 14. Available at: https://www.academia.edu/15470009/Accent_Categorisation_by_Lay_Listeners_Which_Type_of_Native_Ear_Works_Better
Signorello, R., Hassid, S., & Demolin, D. (2017). Aerodynamic features of French fricatives, in Proceedings of Interspeech 2017. Interspeech 2017, Stockholm, Sweden, 2267–2271. Available at: https://www.academia.edu/74129440/Aerodynamic_Features_of_French_Fricatives
Simon, A.-C., & Auchlin, A. (2001). Multi-modal, multi-focal : les “hors-phase” de la prosodie. In C. Cavé, I. Guaïtella, & S. Santi (Eds.), Oralité et gestualité. Interaction et comportements multimodaux dans la communication (pp. 629–633). L’Harmattan.
Simon, A. C., & Christodoulides, G. (2016). Frontières prosodiques perçues : corrélats acoustiques et indices syntaxiques, Langue française, 191(3), 83–106.
Simon, A. C., Hambye, P., Bardiaux, A., & Boula de Mareüil, P. (2012). Caractéristiques des accents régionaux en français : que nous apprennent les approches perceptives ? In A. C. Simon (Ed.), La variation prosodique régionale en français (pp. 27–40). De Boeck Supérieur. Available at: https://www.cairn.info/lavariation-prosodique-regionale-en-francais--9782801116951-page-27.htm (Accessed: 27 March 2020).
Smitheran, J. R., & Hixon, T. J. (1981). A clinical method for estimating laryngeal airway resistance during vowel production, The Journal of Speech and Hearing Disorders, 46(2), 138–146. https://doi.org/10.1044/jshd.4602.138.
Steinhauer, K., Abada, S. H., Pauker, E., Itzhak, I., & Baum, S. R. (2010). Prosody–syntax interactions in aging: Event-related potentials reveal dissociations between on-line and off-line measures, Neuroscience Letters, 472(2), 133–138. https://doi.org/10.1016/j.neulet.2010.01.072.
Stouten, F., Duchateau, J., Martens, J. P., & Wambacq, P. (2006). Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation, Speech Communication, 48(11), 1590–1606. https://doi.org/10.1016/j.specom.2006.04.004.
Thomas, E. R. (2002). Sociophonetic applications of speech perception experiments, American Speech, 77(2), 115–147.

Tomaschek, F., Arnold, D., Sering, K., Tucker, B. V., van Rij, J., & Ramscar, M. (2021). Articulatory variability is reduced by repetition and predictability, Language and Speech, 64(3), 654–680. https://doi.org/10.1177/0023830920948552.
Tyler, J. (2013). Prosodic correlates of discourse boundaries and hierarchy in discourse production, Lingua, 133, 101–126. https://doi.org/10.1016/j.lingua.2013.04.005.
Vainio, M., Suni, A. S., Raitio, T., Nurminen, J., Järvikivi, J., & Alku, P. (2009). New Method for Delexicalization and its Application to Prosodic Tagging for Text-to-Speech Synthesis. https://doi.org/10.21437/Interspeech.2009-514.
Van Praet, W., & O’Grady, G. (2018). The prosody of specification: Discourse intonational cues to setting up a variable, Journal of Pragmatics, 135, 87–100. https://doi.org/10.1016/j.pragma.2018.07.013.
Vieru, B., de Mareüil, P. B., & Adda-Decker, M. (2011). Characterisation and identification of non-native French accents, Speech Communication, 53(3), 292–310. https://doi.org/10.1016/j.specom.2010.10.002.
Wagner, P., Trouvain, J., & Zimmerer, F. (2015). In defense of stylistic diversity in speech research, Journal of Phonetics, 48, 1–12. https://doi.org/10.1016/j.wocn.2014.11.001.
Wang, J., Zhu, Y., Chen, Y., Mamat, A., Yu, M., Zhang, J., & Dang, J. (2020). An eye-tracking study on audiovisual speech perception strategies adopted by normal-hearing and deaf adults under different language familiarities, Journal of Speech, Language, and Hearing Research, 63(7), 2245–2254. https://doi.org/10.1044/2020_JSLHR-19-00223.
Warner, H. J., Whalen, D. H., Harel, D., & Jackson, E. S. (2022). The effect of gap duration on the perception of fluent versus disfluent speech, Journal of Fluency Disorders, 71, 105896. https://doi.org/10.1016/j.jfludis.2022.105896.
Wells, B., & Whiteside, S. (2008). Prosodic impairments. In M. J. Ball, M. R. Perkins, N. Müller, & S. Howard (Eds.), The Handbook of Clinical Linguistics (pp. 549–567). Blackwell Publishing Ltd. https://doi.org/10.1002/9781444301007.ch34.
Wu, Y., Didirkova, I., & Simon, A.-C. (2022). Disfluences en parole continue en français : paramètres prosodiques des pauses pleines et des allongements vocaliques, in Actes des 34e Journées d’Études sur la Parole. 34e Journées d’Études sur la Parole - JEP2022. Available at: https://dial.uclouvain.be/pr/boreal/object/boreal:260873
Xu, Y. (2011). Speech prosody: A methodological review, Journal of Speech Sciences, 1(1), 85–115.
Zerbian, S., & Barnard, E. (2008). Phonetics of intonation in South African Bantu languages, Southern African Linguistics and Applied Language Studies, 26(2), 235–254.


3

EXPERIMENTAL MORPHOLOGY

Vera Heyer

3.1 Introduction and definitions

Each language consists of tens of thousands of words. However, not all words are created equal, in that many words are morphologically related (i.e., they share structure). For instance, the inflected past tense form laughed contains its verbal stem laugh, the noun happiness was derived from the adjective happy by adding the suffix -ness, and the compound teacup is a combination of the nouns tea and cup. Knowledge of morphological processes such as inflection, derivation and compounding allows us to adapt novel words to sentence contexts (e.g., using the plural selfies when taking more than one picture of oneself) and to understand morphologically complex words we have never encountered before (e.g., the compound corona babies for babies conceived and/or born during the COVID-19 pandemic). This ability to derive meaning from the individual meaningful parts (or morphemes) of morphologically complex words raises the question of whether the organisation of the mental lexicon (i.e., the mental storage of all words known to an individual) makes strategic use of morphological structure. More specifically, storing stems (e.g., laugh, happy, tea, cup) and affixes (e.g., -ed, -ness) separately and combining them into morphologically complex forms when needed (e.g., the past tense forms laughed and cried with the suffix -ed, the derived nouns happiness and sadness with the suffix -ness, and different kinds of containers for tea, such as teacup and teapot) would reduce the required storage space significantly compared to storing all possible combinations. However, in natural speech, words are produced and comprehended within milliseconds, and combinatory processes could potentially slow this down. This trade-off between economy of storage and speed of processing is at the core of three opposing families of models of morphological processing.
First, according to full-listing models (e.g., Butterworth, 1983; Bybee, 1988, 1995; Plaut & Gonnerman, 2000), morphologically complex words are stored in the mental lexicon as whole forms and are, therefore, processed in the same way as morphologically simplex words. Second, the decomposition approach assumes that all morphologically complex forms are decomposed into their constituent parts, with the meaning derived from these parts (e.g., Taft, 2004; Taft & Forster, 1975; Taft & Nguyen-Hoan, 2010). Third, dual-route models (e.g., Pinker, 1999; Schreuder & Baayen, 1995; for a review, see Clahsen, 2006) evolved as a hybrid of the two single-route approaches above to account for differences based on factors such as regularity and transparency. They posit both a direct access route and a decomposition route, with one or the other proving more successful depending on the form encountered.

Experimental morphologists apply different psycholinguistic techniques to investigate the nature of morphological processing. In this chapter, the most commonly used techniques are introduced first, followed by a detailed account of empirical findings from early and current research and an overview of critical issues.

DOI: 10.4324/9781003392972-5

3.2 Main research methods

Research on morphological processing has employed different techniques with words in isolation or context, using reaction times, eye movements, and brain activation patterns to draw conclusions about the representation and processing of morphologically complex words in the mental lexicon. In the following, the most common methods are introduced.

3.2.1 Lexical decision

In a lexical decision task, participants are asked to decide whether strings of letters shown on a screen are existing words in the target language. Decision latencies (i.e., the time it takes participants to accept a word or reject a nonword) are taken as an indication of processing effort. For instance, nonwords with clearly illicit letter sequences (e.g., ptf- in ptfole) are rejected faster than those similar to existing words (e.g., clain, which is similar to claim) or those that contain existing words (e.g., claim in claimel or claimity). Similarly, faster decision latencies for high-frequency than for low-frequency words show that retrieval from the mental lexicon is facilitated by frequent exposure.
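As a minimal illustration of how decision latencies are commonly summarised, the sketch below averages reaction times per condition over correct responses only (a common, though not universal, convention). The record format, the condition labels and all numbers are invented for illustration and are not data from any study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial record: (letter string, condition, reaction time in ms, correct response?)
Trial = tuple[str, str, float, bool]


def mean_latency_by_condition(trials: list[Trial]) -> dict[str, float]:
    """Average decision latencies per condition, using correct responses only."""
    latencies: dict[str, list[float]] = defaultdict(list)
    for _string, condition, rt, correct in trials:
        if correct:  # error trials are excluded from latency analyses
            latencies[condition].append(rt)
    return {condition: mean(rts) for condition, rts in latencies.items()}


# Invented trials for illustration only
trials = [
    ("laugh", "high-frequency", 520.0, True),
    ("happy", "high-frequency", 540.0, True),
    ("scoff", "low-frequency", 610.0, True),
    ("quell", "low-frequency", 650.0, False),  # incorrect response, excluded
    ("jeer", "low-frequency", 630.0, True),
]
means = mean_latency_by_condition(trials)
frequency_effect = means["low-frequency"] - means["high-frequency"]
print(means, frequency_effect)
```

With these invented values, the high-frequency mean is 530 ms, the low-frequency mean is 620 ms, and the resulting frequency effect is 90 ms.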

3.2.2 Priming

In priming tasks, the most widely applied method in experimental morphology, researchers investigate how the processing of one word influences the processing of another. More specifically, participants’ reaction times to a target word (e.g., scan) are measured following different types of prime words shown beforehand. Typical relationships between primes and targets are morphological (e.g., scanner, scanned), orthographic (e.g., scandal) or semantic (e.g., print). Decreased reaction times to targets following related primes (compared to an unrelated baseline prime) are interpreted as evidence that the activation of the prime word in the mental lexicon makes the target word more easily accessible in one way or another. In the case of morphologically related prime-target pairs, such priming effects are interpreted as a sign of decomposition, especially if the facilitation observed following the morphological prime is the same as for an identity prime (e.g., scan priming scan), which constitutes the largest possible priming effect as the same lexical entry is targeted. Priming studies may differ with respect to the task performed on targets (and primes), the modality in which primes and targets are presented and the prime duration (see Table 3.1 for an overview). Most priming studies use lexical decision tasks on the targets (and, in the case of the long-lag variant, on the primes); some use naming. Many studies present primes and targets visually (especially with adult participants); some present their items via headphones; in cross-modal priming, primes are presented auditorily before written targets or vice versa. The prime duration¹ can range from being participant-determined in long-lag priming (i.e., how long it takes participants to respond to the prime) to as short as 30 to 60 milliseconds in masked priming, where participants are not consciously aware of seeing the primes. Prime duration is manipulated to tap into different processing stages: overtly presented primes are fully processed, whereas masked primes are processed only partially, thus tapping very early processing stages. A comprehensive overview of the masked technique is provided in Forster et al. (2003).

Table 3.1 Overview of priming techniques

| Priming variant | Prime/target presentation | Prime duration | Participants’ task | Prime processing |
|---|---|---|---|---|
| Long-lag priming | Prime and target are presented in the same modality (both visually or both auditorily) | As long as the participant takes to perform the assigned task; usually 4–7 intervening items between prime and target | Make a lexical decision on (or name) both prime and target | Task performed on primes leads to complete processing |
| Overt priming | Prime and target are presented in the same modality (both visually or both auditorily) | Usually 100 milliseconds or more; target immediately follows prime | Make a lexical decision on (or name) target only | Enough time to consciously perceive primes and fully process them |
| Masked priming | Prime and target are presented visually, with a (forward) mask (e.g., ###########) presented before the prime | Usually 30–60 milliseconds; target immediately follows prime | Make a lexical decision on (or name) target only | Participants are not consciously aware of seeing primes and cannot fully process them; taps very early processing stages |
| Cross-modal priming | Prime and target are presented in different modalities (usually auditory primes and visual targets) | Length of audio file; target usually follows prime immediately | Make a lexical decision on (or name) target only | Enough time to consciously perceive primes and fully process them; abstracts away from purely visual or phonological overlap |
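The logic of comparing each related prime condition against an unrelated baseline, and of comparing morphological priming against identity priming, can be sketched as follows. This is a minimal illustration under stated assumptions: the reaction times are invented, and the fixed 10-millisecond tolerance used to label an effect as "full priming" is a hypothetical placeholder, since published studies test such differences statistically.

```python
from statistics import mean


def priming_effect(related_rts: list[float], unrelated_rts: list[float]) -> float:
    """Facilitation in ms: positive values mean faster responses after related primes."""
    return mean(unrelated_rts) - mean(related_rts)


def looks_like_full_priming(effect: float, identity_effect: float,
                            tolerance_ms: float = 10.0) -> bool:
    """Hypothetical descriptive check: facilitation of roughly identity-priming size?"""
    return effect > 0 and abs(identity_effect - effect) <= tolerance_ms


# Invented target RTs (ms) for the target SCAN after four prime types
rts = {
    "identity": [560.0, 570.0, 550.0],       # scan - SCAN
    "morphological": [565.0, 575.0, 555.0],  # scanner - SCAN
    "orthographic": [600.0, 610.0, 605.0],   # scandal - SCAN
    "unrelated": [610.0, 620.0, 615.0],      # baseline prime
}
identity = priming_effect(rts["identity"], rts["unrelated"])
effects = {
    prime: priming_effect(rts[prime], rts["unrelated"])
    for prime in ("morphological", "orthographic")
}
full = {prime: looks_like_full_priming(e, identity) for prime, e in effects.items()}
print(identity, effects, full)
```

Using the same target (SCAN) for every prime type keeps the baseline constant, so differences between conditions can be attributed to the primes rather than to the targets.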

3.2.3 Eye-tracking

Although eye-tracking is more widely applied in research on morphologically simplex words, it is used in morphological processing research as well. There are two variants of eye-tracking: (1) In eye-tracking during reading, participants’ fixations on individual words are measured while they read sentences or texts, with longer reading times interpreted as indicators of processing difficulty. (2) In the visual world paradigm, participants listen to linguistic input (e.g., the word t-shirt) via headphones while viewing images on a screen. Here, their looks to the target picture (e.g., a t-shirt) or to distractor pictures (e.g., a cup of tea) indicate when the target word has been recognised. Introductions to eye-tracking during reading and to the visual world paradigm can be found in Rayner (1998) and Tanenhaus et al. (2000), respectively.
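For the visual world paradigm, a common first summary is the proportion of looks to the target picture in successive time bins after word onset. The sketch below assumes a hypothetical sample format (trial, time from word onset in milliseconds, area of interest) and a 100-millisecond bin width; the samples are invented, and real studies usually start from eye-tracker exports and model these curves statistically.

```python
from collections import defaultdict

# Hypothetical gaze sample: (trial id, time since target word onset in ms, area of interest)
Sample = tuple[int, int, str]


def target_proportions(samples: list[Sample], bin_ms: int = 100) -> dict[int, float]:
    """Proportion of gaze samples on the target picture in each time bin.

    Samples on distractors or elsewhere on the screen count towards the bin total."""
    on_target: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for _trial, time_ms, aoi in samples:
        bin_start = (time_ms // bin_ms) * bin_ms
        total[bin_start] += 1
        if aoi == "target":
            on_target[bin_start] += 1
    return {b: on_target[b] / total[b] for b in sorted(total)}


samples = [
    (1, 0, "distractor"), (1, 50, "distractor"), (1, 120, "target"), (1, 180, "target"),
    (2, 10, "target"), (2, 60, "elsewhere"), (2, 110, "target"), (2, 150, "target"),
]
print(target_proportions(samples))
```

A rising proportion of target looks across bins indicates when the target word was recognised; with the invented samples above, it rises from 0.25 in the first bin to 1.0 in the second.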

3.2.4 Neurolinguistic techniques

In neuroimaging and electrophysiological techniques, brain activity is measured to investigate which areas of the brain are activated during morphological processing and when this happens. For example, functional magnetic resonance imaging (fMRI) measures changes in blood oxygenation, which indicate which brain areas are active during specific cognitive processes (such as the processing of different types of morphologically complex words), and event-related potentials (ERPs) reflect fluctuations in the brain’s electrical activity following a stimulus. These methods are used to investigate whether different types of morphologically complex words (e.g., regular versus irregular past tense forms) are processed in different brain regions or memory systems (i.e., procedural versus declarative memory). For a detailed introduction to the techniques and a comprehensive overview of previous research using them to investigate inflection, derivation and compounding, see Leminen et al. (2019).
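Because ERPs are small relative to ongoing EEG activity, they are typically obtained by averaging many stimulus-locked segments (epochs) of the recording. The sketch below shows only this core step for a single channel, with baseline correction against the pre-stimulus interval; the data layout and the numbers are assumptions for illustration, and real pipelines (e.g., in MNE-Python) additionally filter the signal, reject artefacts and handle many channels.

```python
from statistics import mean


def average_erp(epochs: list[list[float]], n_baseline: int) -> list[float]:
    """Baseline-correct and average stimulus-locked epochs from one EEG channel.

    Each epoch is a list of voltage samples (e.g., in microvolts); the first
    n_baseline samples precede stimulus onset and define the baseline."""
    if len({len(epoch) for epoch in epochs}) != 1:
        raise ValueError("need at least one epoch, and all epochs must be equally long")
    corrected = []
    for epoch in epochs:
        baseline = mean(epoch[:n_baseline])  # mean pre-stimulus voltage
        corrected.append([v - baseline for v in epoch])
    # Average sample by sample across epochs
    return [mean(values) for values in zip(*corrected)]


# Two invented epochs: two pre-stimulus samples followed by two post-stimulus samples
erp = average_erp([[1.0, 1.0, 3.0, 5.0], [-1.0, -1.0, 1.0, 1.0]], n_baseline=2)
print(erp)
```

Baseline correction removes slow offsets that differ between trials, so that the average reflects activity that is consistently time-locked to the stimulus.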

3.2.5 Computational modelling

In computational modelling, researchers write algorithms that simulate theories of morphological processing proposed on the basis of behavioural experiments. They can manipulate the influence of different cues in the simulation to fit the theory under investigation and then compare the simulation results to data from experiments with human participants. A wide range of simulation models exists for different aspects of morphological processing (e.g., Baayen et al., 2011).
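To make this concrete, the sketch below implements the Rescorla–Wagner learning rule that underlies naive discriminative learning (NDL; Baayen et al., 2011), one of the modelling frameworks discussed later in this chapter. Letter bigrams serve as cues and meaning units (lexomes) as outcomes. The tiny training set, the lexome labels (e.g., SCAN, AGENT), the learning rate and the number of passes are hypothetical choices for illustration, not parameters from any published simulation; dedicated packages (e.g., pyndl for Python) exist for real corpus-based models.

```python
from collections import defaultdict


def letter_bigrams(word: str) -> list[str]:
    """Letter-bigram cues with '#' marking word edges, e.g. 'scan' -> #s sc ca an n#."""
    padded = f"#{word}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def rescorla_wagner(events, rate=0.01, lam=1.0, passes=1):
    """Learn cue-to-outcome weights with the Rescorla-Wagner rule.

    events: list of (cues, outcomes) learning events. On each event, only the
    weights of the cues present change: towards lam for outcomes present and
    towards 0 for outcomes absent from that event."""
    weights = defaultdict(float)  # (cue, outcome) -> association weight
    outcomes_seen = {o for _, outcomes in events for o in outcomes}
    for _ in range(passes):
        for cues, outcomes in events:
            cues = set(cues)
            for outcome in outcomes_seen:
                activation = sum(weights[(c, outcome)] for c in cues)
                target = lam if outcome in outcomes else 0.0
                delta = rate * (target - activation)  # prediction error
                for c in cues:
                    weights[(c, outcome)] += delta
    return weights


def activation(weights, word: str, outcome: str) -> float:
    """Summed support from a word's bigram cues for one outcome (lexome)."""
    return sum(weights.get((c, outcome), 0.0) for c in set(letter_bigrams(word)))


# Hypothetical learning events: form cues paired with lexome outcomes
events = [
    (letter_bigrams("scan"), {"SCAN"}),
    (letter_bigrams("scanner"), {"SCAN", "AGENT"}),
    (letter_bigrams("scandal"), {"SCANDAL"}),
]
w = rescorla_wagner(events, passes=200)
print(activation(w, "scanner", "SCAN"), activation(w, "scandal", "SCAN"))
```

After training, the bigrams of scanner strongly support SCAN while those of scandal do not, even though both words begin with the same bigrams: in this kind of model, such differences emerge from cue-outcome co-occurrences rather than from stored stems or affixes.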

3.3 Historical perspectives

The basis for morphological processing research is the desire to put linguistic theory to the test. More specifically, psycholinguists interested in morphology ask whether morphological structure as formalised by morphologists is psychologically real (i.e., whether this structure affects how we produce and comprehend morphologically complex words). Historically, the question of whole-form storage versus decomposition based on morphological structure has been investigated through two broad phenomena: (1) the contrast between regular (rule-governed) and irregular (idiosyncratic) past tense inflection and (2) words that appear to be morphologically complex but do not actually contain any affixes (e.g., the final sequence ‹er› in the noun corner might resemble the affix -er in words like runner or baker, but a corner is not ‘a person who corns’). Each of these phenomena is introduced in this section.

3.3.1 Past tense debate

The English past tense, with its productive and regular suffix -ed as a prime candidate for (de)composition and its idiosyncratic irregular forms (e.g., sang, brought, kept) that cannot easily be decomposed, has been widely used as a testing ground for the question of storage versus (de)composition. For instance, in their seminal long-lag priming study, Stanners et al. (1979) found that regular (but not irregular) past tense forms facilitated stem recognition to the same extent as identity primes (full priming). Similarly, Prasada et al.’s (1990) participants produced high-frequency irregular past tense forms faster than low-frequency ones (in analogy to frequency effects observed for simplex words), but their productions of regular forms were not affected by word form frequency. These findings were interpreted as support for dual-route models, with regular past tense forms being decomposed into stem and affix, thus pre-activating the stem target, and irregulars being stored as whole forms. By contrast, connectionist frameworks, which assume storage for both regular and irregular forms (single-route models; e.g., the parallel distributed processing [PDP] models of Rumelhart & McClelland, 1986, and McClelland & Patterson, 2002), attribute priming differences to different weights on the connections between lexical entries (usually referred to as nodes or units). For instance, while regular and irregular past tense primes share the same degree of semantic relatedness with their bases as targets, regulars normally fully contain their bases (e.g., laugh in laughed) but irregulars do not (e.g., sing – sang, bring – brought and keep – kept share only three letters each).

3.3.2 Affix stripping

Arguing for a decomposition approach, Taft and Forster (1975) introduced the so-called prefix-stripping procedure, proposing that affixes and stems are separated during word recognition. In a lexical decision task, their participants rejected nonwords that were the (bound) stems of prefixed words (e.g., juvenate as in rejuvenate) more slowly than those that were part of pseudo-prefixed words (e.g., pertoire as in repertoire), indicating that bound stems are accessed during word recognition and, thus, need to be represented in the mental lexicon. Similarly, priming research has shown activation of ‘stems’ in pseudo-affixed words, suggesting that letter sequences that usually constitute an affix are automatically removed in early processing stages. For instance, Rastle et al. (2004) observed what later became known as the corner/corn effect: In masked priming with short prime durations, participants recognised targets (e.g., broth) more quickly following pseudo-derived primes such as brother than following purely orthographically related ones such as brothel.² Crucially, this effect vanished at longer prime durations, indicating that this affix-stripping process is located at early stages of word recognition (e.g., Marslen-Wilson et al., 2008; McCormick et al., 2008; Rastle et al., 2000).

The corner/corn effect is problematic for full-listing approaches, which assume that morphological structure does not play a role and, instead, attribute priming effects for morphologically related pairs to a combination of orthographic (i.e., shared letters of the base) and semantic relatedness. If letter sequences such as ‹er› did not have affix status, pseudo-derived primes such as brother should not behave differently from purely orthographically related primes such as brothel, because the number of shared letters and the lack of semantic relatedness with the target broth are comparable.
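The form-based logic behind such stripping accounts can be caricatured in a few lines: a word is parsed as stem plus suffix whenever it ends in a known suffix and the remainder is a stored word, regardless of meaning. This is a deliberately simplified sketch, not Taft and Forster’s (1975) actual procedure (which concerned prefixes); the toy lexicon and suffix list are hypothetical, and spelling changes (e.g., happy – happiness) are ignored.

```python
from typing import Optional

# Toy lexicon and suffix inventory (hypothetical, for illustration only)
LEXICON = {"broth", "corn", "teach", "dark"}
SUFFIXES = ("ness", "er")


def strip_suffix(word: str) -> Optional[tuple[str, str]]:
    """Blind, form-based stripping: return (stem, suffix) if the word ends in a
    known suffix and the remaining letters form a stored word; otherwise None."""
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in LEXICON:
            return stem, suffix
    return None


for word in ("teacher", "darkness", "corner", "brother", "brothel"):
    print(word, strip_suffix(word))
```

Because the parse ignores meaning, brother is decomposed just like teacher, whereas brothel, which ends in the non-affix ‹el›, is not, mirroring the corner/corn pattern found at short prime durations.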

3.4 Critical issues and topics

Morphological processing research has investigated several aspects and contributing factors inspired by morphological theory as well as by cross-linguistic comparisons. Two factors mentioned above, transparency and productivity, have been investigated further. In addition to the affixation found in inflection and derivation, the third major morphological process, compounding, can shed further light on how morphological structure is represented and processed. To make universal claims about morphological processing, languages other than English and their characteristics, as well as other populations such as non-native speakers, need to be taken into consideration. Each of these aspects is introduced in more detail below.

3.4.1 Transparency

One of the most debated issues in morphological processing research is at which point morpho-semantic information is accessed. Derived words and their bases are semantically related in that they share meaning (e.g., happiness is the ‘state of being happy’). However, the degree of transparency varies, from transparent items such as happiness through less transparent ones (e.g., business ≠ ‘state of being busy’) to pseudo-derived ones (e.g., harness, where ‹ness› does not represent an affix). Intuitively, decomposing a morphologically complex word into stem and affix is only expedient if the meaning of the whole form can be derived from the meanings of its component parts (as in happiness but not in business, let alone harness). Indeed, priming studies have shown facilitation effects for transparent but not for opaque derived forms (e.g., Marslen-Wilson et al., 2008; McCormick et al., 2008; Rastle et al., 2000), at least when prime durations were long. Although researchers agree on a distinction between transparent and opaque items in overt priming studies tapping later processing stages, results for masked priming tapping very early processing stages are hotly debated. Rastle and Davis (2008; Davis & Rastle, 2010) argue on the basis of meta-analyses that, at short prime durations, affixes are stripped off irrespective of their meaningfulness. By contrast, Feldman et al. (2009, p. 688) report stronger priming for transparent than for opaque items at a prime duration of 50 milliseconds, which they classify as ‘nearly prototypical of the published literature’ based on plotting their data against other published studies, and thus argue that morpho-semantic information is used from the beginning. Including effects from more recent studies, Heyer and Kornishova (2018) argue that a more fine-grained distinction with respect to prime duration in masked priming studies needs to be made.
While the above meta-analyses covered a large range of prime durations (from 33 to 83 milliseconds), they did not directly investigate potential influences of prime processing time. Listing the priming differences between transparent and opaque sets by prime duration, Heyer and Kornishova (2018) suggest that 50 milliseconds are required for morpho-semantic information to take effect: only one out of 15 experiments with a stimulus onset asynchrony (SOA) shorter than 50 milliseconds, but more than half of the experiments with longer SOAs, showed significant facilitation.

3.4.2 Productivity

Affixes that are productive (i.e., can be applied to many bases) may be stripped off more easily because they are identified as affixes faster. Unproductive affixes, by contrast, may not even be recognised as such: for instance, many laypeople are not aware that -th (as in warmth) is a nominalising suffix with the same function as -ness. While many studies have compared productive regular inflection with unproductive irregular inflection (see the past tense debate above), most studies on derivation either use a mixture of many different affixes or concentrate on one (productive) affix. The small number of studies that have compared derivational suffixes differing in productivity have yielded mixed results. In cross-modal priming, Marslen-Wilson et al. (1996) found that primes with productive suffixes or prefixes facilitated the recognition of another word with the same affix (e.g., darkness priming toughness) more than primes with unproductive affixes did (e.g., development priming government). In several masked priming studies, in contrast, nominalisations with more and less productive suffixes have been reported to prime the corresponding adjectival targets equally (e.g., English -ness versus -ity in Silva & Clahsen, 2008; Japanese -sa versus -mi in Clahsen & Ikemoto, 2012). It is unclear whether the observed differences stem from the different priming techniques tapping into late versus early processing stages or from the different types of prime-target pairs (i.e., complex primes with complex versus simplex targets), so further investigation is required. Making a broader distinction between regular (rule-based) inflection as productive and derivation in general (with its lexicalised and idiosyncratic forms) as less productive, Bozic and Marslen-Wilson (2010) report that different brain regions are involved in their processing: while activation is mainly localised in left-lateralised frontotemporal regions for rule-based forms such as the English past tense, less predictable derived forms lead to bilateral activation.

3.4.3 Affixation versus compounding

In contrast to the restricted (and comparatively small) number of affixes available for inflection and derivation, the possible combinations of words into compounds are virtually unlimited, which could make the individual constituents more difficult to detect. Nevertheless, studies have shown that participants detect heads and modifiers within compounds, even in opaque compounds such as butterfly. For instance, Libben et al. (2003) found faster lexical decision times for transparent and opaque compounds alike following their respective constituents (e.g., bed or room priming bedroom; hog or wash priming hogwash), showing that the constituents of opaque compounds are activated even though this parse is misleading. Semantic priming between associates of constituents and compounds, however, has been found to be restricted to transparent compounds (e.g., coffee primes teaspoon but milk does not prime butterfly; Sandra, 1990; Zwitserlood, 1994), suggesting that the effect of an initial decomposition into component parts is short-lived. Another difference between affixation and compounding concerns the information encoded in the head (i.e., the right-most constituent in English compounds). In both processes, the head provides information about the complex word’s lexical category (e.g., blueberry and happiness are nouns [see berry and -ness, respectively], not adjectives [see blue and happy, respectively]) and about other grammatical characteristics such as gender (e.g., the German compound Blumentopf ‘flowerpot’ is masculine [Topf ‘pot’] rather than feminine [Blume ‘flower’]; German derived words ending in -ung are feminine). In compounds, however, the head also provides information about semantic characteristics (e.g., a blueberry is a type of berry). Thus, heads may have a special status in compounding. Indeed, Libben et al. (2003) observed longer processing times for compounds with opaque heads, in which the meaning of the head is not related to the meaning of the compound (i.e., hogwash is not a type of washing). Furthermore, the relationship between head and modifier in compounds is less straightforward than that between bases and affixes. For instance, snowballs and snowforts are made of snow, while a snowshovel is used to remove snow from the path. This abstract relational structure has been shown to influence processing: Gagné and Spalding (2009) found slower reaction times to compound words when the prime involved a different relation between modifier and head (e.g., snowball was recognised more slowly following snowshovel, with its different FOR relation, than following snowfort, with its identical MADE OF relation).

3.4.4 Cross-linguistic differences

The majority of morphological processing research (and thus the resulting theories of the mental lexicon) has focussed on English. However, to make universal claims, it is important to expand research to other languages, considering differences in morphological complexity (e.g., the amount of affixation or compounding present in a language) and morphological systems (e.g., affixation versus templatic morphology). With respect to morphological complexity, it has been argued that languages with rich affixation are more likely to exhibit effects of decomposition. For instance, Smolka and colleagues (2014; 2019) argue that German shows more morphological priming than English due to its larger morphological inventory. More specifically, while previous studies on English did not find overt priming effects for opaque items (see Section 3.4.1 above), Smolka et al. (2014) reported comparable priming for transparent primes like zubinden (‘to tie’) and opaque primes such as entbinden (‘to deliver’ [childbirth]), thus arguing for different lexical representations in German than in English, with morphological structure being represented. To investigate the English-German contrast further, Günther et al. (2019) compared Rastle et al.’s (2000) English and Smolka et al.’s (2014) German materials in a vector analysis. They concluded that the German morphological system is characterised by higher morphological systematicity³ than the English one (i.e., in German, ‘affixes – on average – modify stem meanings in a relatively consistent and predictable manner’; Günther et al., 2019, p. 174), which leads German readers to pay special attention to stems irrespective of transparency, resulting in priming for both transparent and opaque items. In Semitic languages such as Arabic or Hebrew, with their non-concatenative templatic morphology, affix stripping is impossible. Morphologically complex words in these languages consist of consonantal roots (e.g., Arabic ktb relating to ‘write’) and patterns (primarily consisting of vowels) that are inserted between the root consonants (e.g., kattab ‘caused to write’ versus kaatab ‘corresponded’), rather than of affixes concatenated to an edge of the root, resulting in a more complex structure for decomposition.
Nevertheless, priming studies with Arabic and Hebrew (e.g., Boudelaa & Marslen-Wilson, 2005; Frost et al., 2000) have shown priming effects for morphologically related prime-target pairs. Interestingly, Boudelaa and Marslen-Wilson (2011) found priming effects to be modulated by root productivity, with priming occurring only if the root appears in many different complex words (i.e., has a large family size) whereas the productivity of word patterns did not have an influence.
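The contrast with concatenative affixation can be illustrated by interleaving a consonantal root with a pattern. In the sketch below, digits in a pattern mark the slots for the first, second and third root consonant, a notation chosen here purely for illustration; the forms follow the chapter’s transliterations of kattab and kaatab, and real Semitic morphology involves considerably more than this.

```python
from typing import Optional


def interdigitate(root: str, pattern: str) -> str:
    """Fill numbered slots in a pattern (e.g. '1a22a3') with root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


def extract_root(word: str, pattern: str) -> Optional[str]:
    """Recover the root by aligning a word with a known pattern; None if they do not fit."""
    if len(word) != len(pattern):
        return None
    slots: dict[int, str] = {}
    for letter, slot in zip(word, pattern):
        if slot.isdigit():
            index = int(slot)
            if slots.setdefault(index, letter) != letter:
                return None  # a repeated slot (gemination) must hold the same consonant
        elif letter != slot:
            return None  # the pattern's vowels must match exactly
    return "".join(slots[i] for i in sorted(slots))


print(interdigitate("ktb", "1a22a3"))  # kattab, 'caused to write'
print(interdigitate("ktb", "1aa2a3"))  # kaatab, 'corresponded'
print(extract_root("kattab", "1a22a3"))
print("ktb" in "kattab")  # the root is not a contiguous substring
```

The root ktb never appears as a contiguous letter sequence in kattab, so it cannot be recovered by removing material from a word edge; recovering it requires aligning the word with its pattern.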

3.4.5 Native versus non-native morphological processing

In the last few decades, research has branched out to another population: non-native speakers. According to Ullman’s (2001a; 2001b; 2005) declarative/procedural (DP) model, morphological processing differs in this group of speakers. Based on reduced or absent activation of brain areas associated with procedural memory, where automatic rule application is located in native speakers, Ullman claims that non-native speakers either learn productive grammatical rules (such as English past tense suffixation with -ed) explicitly in declarative memory or even memorise complex forms such as laughed, storing them as part of the lexicon in declarative memory. Results from masked priming studies investigating whether non-native speakers show morphological priming are inconclusive. While some researchers report differences between native and non-native groups, with absent or reduced priming in non-native speakers (e.g., Silva & Clahsen, 2008; Veríssimo et al., 2018), others find comparable priming effects in both groups (e.g., Diependaele et al., 2011; Feldman et al., 2010). Potential factors contributing to between-study differences include the type of morphological process (e.g., Jacob et al., 2018, comparing inflection and derivation; Dal Maso & Giraudo, 2014, comparing productive and unproductive suffixes), participants’ age of acquisition (e.g., Veríssimo et al., 2018) and proficiency (e.g., Foote, 2017). Calling into question the morphological nature of the priming effects observed for derived words in non-native processing, several studies have also reported significant priming effects for purely orthographically related items such as scandal – scan (e.g., Heyer & Clahsen, 2015; Li et al., 2017a). Based on comparable priming effects for morphological and orthographic items, Heyer and Clahsen (2015) propose a higher reliance on orthographic than on morphological information in non-native processing. However, this reliance appears to decrease with increasing proficiency (Viviani & Crepaldi, 2022).

3.5 Current contributions and research

Recent proposals on morphological processing have, to varying degrees, turned away from affixes as the driving factor. Instead of affixes being automatically stripped off (as suggested in Taft and Forster’s original prefix-stripping proposal), Grainger and Beyersmann (2017) propose an edge-aligned embedded word activation process that operates from the beginning and the end of an orthographic word, using the surrounding spaces as an indication of where a word begins and ends. However, (pseudo-)affixes still play a vital role in that affixes are represented at the orthographic and morpho-semantic levels in the model (see Grainger & Beyersmann’s Figure 1), boosting activation of the (pseudo-)stem, which in turn reduces the effect of inhibition between unrelated forms at the orthographic level. For instance, while the left-aligned broth is detected in both brother and brothel, and inhibition between broth and the two longer words would normally prevent priming, the presence of the pseudo-affix -er in brother counteracts the inhibition effect, resulting in priming following brother but not following brothel. Grainger and Beyersmann (2017, p. 305) argue that edge-aligned stem detection is a ‘simple bootstrapping mechanism’ for beginning readers (that also applies to compound processing; e.g., Beyersmann et al., 2018) and that representations for affixes are only established at a later stage of reading development. Relinquishing both stems and affixes altogether, the naive discriminative learning (NDL) framework of Baayen and colleagues (Baayen et al., 2011; 2015; Milin et al., 2017) maps letter (or phone) bigrams or trigrams as cues onto semantic representations (so-called lexomes) as outcomes. These computational models are trained on corpus data, which results in different weights between cues and lexomes based on co-occurrence statistics. Instead of morphological relatedness per se, factors such as neighbourhood density (i.e., the number of words that differ from the word in question by only one letter or phone) are used to explain effects observed in priming studies.

The visual world eye-tracking paradigm enables researchers to investigate how participants use morphological information to predict upcoming elements in a sentence. For instance, in Spanish and German, articles (and, in German, also attributive adjectives) preceding nouns encode gender information, which can be used to predict the upcoming noun in a noun phrase when a choice of referents is presented. Indeed, Lew-Williams and Fernald’s (2010) Spanish participants looked to a target picture (e.g., a cow) faster on hearing the gender-marked article when the two pictures shown differed in gender (e.g., la (FEM) vaca ‘cow’ and el (MASC) pájaro ‘bird’) than when they matched (e.g., la (FEM) vaca ‘cow’ and la (FEM) rana ‘frog’). Lemmerth and Hopp (2019) found the same predictive looking behaviour in German following gender-marked articles or attributive adjectives. This type of research illustrates how morphological information is used online to support language comprehension in everyday life.

With electrophysiological and neuroimaging techniques being more widely applied in morphological processing research in recent years, steps towards determining the timing and localisation of the processes involved in comprehension and production have been taken, but these questions are far from resolved. In their review of studies applying electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI), Leminen et al. (2019) systematically compare results from different neurolinguistic techniques separately for inflection, derivation and compounding. While findings for inflection were relatively consistent with dual-route approaches, studies on derivation and compounding yielded more diverse outcomes, illustrating that it is important to distinguish both between the linguistic processes involved and between the methods applied.

3.6

Recommendations for practice

As priming effects typically amount to differences of only a few milliseconds, researchers should carefully match their materials (e.g., with respect to length and frequency). Especially when comparing different types of related primes (e.g., morphological versus orthographic), a design in which the same target is primed by each of the respective primes (e.g., scanner and scandal priming scan) is preferable to using different item sets (e.g., scanner – scan versus brothel – broth), as it avoids differences between sets that are due to the different targets (e.g., scan versus broth). Furthermore, one should not rely on a single experiment (using one technique and one language) to draw conclusions about the nature of morphological processing. Another practical consideration is the presentation of targets in masked priming. Usually, targets are presented in capital letters to reduce purely visual overlap between primes and targets (e.g., scanner – SCAN). When working with less experienced readers (e.g., children or second language learners with different native scripts), reading might be slowed down by the unusual typeface. Additionally, when dealing with languages other than English, using capital letters for targets might mask useful spelling cues such as initial capital letters for nouns (as in German, for example). A technical aspect of visual studies with controlled prime durations is the screen’s refresh rate. Especially in masked priming, it is crucial that prime durations be exactly the same across items and participants. Therefore, prime durations should be a multiple of the screen’s refresh interval (i.e., the duration of a single frame). For instance, when aiming for a prime duration of approximately 30 milliseconds, researchers should set their experiment to 32 milliseconds (two frames) when their screen refreshes every 16 milliseconds.
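The frame arithmetic in the refresh-rate example can be made explicit. The following is a minimal sketch, assuming that the display can only change on frame boundaries and that the experiment software accepts durations as a whole number of frames; the function names are hypothetical. Actual presentation timing also depends on the software and hardware, so it should still be verified empirically.

```python
def frame_duration_ms(refresh_rate_hz: float) -> float:
    """Duration of a single screen frame in milliseconds."""
    return 1000.0 / refresh_rate_hz


def frames_for_duration(target_ms: float, frame_ms: float) -> tuple[int, float]:
    """Round a desired prime duration to a whole number of frames (at least one).

    Returns the number of frames and the duration actually achievable.
    """
    n_frames = max(1, round(target_ms / frame_ms))
    return n_frames, n_frames * frame_ms


# The example from the text: ~30 ms on a screen with 16 ms frames -> 2 frames, 32 ms.
print(frames_for_duration(30, 16))
# On a 60 Hz screen, frames last about 16.7 ms, so 2 frames give about 33.3 ms.
print(frames_for_duration(30, frame_duration_ms(60)))
```

Requesting the duration in frames rather than in milliseconds ensures that every item and every participant receives the same prime duration, which is the point made above.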

3.7

Future directions

Much previous research compared distinct groups of items on the factor of interest (e.g., high- versus low-frequency, transparent versus opaque), but, thanks to more sophisticated statistical analyses, researchers can now make more fine-grained comparisons by treating such factors as continuous scales. In the past, researchers had to decide where to draw the boundary between groups for continuous factors such as frequency or transparency, and potentially had to remove a middle section of items; with regression modelling and generalised additive mixed models (GAMMs), this is no longer necessary, and researchers can look for continuous effects in their data (e.g., Heyer & Kornishova, 2018). Similarly, individual differences with respect to general cognitive abilities (e.g., working memory), age, performance in a related task or language proficiency can now be included in statistical analyses. This is an area that is increasingly being investigated (e.g., age of acquisition in Veríssimo et al., 2018; reading exposure and personality measures in Lõo et al., 2019). While most morphological processing research has concentrated on English words in isolation, future research should expand further to other languages and look at wider contexts. Because the priming paradigm presents words in isolation, the processing of these words may not reflect natural language processing, in which words are usually encountered in sentence contexts. Paterson et al. (2011) therefore presented prime-target pairs embedded in sentences; this approach may prove difficult, though, when a study includes several different prime types that all need to fit into the same sentence context. Future morphological processing research should therefore explore different paradigms that allow for more natural contexts.
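The difference between cutting a continuous factor into groups and modelling it continuously can be illustrated with a toy example. The sketch below is not the analysis used in any of the studies cited; it assumes a simple linear relation between a predictor (e.g., log frequency) and reaction times and uses only the standard library. The old-style extreme-groups comparison discards the middle third of the items, whereas the regression slope uses all of them. Real analyses would use mixed-effects regression or GAMMs with participants and items as random effects (e.g., with the R packages lme4 or mgcv).

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression, all items used)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def extreme_groups_difference(x, y):
    """Old-style analysis: sort items by x, drop the middle third and compare
    the mean of y in the top third with the mean in the bottom third.

    Returns (mean difference, number of items discarded).
    """
    ordered = sorted(zip(x, y))
    cut = len(ordered) // 3
    if cut == 0:
        raise ValueError("need at least three items")
    low = [b for _, b in ordered[:cut]]
    high = [b for _, b in ordered[len(ordered) - cut:]]
    return mean(high) - mean(low), len(ordered) - 2 * cut


# Toy data: reaction times fall by 10 ms per unit of the predictor.
predictor = list(range(9))
rts = [600 - 10 * v for v in predictor]
print(ols_slope(predictor, rts))                  # slope estimated from all 9 items
print(extreme_groups_difference(predictor, rts))  # group difference, 3 items discarded
```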

Vera Heyer

Notes
1 In (masked) priming research, prime duration is often referred to as stimulus onset asynchrony (SOA), i.e., the time from prime onset to target onset. Prime duration and SOA differ in cases where an additional blank screen or mask is shown between the masked prime and the target to allow for longer processing time while still preventing participants from consciously perceiving the primes (e.g., Li et al., 2017b).
2 Note that, contrary to the presentation here, Rastle et al. did not use the same target with different primes but used different item sets for pseudo-derived (or, in their terminology, opaque) and orthographic (form) items. Brothel was one of their form items, and the contrast used here also features in the title of their paper.
3 Note, though, that the items also differed with respect to the type of affixation, with prefixation in Smolka et al. (2014) and suffixation in Rastle et al. (2000).

Further reading
Berthiaume, R. (2018). Morphological Processing and Literacy Development. Taylor & Francis.
Kahraman, H., & Beyersmann, E. (in press). Cross-language influences on morphological processing in bilinguals. In I. Elgort, A. Siyanova-Chanturia & M. Brysbaert (Eds.), Cross-Language Influences in Bilingual Processing and Second Language Acquisition (pp. 232–266). John Benjamins.
Milin, P., Smolka, E., & Feldman, L. B. (2018). Models of lexical access and morphological processing. In E. M. Fernández & H. Smith Cairns (Eds.), The Handbook of Psycholinguistics (pp. 240–268). Wiley & Sons.

Related topics
Analysing reading with eye-tracking; analysing spoken language comprehension with eye-tracking; analysing language comprehension using ERP; analysing language using brain imaging; new directions in statistical analysis for experimental linguistics

References
Baayen, R. H., Milin, P., Đurđević, D. F., Hendrix, P., & Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118(3), 438–481.
Baayen, R. H., Shaoul, C., Willits, J., & Ramscar, M. (2015). Comprehension without segmentation: A proof of concept with naive discrimination learning. Language, Cognition and Neuroscience, 31(1), 106–128.
Beyersmann, E., Kezilas, Y., Coltheart, M., Castles, A., Ziegler, J. C., Taft, M., & Grainger, J. (2018). Taking the book from the bookshelf: Masked constituent priming effects from compound words and nonwords. Journal of Cognition, 1(1), 1–13.
Boudelaa, S., & Marslen-Wilson, W. D. (2005). Discontinuous morphology in time: Incremental masked priming in Arabic. Language and Cognitive Processes, 20, 207–260.
Boudelaa, S., & Marslen-Wilson, W. D. (2011). Productivity and priming: Morphemic decomposition in Arabic. Language and Cognitive Processes, 26, 624–652.
Bozic, M., & Marslen-Wilson, W. D. (2010). Neurocognitive contexts for morphological complexity: Dissociating inflection and derivation. Language and Linguistics Compass, 4(11), 1063–1073.
Butterworth, B. (1983). Lexical representation. Language Production, 2, 257–294.
Bybee, J. L. (1988). Morphology as lexical organization. In M. T. Hammond & M. P. Noonan (Eds.), Theoretical Morphology: Approaches in Modern Linguistics (pp. 119–141). Academic Press.
Bybee, J. L. (1995). Diachronic and typological properties of morphology and their implications for representation. In L. B. Feldman (Ed.), Morphological Aspects of Language Processing (pp. 225–246). Lawrence Erlbaum Associates.
Clahsen, H. (2006). Dual-mechanism morphology. In K. Brown (Ed.), Encyclopedia of Language and Linguistics (pp. 1–5). Elsevier.
Clahsen, H., & Ikemoto, Y. (2012). The mental representation of derived words: An experimental study of -sa and -mi nominals in Japanese. The Mental Lexicon, 7, 147–182.
Dal Maso, S., & Giraudo, H. (2014). Morphological processing in L2 Italian: Evidence from a masked priming study. Lingvisticæ Investigationes, 37(2), 322–337.
Davis, M. H., & Rastle, K. (2010). Form and meaning in early morphological processing: Comment on Feldman, O’Connor, and Moscoso del Prado Martín (2009). Psychonomic Bulletin & Review, 17, 749–755.
Diependaele, K., Duñabeitia, J. A., Morris, J., & Keuleers, E. (2011). Fast morphological effects in first and second language word recognition. Journal of Memory and Language, 64, 344–358.
Feldman, L. B., Kostić, A., Basnight-Brown, D. M., Filipović Đurđević, Đ., & Pastizzo, M. J. (2010). Morphological facilitation for regular and irregular verb formation in native and non-native speakers: Little evidence for two distinct mechanisms. Bilingualism: Language and Cognition, 13, 119–135.
Feldman, L. B., O’Connor, P. A., & Moscoso del Prado Martín, F. (2009). Early morphological processing is morphosemantic and not simply morpho-orthographic: A violation of form-then-meaning accounts of word recognition. Psychonomic Bulletin & Review, 16, 684–691.
Foote, R. (2017). The storage and processing of morphologically complex words in L2 Spanish. Studies in Second Language Acquisition, 39(4), 735–767.
Forster, K. I., Mohan, K., & Hector, J. (2003). The mechanics of masked priming. In S. Kinoshita & S. J. Lupker (Eds.), Masked Priming: The State of the Art (pp. 3–37). Psychology Press.
Frost, R., Deutsch, A., Gilboa, O., Tannenbaum, M., & Marslen-Wilson, W. D. (2000). Morphological priming: Dissociation of phonological, semantic, and morphological factors. Memory & Cognition, 28, 1277–1288.
Gagné, C. L., & Spalding, T. L. (2009). Constituent integration during the processing of compound words: Does it involve the use of relational structures? Journal of Memory and Language, 60, 20–35.
Grainger, J., & Beyersmann, E. (2017). Edge-aligned embedded word activation initiates morpho-orthographic segmentation. In B. H. Ross (Ed.), Psychology of Learning and Motivation (pp. 285–317). Academic Press.
Günther, F., Smolka, E., & Marelli, M. (2019). ‘Understanding’ differs between English and German: Capturing systematic language differences of complex words. Cortex, 116, 168–175.
Heyer, V., & Clahsen, H. (2015). Late bilinguals see a scan in scanner AND in scandal: Dissecting formal overlap from morphological priming in the processing of derived nouns. Bilingualism: Language and Cognition, 18(3), 543–550.
Heyer, V., & Kornishova, D. (2018). Semantic transparency affects morphological priming…eventually. Quarterly Journal of Experimental Psychology, 71, 1112–1124.
Jacob, G., Heyer, V., & Veríssimo, J. (2018). Aiming at the same target: A masked priming study directly comparing derivation and inflection in the second language. International Journal of Bilingualism, 22(6), 619–637.
Leminen, A., Smolka, E., Duñabeitia, J. A., & Pliatsikas, C. (2019). Morphological processing in the brain: The good (inflection), the bad (derivation) and the ugly (compounding). Cortex, 116, 4–44.
Lemmerth, N., & Hopp, H. (2019). Gender processing in simultaneous and successive bilingual children: Cross-linguistic lexical and syntactic influences. Language Acquisition, 26(1), 21–45.
Lew-Williams, C., & Fernald, A. (2010). Real-time processing of gender-marked articles by native and non-native Spanish speakers. Journal of Memory and Language, 63(4), 447–464.
Li, J., Taft, M., & Xu, J. (2017a). The processing of English derived words by Chinese-English bilinguals. Language Learning, 67(4), 858–884.
Li, M., Jiang, N., & Gor, K. (2017b). L1 and L2 processing of compound words: Evidence from masked priming experiments in English. Bilingualism: Language and Cognition, 20(2), 384–402.
Libben, G., Gibson, M., Yoon, Y. B., & Sandra, D. (2003). Compound fracture: The role of semantic transparency and morphological headedness. Brain and Language, 84, 26–43.
Lõo, K., Toth, A., Karaca, F., & Järvikivi, J. (2019). Effects of affective ratings and individual differences in English morphological processing. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (pp. 2179–2185). Cognitive Science Society.
Marslen-Wilson, W. D., Bozic, M., & Randall, B. (2008). Early decomposition in visual word recognition: Dissociating morphology, form, and meaning. Language and Cognitive Processes, 23, 394–421.
Marslen-Wilson, W. D., Ford, M., Older, L., & Zhou, X. (1996). The combinatorial lexicon: Priming derivational affixes. In G. W. Cottrell (Ed.), Proceedings of the 18th Annual Conference of the Cognitive Science Society (pp. 223–227). Lawrence Erlbaum Associates Inc.
McClelland, J. L., & Patterson, K. (2002). Rules or connections in past-tense inflections: What does the evidence rule out? Trends in Cognitive Sciences, 6, 465–472.
McCormick, S. F., Rastle, K., & Davis, M. H. (2008). Is there a “fete” in “fetish”? Effects of orthographic opacity on morpho-orthographic segmentation in visual word recognition. Journal of Memory and Language, 58, 307–326.
Milin, P., Feldman, L. B., Ramscar, M., Hendrix, P., & Baayen, R. H. (2017). Discrimination in lexical decision. PLoS ONE, 12(2), e0171935.
Paterson, K., Alcock, A., & Liversedge, S. P. (2011). Morphological priming during reading: Evidence from eye movements. Language and Cognitive Processes, 26, 600–623.
Pinker, S. (1999). Words and Rules: The Ingredients of Language. Phoenix.
Plaut, D. C., & Gonnerman, L. M. (2000). Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? Language and Cognitive Processes, 15, 445–485.
Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-generated. 31st Annual Meeting of the Psychonomic Society. New Orleans, LA.
Rastle, K., & Davis, M. H. (2008). Morphological decomposition based on the analysis of orthography. Language and Cognitive Processes, 23, 942–971.
Rastle, K., Davis, M. H., Marslen-Wilson, W. D., & Tyler, L. K. (2000). Morphological and semantic effects in visual word recognition: A time-course study. Language and Cognitive Processes, 15, 507–537.
Rastle, K., Davis, M. H., & New, B. (2004). The broth in my brother’s brothel: Morpho-orthographic segmentation in visual word recognition. Psychonomic Bulletin & Review, 11, 1090–1098.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. A. Feldman, P. J. Hayes & D. E. Rumelhart (Eds.), Computational Models of Cognition and Perception (pp. 216–271). MIT Press.
Sandra, D. (1990). On the representation and processing of compound words: Automatic access to constituent morphemes does not occur. Quarterly Journal of Experimental Psychology, 42(3), 529–567.
Schreuder, R., & Baayen, H. (1995). Modelling morphological processing. In L. B. Feldman (Ed.), Morphological Aspects of Language Processing (pp. 131–154). Erlbaum.
Silva, R., & Clahsen, H. (2008). Morphologically complex words in L1 and L2 processing: Evidence from masked priming experiments in English. Bilingualism: Language and Cognition, 11, 245–260.
Smolka, E., Libben, G., & Dressler, W. U. (2019). When morphological structure overrides meaning: Evidence from German prefix and particle verbs. Language, Cognition and Neuroscience, 34(5), 599–614.
Smolka, E., Preller, K. H., & Eulitz, C. (2014). ‘Verstehen’ (‘understand’) primes ‘stehen’ (‘stand’): Morphological structure overrides semantic compositionality in the lexical representation of German complex verbs. Journal of Memory and Language, 72, 16–36.
Stanners, R. F., Neiser, J. J., Hernon, W. P., & Hall, R. (1979). Memory representation for morphologically related words. Journal of Verbal Learning and Verbal Behavior, 18, 399–412.
Taft, M. (2004). Morphological decomposition and the reverse frequency effect. Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 57, 745–765.
Taft, M., & Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14, 638–647.
Taft, M., & Nguyen-Hoan, M. (2010). A sticky stick? The locus of morphological representation in the lexicon. Language and Cognitive Processes, 25, 277–296.
Tanenhaus, M. K., Magnuson, J. S., Dahan, D., & Chambers, C. (2000). Eye movements and lexical access in spoken-language comprehension: Evaluating a linking hypothesis between fixations and linguistic processing. Journal of Psycholinguistic Research, 29(6), 557–580.
Ullman, M. T. (2001a). The neural basis of lexicon and grammar in first and second language: The declarative/procedural model. Bilingualism: Language and Cognition, 4, 105–122.
Ullman, M. T. (2001b). A neurocognitive perspective on language: The Declarative/Procedural Model. Nature Reviews Neuroscience, 2, 717–726.
Ullman, M. T. (2005). A cognitive neuroscience perspective on second language acquisition: The declarative/procedural model. In C. Sanz (Ed.), Mind and Context in Adult Second Language Acquisition: Methods, Theory, and Practice (pp. 141–178). Georgetown University Press.
Veríssimo, J., Heyer, V., Jacob, G., & Clahsen, H. (2018). Selective effects of age of acquisition on morphological priming: Evidence for a sensitive period. Language Acquisition, 25, 315–326.
Viviani, E., & Crepaldi, D. (2022). Masked morphological priming and sensitivity to the statistical structure of form-to-meaning mapping in L2. Journal of Cognition, 5(1), 30.
Zwitserlood, P. (1994). The role of semantic transparency in the processing and representation of Dutch compounds. Language and Cognitive Processes, 9(3), 341–368.

4 EXPERIMENTAL SYNTAX

Yamila Sevilla and María Elina Sánchez

4.1

Introduction

Experimental approaches have allowed researchers to investigate a large range of questions about linguistic representations and about how those representations are put to use during language processing. The goal of this chapter is to show how experimental data can be, and have been, employed to enrich our understanding of how syntax works. Traditionally, such evidence has been used to explain grammatical generalizations, to test theoretical hypotheses or to assess competing approaches originating from formal syntax, but the very dynamics of psycho- and neurolinguistic research have also raised new and theoretically relevant questions. The data used to study syntax experimentally come from a variety of sources. Among them, offline tasks, and especially acceptability judgments, constitute the main source for building syntactic theory. Another set of evidence comes from various online techniques, which tap into ongoing processes during sentence comprehension and production. Studies using online measures have made it possible to explore both the constraints of grammar (those traditionally revealed by offline techniques) and those imposed by general processing mechanisms, such as working memory and control mechanisms. Online data are the kind of evidence most widely generated by psycho- and neurolinguistics, and hence the kind we will focus on in this chapter. One of the central assumptions of psycholinguistics is that the processing system works optimally, guided by universal constraints that operate locally. Violations of that general principle of economy give rise to processing effects that can be measured empirically.
As the historical overview in Section 4.2 shows, in order to understand syntactic representations and mechanisms, psycho- and neurolinguistic research has fruitfully exploited situations in which the processing system fails or is less efficient, or in which there is a mismatch between what our grammatical intuitions tell us (expressed through time-unlimited judgments) and what online use reveals (i.e., reaction times, electrophysiological and hemodynamic responses, and eye movements in various natural or experimental linguistic tasks). Examples of these challenging situations are: (1) effects of reanalysis or repair arising from temporary ambiguity, as in garden-path sentences (e.g., The horse raced past the barn fell); (2) memory-overload-dependent effects, as in center-embedded clauses (e.g., The reporter who the senator who John met attacked disliked the editor); and (3) attraction errors and other illusions of grammaticality, as in subject-verb agreement errors (e.g., *The key of the cabinets ARE on the table). In Section 4.3, we examine examples of how experimental data from such situations have allowed us to formulate (and sometimes answer) some of the central questions about syntax during the online production and comprehension of sentences. Attachment preferences and locality effects (violations of island constraints, agreement attraction and intervention in long-distance dependencies) serve to show how experimental data have been accounted for. In Section 4.4, we review (although, given their multiplicity and intricacy, not exhaustively) different types of data that have been fundamental in the development of studies on grammar: acceptability judgments, eye-tracking and pupillometry, electrophysiological and hemodynamic techniques, and neuropsychological data.

DOI: 10.4324/9781003392972-6

4.2

Historical perspectives

It is usually understood that the study of syntax by experimental means began in the mid-20th century, driven by the impulse that generative grammar gave to the investigation of language as a universal cognitive and biological human faculty. From the outset, the cognitive turn in language studies changed the way the relationship between language and its underlying psychological processes was investigated. Following Chomsky’s (1965) competence/performance distinction, providing an explicit description of the set of rules and abstract principles that underlie language use became the main objective of linguistic studies, which adopted the grammatical judgments of native speakers as their dominant source of information (Myers, 2017). Taking performance as its target, psycholinguistics then focused on understanding how representations are constructed, retrieved, and put to use during the comprehension and production of language in real time, also taking into account the restrictions imposed by the cognitive system (attention, executive control, memory). Thus, a new form of psycholinguistics emerged: one that sought to interpret grammar as a model for the psychology of language (Miller & Chomsky, 1963; Levelt, 2008). Transformational generative theory was then ‘thought to provide the basis for a description of how speakers and listeners actually produce and understand sentences—it had potential psychological reality’ (Berko Gleason & Bernstein Ratner, 1993). The classic view of a direct relationship (transparency) between theories of grammar and the processing system is known as the Derivational Theory of Complexity (DTC), which held that the more operations a sentence’s derivation involves, the more difficult the sentence is to process (Fodor et al., 1974).
In other words, the total cost of building a structure in real time, as in online sentence parsing, is equal to the total number of computational steps in the derivation (at the time, the number of transformations). Under this assumption, for example, processing a passive sentence would take more time than processing an active sentence, because an extra operation, the passive transformation, would be involved in the derivation. The DTC hypothesis was investigated experimentally. While early work apparently confirmed it (McMahon, 1965; Savin & Perchonock, 1965), later research strongly disconfirmed it (Slobin, 1966; Watt, 1970). Furthermore, according to this view, syntactic processing is automatic (mandatory) and encapsulated (impervious to semantic or pragmatic information). These properties, together with the universality of the structural strategies, became hallmarks of the first models of language comprehension, which sought to explain processing phenomena such as grammatically legal but computationally unaffordable center-embedding (The rat [that the cat [that the dog chased] killed] ate the malt), in which computation cannot be completed due to memory overload, or garden-path effects (The horse raced past the barn fell), in which the initial parse is challenged by later input, so that an earlier interpretation must be repaired, increasing difficulty. Together, these studies inaugurated long-standing discussions between autonomous serial (syntactocentric) models (e.g., Carreiras & Meseguer, 1999; Frazier & Fodor, 1978; Frazier & Rayner, 1982) and interactive parallel models of sentence processing that take into account different kinds of information (e.g., MacDonald et al., 1994; Trueswell & Tanenhaus, 1994; Tyler & Marslen-Wilson, 1977). Eventually, the DTC was discredited insofar as research failed to find evidence of a direct correspondence between theories of grammar and processing mechanisms. This failure is often cited as one of the reasons for the split between formal linguistics and psycholinguistics, which have largely worked separately over the last decades (Sanz et al., 2013). Once the DTC was abandoned, the experimental research agenda (psycholinguistics and neurolinguistics) expanded in a more diverse and productive way, following its own interests and concerns, but it lacked a unified theoretical program that could put all the pieces together and make sense of the puzzle. Since then, a wealth of data across languages, populations and methodologies has outlined a landscape that is hard to subsume under a single framework or even to summarize consistently. Nevertheless, much of the fundamental intuition of the DTC is alive and well in a good deal of current experimental work, and a number of attempts are being made (on quite different conceptual grounds) to explicitly align theories of grammar with language processing theories (Bornkessel & Schlesewsky, 2006; Bresnan, 2001; Culicover & Jackendoff, 2005; Hagoort, 2005; Lewis & Phillips, 2015; Lewis & Vasishth, 2005; Sprouse & Lau, 2013).

4.3

Critical issues and current contributions

4.3.1

Universality and economy

A common assumption within psycho- and neurolinguistics is that grammar operates according to universal principles. Experimental data have made it possible to examine whether, and to what extent, the processing system is guided by such principles across languages. It is also commonly assumed that humans process language in an incrementally optimal way, both in parsing during online comprehension and in planning during production. The main idea is that, with a few exceptions, the system is governed by principles of economy that allow it to reduce processing costs, for example, by choosing the simplest parse, the most available structure, or the shortest dependency. The use of economy strategies has received considerable empirical support in the domain of parsing preferences and dependency formation. The literature on sentence comprehension has extensively explored the question of universality by studying relative clause (RC) attachment ambiguities. Following the hypothesis of universally applicable principles, Frazier (1978) postulated the strategies of “minimal attachment” (build the simplest structure, with the fewest nodes) and “late closure”, according to which the system prefers to attach a new constituent to the phrase currently being processed, i.e., the most recently processed phrase. Thus, consider the ambiguous sentence (1):

(1) Someone shot the servant of the actress [who was on the balcony].

Here the RC would preferentially be attached to the local Determiner Phrase (DP) [the actress] rather than to the higher DP [the servant …]. However, this idea of universal attachment strategies was soon questioned by cross-linguistic studies. Cuetos and Mitchell (1988) and Carreiras and Clifton (1993) found that speakers of Spanish were more prone to non-local attachment (early closure). In the former study, the authors used sentences such as Alguien disparó contra el criado de la actriz que estaba en el balcón (‘Someone shot the servant of the actress who was on the balcony’) and measured both the interpretation (Who was on the balcony?) and the reading times. Instead of attaching the RC (who was on the balcony) to DP2 (la actriz) of the complex DP (DP1 of DP2), as predicted by the late closure strategy, participants preferred to take DP1 (el criado) as the host of the RC. This evidence, which seemed to challenge the universality of parsing principles of locality, motivated a huge amount of experimental research in diverse languages. Indeed, whereas speakers of many languages displayed a preference for attaching the RC to the most local DP (e.g., English, Basque, Japanese, Mandarin), speakers of other languages showed an “early closure” preference (Spanish, Dutch, Greek), and others still showed mixed results or no preference at all (French, Italian). The need to accommodate these heterogeneous data led several authors to argue that language-specific strategies guided the parser and to propose alternative models. Other authors, however, argued that there was no need to invoke language-specific strategies and pointed out that the studies conducted in different languages have not always been legitimately comparable (Baccino et al., 2000). That is, despite superficial similarities, different studies may have been testing slightly different grammatical structures. In addition, the existence of syntactic alternatives in a given language may make comparisons across languages difficult, since such alternatives may bias interpretation preferences. Looking more closely at Spanish, for example, Aguilar and Grillo (2021) showed that the high attachment preference does not constitute an exception to locality principles but results from an independent preference for a structurally simpler parse. In summary, the attachment preferences debate is an interesting case study illustrating how cross-linguistic data can help to test the boundaries of universality and of its counterpart, principled variation, while also raising a caveat about the comparability of results.
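Offline interpretation answers such as those to Who was on the balcony? are often summarised as the proportion of high (DP1) attachments. As a minimal sketch, assuming responses coded as the hypothetical labels "high" and "low" and treating every answer as independent, the proportion can be tested against chance (50%) with an exact binomial test. This simplification ignores the clustering of answers within participants and items, which published analyses typically handle with mixed-effects logistic regression; the code is only meant to make the logic of "a preference above chance" concrete.

```python
from math import comb


def high_attachment_proportion(responses):
    """Share of 'high' (DP1) answers among 'high'/'low' interpretation responses."""
    relevant = [r for r in responses if r in ("high", "low")]
    if not relevant:
        raise ValueError("no 'high' or 'low' responses")
    return sum(r == "high" for r in relevant) / len(relevant)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than observing k successes out of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Toy responses (not real data): 7 high and 3 low attachments.
answers = ["high"] * 7 + ["low"] * 3
print(high_attachment_proportion(answers))  # 0.7
print(binomial_two_sided_p(7, 10))          # 0.34375, i.e. not reliably above chance
```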

4.3.2

Locality

The ambiguity examples discussed above suggest that when grammar allows more than one option, the system prefers to build structural relations with the closest possible element that is relevant to that relation. More generally, the case of attachment preferences seems to illustrate the fact that structural relations operate under locality constraints (Chomsky, 1965; Gibson, 2000; Just & Carpenter, 1992), a property that might rest on more general cognitive mechanisms, such as recency. It is commonly accepted that, even if grammar can generate potentially unlimited structures and create relations among quite distant elements, syntactic rules are typically applied on a local basis. Since locality extends over a wide variety of syntactic relationships, such as agreement or movement, contemporary theoretical linguistics has been committed to identifying the local configurations that are relevant to syntax. Disruptions of locality generate processing effects, which are empirically testable through measures such as error rates, latencies, and other indices of cognitive effort. As distance increases, a sentence becomes more awkward, harder to understand (especially during online processing) or to remember, and errors become more likely. These locality effects have been thoroughly examined in many languages and across a full range of dependency types. Several hypotheses have been proposed to explain these difficulties. According to one perspective, the Dependency Locality Theory (Gibson, 2000), the resolution of a dependency critically rests on maintaining in memory linguistic material that must be integrated at a later point in the sentence. Since the activation of items in memory decays over time, the greater the distance between the elements of the dependency, the greater the difficulty in retrieving the appropriate item when necessary.
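Distance in the Dependency Locality Theory is measured not simply in words but in intervening new discourse referents (roughly, nouns and verbs). The sketch below is a simplified count loosely modelled on that idea; it omits details of Gibson’s (2000) full cost metric, and the part-of-speech tags are hand-assigned assumptions. Applied to the subject and object relative clauses discussed later in this section, it shows why the dependency between reporter and attacked spans more material in the object relative.

```python
def intervening_referents(tags, i, j, referent_tags=("NOUN", "VERB")):
    """Count tokens strictly between positions i and j whose tag marks a
    (potential) new discourse referent; a simplified DLT-style distance."""
    lo, hi = sorted((i, j))
    return sum(1 for tag in tags[lo + 1:hi] if tag in referent_tags)


def word_distance(i, j):
    """Number of words strictly between positions i and j."""
    return abs(i - j) - 1


# Subject relative: The reporter that attacked the senator admitted the error
sr_tags = ["DET", "NOUN", "PRON", "VERB", "DET", "NOUN", "VERB", "DET", "NOUN"]
# Object relative: The reporter that the senator attacked admitted the error
or_tags = ["DET", "NOUN", "PRON", "DET", "NOUN", "VERB", "VERB", "DET", "NOUN"]

# Dependency between 'reporter' (position 1) and 'attacked' (3 in SR, 5 in OR).
print(intervening_referents(sr_tags, 1, 3), word_distance(1, 3))  # 0 1
print(intervening_referents(or_tags, 1, 5), word_distance(1, 5))  # 1 3
```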
A related hypothesis, the cue-based retrieval theory (Lewis & Vasishth, 2005), attributes locality effects not only to decay over time but also to interference from similar items during the encoding or retrieval of target elements from memory in online processing. Evidence

Experimental syntax

of this similarity-based interference can be found in the fact that the more similar the properties of the interfering elements are, the greater the processing difficulty. For instance, items that share grammatical category (e.g., two full noun phrases rather than a noun phrase and a pronoun), inflectional features or semantic properties have been shown to trigger longer reading latencies at the integration point (e.g., Gordon et al., 2001, 2004). Limitations in cognitive resources may offer a plausible explanation of locality effects, given that individual working memory capacity seems to affect sentence processing performance consistently (Caplan & Waters, 2013). However, there is also evidence of antilocality effects, that is, cases in which distance facilitates subsequent processing (Vasishth & Lewis, 2006). For example, in a self-paced reading experiment in German, Konieczny (2000) found that reading times on the verb decreased as the distance between the verb and its argument increased. These antilocality effects find a natural explanation in expectation-based accounts (Levy, 2008), which assume that language users maintain and retrieve linguistic information probabilistically (based on previous experience) to parse or plan sentences incrementally. Hence, such accounts predict processing difficulty when the system must build a syntactic representation that is unlikely or rare.
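The interplay of decay and similarity-based interference in cue-based retrieval can be illustrated with the activation equations of the ACT-R architecture on which Lewis and Vasishth (2005) build. The sketch below is a simplification under stated assumptions: it omits activation noise and other model components, and the parameter values (decay d = 0.5, maximum associative strength S = 1.5, latency factor F = 0.14 s) are assumed for illustration. Base-level activation decays with the time elapsed since encoding, and each retrieval cue spreads less activation to the target when more items in memory match that cue.

```python
import math

# Assumed parameter values, for illustration only.
DECAY = 0.5            # base-level decay d
MAX_ASSOC = 1.5        # maximum associative strength S
LATENCY_FACTOR = 0.14  # latency factor F, in seconds


def base_level(encoding_times, now, d=DECAY):
    """Base-level activation B = ln(sum_k (now - t_k)^-d); decays with elapsed time."""
    return math.log(sum((now - t) ** -d for t in encoding_times))


def activation(base, cue_fans, total_weight=1.0, s=MAX_ASSOC):
    """A = B + sum_j W_j * (S - ln fan_j), with W_j = total_weight / number of cues.

    cue_fans[j] is the number of items in memory matching retrieval cue j
    (at least 1: the target itself). Matching distractors raise the fan and
    dilute the activation that the cue spreads to the target.
    """
    w = total_weight / len(cue_fans)
    return base + sum(w * (s - math.log(fan)) for fan in cue_fans)


def latency(a, f=LATENCY_FACTOR):
    """Retrieval latency T = F * exp(-A), in seconds."""
    return f * math.exp(-a)


# Two hypothetical retrieval cues (e.g., +subject, +singular); times in seconds.
near = latency(activation(base_level([0.0], now=1.0), [1, 1]))
far = latency(activation(base_level([0.0], now=2.0), [1, 1]))         # more decay
interfered = latency(activation(base_level([0.0], now=1.0), [2, 2]))  # distractor matches both cues
print(f"near: {near * 1000:.0f} ms, far: {far * 1000:.0f} ms, "
      f"with distractor: {interfered * 1000:.0f} ms")
```

With these settings, retrieval is slower both when the target was encoded longer ago (decay) and when a distractor matches the retrieval cues (interference), the two sources of locality effects distinguished above.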
From this perspective, for example, the processing asymmetry between subject relative clauses (SRs: The reporter that attacked the senator …) and object relative clauses (ORs: The reporter that the senator attacked …), a robust empirical finding across many languages and experimental paradigms (see Section 4.3.2.3), stems primarily from the fact that ORs are less frequent than SRs (Roland et al., 2007). However, a substantial number of studies have found evidence supporting both memory-based and expectation-based approaches, suggesting that both factors might play a role during processing (Staub, 2010).
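Expectation-based accounts are commonly formalized in terms of surprisal (Levy, 2008), the negative log probability of a word given its preceding context. The sketch below estimates surprisal from hypothetical continuation counts after The reporter that …; the counts are invented for illustration and are not corpus data. Because a determiner at this point signals an OR, the predicted cost falls on the subject of the relative clause, which is where Staub (2010) located expectation effects.

```python
import math
from collections import Counter

# Hypothetical counts of the next word's category after "The reporter that ..."
# (invented for illustration; not corpus data).
continuations = Counter({"verb (SR)": 80, "determiner (OR)": 20})


def surprisal(counts, outcome):
    """Surprisal in bits, -log2 P(outcome | context), with P estimated by relative frequency."""
    count = counts[outcome]
    if count == 0:
        return math.inf  # an unseen continuation is infinitely surprising under this estimate
    return -math.log2(count / sum(counts.values()))


for outcome in continuations:
    print(f"{outcome}: {surprisal(continuations, outcome):.2f} bits")
```

A relative-frequency estimate is the simplest choice; probabilistic grammars or language models would supply the conditional probabilities in realistic applications.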

4.3.2.1

Island effects

One of the most studied locality phenomena is that of “movement islands”, that is, syntactic environments that block extraction, as in (2b).

(2a) We met the mathematician who solved the puzzle.
(2b) *This is the puzzle that we met the mathematician who solved __.

Island constraints are grammatically imposed restrictions that prevent constituents from being extracted from these syntactic environments. It is worth noting that it is the nature of the structures, not the mere linear distance between the elements of the dependency, that determines island status. Since Ross (1967), island constraints have played a major role in the development of syntactic theory (Rizzi, 1990; Sprouse & Hornstein, 2013), and many studies have sought to determine whether it is possible, or appropriate, to give a unified account of all types of islands (wh-, complex NP, subject, and adjunct islands; Chomsky, 2001; Phillips, 2006). The sensitivity of the parser to islands has been investigated using different types of islands and experimental measures, and experimental research attests that island effects are detectable both in offline judgments and during online processing. Beyond classic offline acceptability judgments (see Section 4.4.1), several studies have investigated whether real-time structure building respects island constraints. For example, an ERP study (see Section 4.4.4) by McKinnon and Osterhout (1996) showed that when readers enter an island domain while carrying an incomplete wh-dependency, a P600, a brain response characteristic of syntactic anomalies, is elicited. Explanations for this phenomenon are often

Yamila Sevilla and María Elina Sánchez

divided between grammatical theories and processing-based accounts (Kluender & Kutas, 1993; Wagers & Phillips, 2009).

4.3.2.2

Attraction effects

Another important line of research in experimental psycholinguistics has studied the influence of distance and intervention in local dependencies through a kind of grammatical illusion called the attraction effect. In Bock and Miller's (1991) original elicited production paradigm, speakers were given a stimulus such as (3) and asked to produce a sentence with an inflected verb form:

(3) The key of the cabinets BE on the table.

Speakers tended to produce more agreement errors (such as The key of the cabinets ARE on the table) when a DP with different agreement values appeared between the head noun of the subject and the verb: the features of the intervening noun attracted the feature values on the verb. Since then, a large number of studies have empirically addressed agreement in different languages by analyzing these errors in both language production (e.g., Vigliocco & Hartsuiker, 2002) and comprehension (e.g., Lago et al., 2015; Wagers, Lau & Phillips, 2009). Studies have focused primarily on subject-verb agreement, but the phenomenon has also been reported for subject-predicate gender agreement and for pronoun-antecedent gender and number agreement. Other dependencies, such as those involving reflexives or bound-variable pronouns, seem, however, to be more resistant to attraction: when the dependency involves these elements, attraction errors do not seem to arise (see Parker & Phillips, 2017 for a discussion of the different patterns of interference across dependency types). Sentence production models offer various explanations for this phenomenon. Some models assume that it occurs during the grammatical encoding of the subject noun phrase (Vigliocco & Hartsuiker, 2002), while others propose that it occurs during the copying or retrieval of the appropriate agreement values at the verb or adjective (Badecker & Kuminiak, 2007).
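In elicited production, attraction is typically quantified by comparing agreement error rates when the intervening noun matches versus mismatches the head noun in number. The sketch below does this with invented counts for a singular head (The key of the cabinet vs. The key of the cabinets); the numbers are hypothetical, and real analyses usually also exclude unscorable responses and model participant and item variability statistically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    errors: int  # agreement errors (e.g., "The key of the cabinets ARE ...")
    scored: int  # responses scored as either correct or agreement errors

    @property
    def error_rate(self) -> float:
        return self.errors / self.scored


# Hypothetical counts for a singular head noun (invented for illustration).
match = Condition(errors=3, scored=200)      # The key of the cabinet ...
mismatch = Condition(errors=26, scored=200)  # The key of the cabinets ...

attraction_effect = mismatch.error_rate - match.error_rate
print(f"match: {match.error_rate:.1%}, mismatch: {mismatch.error_rate:.1%}, "
      f"attraction effect: {attraction_effect:.1%}")
```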
In comprehension, two classes of theories explain agreement attraction results: those that attribute attraction effects to a faulty representation of the subject's number, and those that consider that attraction effects emerge when the subject's number is re-accessed upon encountering the verb (Wagers et al., 2009). Memory-based approaches have also explained attraction effects during sentence processing. On these approaches, elements in non-target positions whose features (such as gender or number) are similar to those of the target interfere with the retrieval of the target, modulating acceptability, reading times and production. That is, the greater the similarity between the elements, the greater the difficulty, which shows up as more errors and longer latencies or reading times. Given the selectivity described above, if attraction errors are seen as a case of similarity-based interference or intervention, a series of core questions arise: What counts as an intervener? Do all syntactic positions trigger interference equally? Why are some features more prone to attraction than others? (Some empirical answers are explored further in Dillon et al., 2013; Tucker et al., 2015.)

4.3.2.3

Long-distance dependencies

Finally, a significant amount of research has focused on long-distance dependencies, which are considered a defining feature of human language. A long-distance or filler-gap dependency (LDD) is a relation between an element (the filler) and a syntactically licensed position, or gap, in an embedded clause, as in questions or relative clauses; see (4) and (5).

(4) The reporter that attacked the senator admitted the error.
(5) The reporter that the senator attacked admitted the error.

The production and comprehension of sentences containing relative clauses has been extensively studied in different languages and populations, using many different tasks and techniques, including both online and offline measures. Much of the work carried out to date documents an advantage for subject relative clauses (SRs), such as (4), over object relative clauses (ORs), such as (5). This advantage is evidenced in a series of effects replicated across studies: ORs give rise to more errors and longer latencies in comprehension tasks (Gordon et al., 2004); they elicit avoidance strategies in production tasks (Belletti & Contemori, 2010); they are acquired later (Friedmann et al., 2009); and they are particularly difficult to understand and produce for people with aphasia (Friedmann, 2008), L2 learners (Cunnings & Fujita, 2023) and older adults (Liu & Wang, 2019). Moreover, the asymmetry between SRs and ORs has been studied in a wide array of languages, including English, German, Dutch, French, Spanish, Korean, Mandarin and Basque. The asymmetry, however, might not be universal but may depend on the properties of each language's grammar and on the structural configurations of the dependencies in question (see Carreiras et al., 2010 for a discussion). Again, many different accounts have been proposed to explain the OR disadvantage. The active-filler strategy (Clifton & Frazier, 1989) explains the difficulty as the result of reanalyzing an initial interpretation. Expectation-based models attribute the asymmetry to the fact that ORs are less frequent than SRs.
Different studies in this line of research have sought to determine which experience-related factors influence the difficulty of processing ORs (Gennari & MacDonald, 2008; Reali & Christiansen, 2007; Wells et al., 2009). From a different perspective, memory-based accounts (Gibson, 2000) suggest that in ORs, unlike in SRs, the sentence subject must be maintained in memory until it is reactivated at the position where it is interpreted as the object of the non-adjacent embedded verb. Relatedly, cue-based models (Lewis & Vasishth, 2005) attribute the difficulty to interference during retrieval, caused by the intervention of a similar item (the embedded subject) between filler and gap. A compatible explanation comes from the Relativized Minimality approach (Belletti & Rizzi, 2013), which enables a fine-grained examination of the nature of the features that generate intervention (interference) effects. A series of works, using various comprehension and production tasks in different languages and populations, have investigated the role of features such as gender, number and case in processing difficulty (Belletti et al., 2012; Friedmann et al., 2009; Friedmann et al., 2017). However, the observed pattern may derive from several overlapping processing demands that converge in ORs: memory limitations may be responsible for difficulties arising from distance and interference, while the predictive processes underlying parsing may be responsible for the cost of a less preferred structure. Using eye tracking, Staub (2010) showed that the time courses of memory and expectation effects can be separated, although both played a role in explaining the difficulty of ORs.
These results suggest that expectation violation and memory retrieval load have distinct consequences in reading: memory load affects integration at the relative clause verb, whereas expectation violation taxes processing at the subject of the relative clause. This tour through some of the central topics of psycho- and neurolinguistics has offered an overview of how experimental data have been used to build our knowledge of syntax, that is, not only to validate formal hypotheses within grammatical theories but also to shed light on how syntax works in our cognitive system.

4.4

Main research methods

Experimental data on syntax come from a wide variety of sources, methodologies and techniques. Due to space limitations, we concentrate only on some types of evidence that have been fundamental in our knowledge of how grammar works and how it is represented in the mind and brain.

4.4.1

Acceptability judgments

Experimental evidence for syntactic theory has come largely from offline tasks, and acceptability judgments are without a doubt the quintessential source of such evidence. Experiments using acceptability judgments generally involve native speakers of a language deciding whether a sentence is acceptable. Participants either rate a sentence on a scale (e.g., a Likert scale or magnitude estimation) or choose the most acceptable alternative. This type of study has been very prolific in experimental syntax because judgments are assumed to reflect speakers' internalized knowledge, although this relationship is far from transparent. According to Sprouse et al. (2013), acceptability judgments provide language researchers with three pieces of useful information for testing their hypotheses: (1) the presence or absence of an effect given a certain manipulation, (2) the magnitude of the effect (i.e., the impact of the manipulation), and (3) the location of the conditions on the acceptability scale. While acceptability judgments seek to reveal information about the grammaticality of a sentence, the construction of items must take into account other factors that may influence acceptability, such as plausibility, lexical or structural frequency, and working memory demands, among others. Given the strong presence of acceptability judgments in experimental syntax, and also because of the criticisms they have received (in particular, regarding the reliability of informal judgments; e.g., Gibson & Fedorenko, 2013; but see Schütze & Sprouse, 2013), recent decades have seen a number of advances in the use of formal experimental methods, including factorial designs, for collecting acceptability judgments (Myers, 2017). In acceptability studies of island effects, Kluender and Kutas (1993) suggest that at least two factors must be quantified: the long-distance dependency and the island structure.
Sprouse and Villata (2021) suggest that factorial logic can be used to isolate these two effects with two factors: DEPENDENCY LENGTH (short vs. long) and STRUCTURE (island vs. non-island). Crossing two factors with two levels each yields a 2×2 design with four conditions, as in (6a-d):

(6a) Who __ thinks that Mary wrote a book? (short | non-island)
(6b) What do you think that Mary wrote __? (long | non-island)
(6c) Who __ wonders whether Mary wrote a book? (short | island)
(6d) What do you wonder whether Mary wrote __? (long | island)
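The factorial logic can be made explicit with a differences-in-differences (DD) score, the interaction term of the 2×2 design: if a long dependency costs the same inside and outside an island, the score is zero, whereas a positive score indicates a superadditive penalty for long dependencies into islands, that is, an island effect. The sketch below assumes 7-point ratings from one hypothetical participant (three items per condition), z-scores them within that participant (a common step to remove individual differences in scale use), and computes the DD score; the ratings are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical 7-point ratings from one participant, three items per condition.
ratings = {
    ("short", "non-island"): [6, 7, 6],
    ("long", "non-island"): [5, 6, 5],
    ("short", "island"): [6, 6, 5],
    ("long", "island"): [2, 1, 2],
}


def zscore_within_participant(by_condition):
    """Standardize all of a participant's ratings against their own mean and SD."""
    pooled = [r for rs in by_condition.values() for r in rs]
    m, s = mean(pooled), stdev(pooled)
    return {cond: [(r - m) / s for r in rs] for cond, rs in by_condition.items()}


def dd_score(by_condition):
    """(short island - long island) - (short non-island - long non-island)."""
    cell = {cond: mean(rs) for cond, rs in by_condition.items()}
    length_cost_in_island = cell[("short", "island")] - cell[("long", "island")]
    length_cost_elsewhere = cell[("short", "non-island")] - cell[("long", "non-island")]
    return length_cost_in_island - length_cost_elsewhere


dd = dd_score(zscore_within_participant(ratings))
print(f"DD score (z-units): {dd:.2f}")  # a positive value indicates superadditivity
```

In a real study, DD scores would be computed per participant and the interaction tested statistically across participants and items.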

Among the many phenomena investigated with this approach, island violations in different languages stand out (Sprouse et al., 2013). Both structural and working memory factors have been invoked to explain the results (Kluender & Kutas, 1993). Using self-paced reading, Hofmeister and Sag (2010) argued that island constraints are not purely syntactic or categorical but largely reflect working memory limitations. That is, non-structural modifications that change the distance between the gap and its antecedent, while leaving the island structure intact, change acceptability: acceptability decreases as distance increases, and vice versa.

4.4.2

Eye-tracking

Monitoring eye movements while language users carry out sentence processing tasks in written or spoken form has proven informative about how our cognitive system parses and plans sentences and, indirectly, about how grammatical constraints operate throughout those processes. The technique is relatively straightforward: a camera records the movements of the participant's eyes as they look at a screen on which visual stimuli (written sentences, pictures, or videos) are presented. Data analysis then makes it possible to determine where the participant is looking at each moment during a task, and hence to identify when participants fixate on a particular element, for how long, and what they do when they encounter difficulties. Two experimental paradigms are widely used: tracking the fixation history during reading, and tracking eye movements and fixations on objects during picture inspection (also known as the “visual world paradigm”). Most studies have been carried out in the reading domain and have provided evidence complementary to reading times in self-paced reading and response latencies in comprehension tasks. For example, since Frazier and Rayner's (1982) classic work on ambiguous sentences, eye-tracking measures such as fixation duration, regression probability and re-reading time have been taken to index parsing events, namely when the parser experiences overload and what it does when it encounters a word or phrase that is hard to process or integrate.
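Reading measures like those mentioned above are computed from the chronological sequence of fixations. The sketch below derives, for a one-word region, first fixation duration, gaze duration (the sum of first-pass fixations on the region before the eyes leave it), whether the eyes moved leftward when first leaving the region, and total reading time. It assumes fixations have already been mapped to word indices; actual pipelines also handle multi-word regions, blinks and track loss, and exact definitions vary somewhat across labs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    word: int      # index of the fixated word (assumed already assigned)
    duration: int  # fixation duration in ms


def reading_measures(fixations, region):
    """Word-level reading measures for a one-word `region`.

    First-pass measures are None when the region was skipped, i.e. when a
    word to its right was fixated before the region itself.
    """
    measures = {
        "first_fixation": None,
        "gaze_duration": None,
        "first_pass_regression": None,
        "total_time": sum(f.duration for f in fixations if f.word == region),
    }
    for i, fix in enumerate(fixations):
        if fix.word > region:
            break  # region skipped during first pass
        if fix.word == region:
            j = i
            while j < len(fixations) and fixations[j].word == region:
                j += 1  # consecutive first-pass fixations on the region
            measures["first_fixation"] = fix.duration
            measures["gaze_duration"] = sum(f.duration for f in fixations[i:j])
            measures["first_pass_regression"] = j < len(fixations) and fixations[j].word < region
            break
    return measures


# Hypothetical trial: the reader fixates word 3, regresses to word 1, then rereads word 3.
trial = [Fixation(0, 210), Fixation(1, 230), Fixation(3, 250), Fixation(3, 180),
         Fixation(1, 200), Fixation(3, 220), Fixation(4, 240), Fixation(5, 260)]
print(reading_measures(trial, region=3))
```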
Although eye-tracking data are primarily tied to reading processes and online comprehension mechanisms, and only to a much lesser extent to theoretical distinctions in grammar, sentence comprehension research has used this technique to address some fine-grained linguistic phenomena in the morphosyntactic domain, such as transitivity and subcategorization biases (Staub et al., 2006), word order (Sauppe et al., 2013), information structure (Paterson et al., 2007), ergativity (Arantzeta et al., 2017), scrambling (Tamaoka et al., 2014), and case marking (Traxler & Pickering, 1996), among many others. This wide range of phenomena goes hand in hand with the growing diversity of languages studied. In sentence comprehension, the technique has been used to investigate many of the issues discussed above, including dependency resolution (Sussman & Sedivy, 2003), reanalysis and repair processes (Frazier & Rayner, 1982), reference resolution (Sturt, 2003), predictive processes (Staub et al., 2010), and word order and argument linking (Gattei et al., 2017), among others. In addition, sentence processing in L2 learners and bilinguals (Cunnings & Fujita, 2021) and comprehension difficulties in aphasia (Hanne et al., 2011) have also been investigated. Eye-tracking studies of sentence production are much scarcer. Since Griffin and Bock's (2000) seminal work on the time course of sentence formulation, eye-tracking measures have been used to understand how speakers map conceptual representations onto linguistic structures and to unpack different aspects of planning, such as the nature and size of the planning unit and how lexical and syntactic processes are coordinated (Meyer, 2004). Many typologically distinct languages are represented in this field, including less studied ones (see Norcliffe & Konopka, 2015 for a review).
Cross-linguistic research has focused on understanding the extent to which the planning processes involved in sentence production conform to the grammatical properties of specific languages (Egurtzegi et al., 2022; Maldonado et al., 2013; Norcliffe et al., 2015).

4.4.3

Pupillometry

Together with eye-tracking, pupillometry has been used to investigate the processing cost associated with sentence comprehension and production, as pupil size is considered a psychophysiological index of arousal and cognitive effort. Pupillometry exploits task-evoked pupillary responses (TEPRs), that is, changes in pupil dilation measured relative to a baseline, to infer the intensity of processing and hence the difficulty experienced in performing a task in different cognitive domains (Beatty & Lucero-Wagoner, 2000), including different aspects of language (see Schmidtke, 2018 for a review). At the sentence level, since Schluroff's (1982) seminal study, a handful of studies have shown that TEPRs correlate with the grammatical complexity of sentences. In Schluroff's study, participants listened to a wide range of sentence types that varied in length, syntactic construction and content. While participants' subjective ratings of sentence difficulty were related to sentence length, mean pupillary dilation correlated significantly with syntactic complexity, attesting to the sensitivity of the measure and suggesting a close, fine-grained relation between the structural properties of utterances and the computational demands of online processing. Just and Carpenter (1993) also used pupillary dilation as an indicator of the cognitive load imposed by syntactic structure during reading comprehension, and found that complex sentences elicited larger pupillary responses than their simpler counterparts. Also in sentence comprehension, changes in pupil size over time have been used to investigate constituent order (Scheepers & Crocker, 2004) and the influence of prosody and visual context on the processing effort associated with resolving temporary syntactic ambiguities (Engelhardt et al., 2010).
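Because TEPRs are defined relative to a baseline, a minimal analysis subtracts the pre-stimulus pupil size from the post-stimulus trace. The sketch below assumes a trace sampled at a fixed rate whose first 200 ms precede stimulus onset; the traces, sampling rate and window length are hypothetical, and real analyses also deal with blinks, smoothing and luminance effects.

```python
from statistics import mean


def tepr(trace, sample_rate_hz, baseline_ms=200):
    """Return (mean, peak) dilation after stimulus onset, baseline-corrected.

    Assumes the first `baseline_ms` of `trace` precede stimulus onset; the
    mean pupil size in that window is subtracted from every later sample.
    """
    n_base = round(sample_rate_hz * baseline_ms / 1000)
    if not 0 < n_base < len(trace):
        raise ValueError("baseline must be non-empty and shorter than the trace")
    baseline = mean(trace[:n_base])
    evoked = [x - baseline for x in trace[n_base:]]
    return mean(evoked), max(evoked)


# Hypothetical traces (mm) at 50 Hz: 10 baseline samples, then 8 samples after onset.
simple = [4.00] * 10 + [4.00, 4.02, 4.05, 4.08, 4.10, 4.10, 4.08, 4.06]
complex_ = [4.00] * 10 + [4.00, 4.04, 4.10, 4.16, 4.20, 4.20, 4.17, 4.14]

for label, trace in [("simple", simple), ("complex", complex_)]:
    mean_d, peak_d = tepr(trace, sample_rate_hz=50)
    print(f"{label}: mean dilation {mean_d:.3f} mm, peak {peak_d:.3f} mm")
```

Subtractive correction is only one option; divisive (proportional) baselines are also used, and the choice can affect comparisons across conditions.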
As for sentence production, results support the relative independence of lexical and syntactic processes during planning and confirm a well-established finding regarding word order and syntactic structure: non-canonical sentences are more difficult to process than canonical ones (Schluroff et al., 1986; Sevilla et al., 2014). Pupillometric studies focusing on different populations have shown that effort increases in non-native comprehenders (Borghini & Hazan, 2018) and when language processing is made more demanding by conditions such as aphasia, cognitive aging, or other factors external to the task (see, for example, Chapman & Hallowell, 2015; Demberg & Sayeed, 2016).

4.4.4

Electrophysiological methods

Electrophysiological techniques allow brain activity to be recorded directly. Electroencephalography (EEG) records the brain's electrical signals through electrodes placed on the participant's scalp over different areas of the skull. Magnetoencephalography (MEG), another non-invasive technique, is an extremely sensitive method that captures the very weak magnetic fields generated by electrical currents within neurons. Studies using these techniques amplify the recorded signal and average it across many trials time-locked to a stimulus in order to extract brain responses known as event-related potentials (ERPs). Different components have been identified, each characterized by its latency relative to the onset of a specific kind of stimulus and by its scalp distribution (lateralization and region). An advantage of these techniques is that they allow the investigation of very fast and automatic processes, such as those involved in syntactic computation. While behavioral measures often require responses that depend on conscious decisions (for example, whether a sentence is grammatical), electrophysiological responses provide a fairly direct measure of those processes. The literature in this field has made frequent use of the anomaly detection paradigm. The rationale is that the brain automatically detects a linguistic inconsistency, and that the response will


be modulated by the magnitude, but above all by the type, of the anomaly. Three central ERPs are mainly investigated as correlates of syntactic processing: the early left anterior negativity (ELAN), the left anterior negativity (LAN), and the P600. The ELAN arises between 100 and 200 milliseconds (ms) after stimulus presentation and is associated with the computation of word category and phrase structure information. This potential has been reported crosslinguistically in contrasts between nouns and verbs and/or between open- and closed-class words (Hahne & Friederici, 1999). Another early negative component is the LAN, which peaks between 300 and 500 ms; many studies have shown that it appears with different kinds of morphosyntactic violations. The potential most closely associated with syntactic anomalies and with the processing of syntactically complex or less frequent structures is the P600 (Osterhout & Holcomb, 1992), a positivity that occurs between 500 and 800 ms post-stimulus at centroparietal sites. This component has been reported for different types of syntactic violations, such as phrase structure violations (Hahne & Friederici, 1999), subject-verb agreement errors and gender and/or number agreement errors between other elements (Kaan, 2002), island violations (McKinnon & Osterhout, 1996), and argument structure violations (Kuperberg, 2007), as well as in garden-path sentences (Kaan & Swaab, 2002) and complex structures involving syntactic movement (Phillips et al., 2005). Another ERP that has been studied extensively, and about which there is broad consensus, is the N400, which, although not strictly syntactic, is involved in sentence processing. The N400 also appears between 300 and 500 ms, but at central-parietal sites, and is associated with the detection of semantic violations in written or spoken stimuli.
For example, an N400 will appear when a semantically congruent sentence (The man washes his hands with soap and water) is compared with an incongruous one (The man washes his hands with soap and TELEPHONE). Friederici (2002) proposed a model that integrates some of these components into an account of sentence comprehension. According to this model, the initial syntactic structure of the sentence is built between 200 and 400 ms, and an anomaly at this stage triggers the ELAN. Lexical and semantic processing then takes place between 300 and 600 ms, and when it is disrupted, an N400 appears. Finally, syntactic and semantic information is integrated between 500 and 800 ms, at which point a P600 may arise upon detection of a syntactic violation. However, some evidence suggests that these components could also be modulated by predictive processes (for a review, see Van Petten & Luka, 2012).
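The averaging logic behind ERPs, and the window-based measurement of a component such as the P600, can be illustrated with simulated data. The sketch below assumes single-electrode epochs sampled every 4 ms, invented noise and effect sizes, and a 500 to 800 ms measurement window taken from the description above; real pipelines also apply filtering, pre-stimulus baseline correction and artifact rejection, which are omitted here.

```python
import random
from statistics import fmean

SAMPLE_MS = 4    # assumed sampling interval (250 Hz)
EPOCH_MS = 1000  # epoch length after stimulus onset


def simulate_epoch(rng, effect_uv):
    """One hypothetical single-trial epoch (microvolts) at one electrode:
    Gaussian background noise plus a positivity of `effect_uv` from 500 to 800 ms."""
    return [rng.gauss(0.0, 10.0) + (effect_uv if 500 <= t < 800 else 0.0)
            for t in range(0, EPOCH_MS, SAMPLE_MS)]


def average(epochs):
    """Point-by-point average across trials; activity not time-locked to the
    stimulus tends to cancel out, leaving the event-related response."""
    return [fmean(samples) for samples in zip(*epochs)]


def mean_amplitude(erp, start_ms, end_ms):
    """Mean amplitude of an averaged waveform within [start_ms, end_ms)."""
    return fmean(erp[start_ms // SAMPLE_MS:end_ms // SAMPLE_MS])


rng = random.Random(1)
control = average([simulate_epoch(rng, 0.0) for _ in range(40)])
violation = average([simulate_epoch(rng, 5.0) for _ in range(40)])

p600 = mean_amplitude(violation, 500, 800) - mean_amplitude(control, 500, 800)
early = mean_amplitude(violation, 0, 500) - mean_amplitude(control, 0, 500)
print(f"violation - control, 500-800 ms: {p600:.1f} uV; 0-500 ms: {early:.1f} uV")
```

Although each single trial is dominated by noise, the condition difference in the averaged waveforms recovers a positivity close to the simulated 5 µV, and only in the 500 to 800 ms window.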

4.4.5

Hemodynamic methods

When a cognitive task is performed, neural activity increases in the regions involved. This increase creates a demand for extra glucose and oxygen in the activated region and therefore for increased blood flow. Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) record these fluctuations in blood flow and reconstruct them as images of brain regions. Because they measure (indirect) hemodynamic reflections of the underlying neural activity, these techniques have very good spatial resolution but poor temporal resolution: the hemodynamic response unfolds over several seconds, far more slowly than the electrical activity it reflects. Hence, they are mainly used to investigate where in the brain different components of language are processed, rather than the timing of such processes. Broca's area was the first brain region to be linked to language and has since been related to different aspects of phonological and grammatical processing (Embick et al., 2000; Friederici et al., 2000). Regarding syntax, while different phenomena have been investigated, many of the works


using hemodynamic techniques have concentrated on identifying differential activity in Broca's area associated with the processing of non-canonical sentences (sentences that deviate from the canonical constituent order of a language; typically, passive sentences, which reverse the Agent-Verb-Theme order in SVO languages). This effect, known as the canonicity effect, has been reported for ORs versus SRs, sentences with topicalization, and object cleft sentences (Caplan et al., 1999), as well as for wh-questions (Ben-Shachar et al., 2004; Santi & Grodzinsky, 2010) and scrambling (Bornkessel-Schlesewsky et al., 2009). The increase in the BOLD (blood-oxygen-level-dependent) signal has been attributed to different syntactic mechanisms and functional processes within different theoretical models: syntactic movement (Santi & Grodzinsky, 2010), unification (Hagoort, 2005; Snijders et al., 2009), linearization (Bornkessel-Schlesewsky et al., 2009), and general or specific working memory mechanisms (Caplan et al., 1999; Fedorenko et al., 2007). Taken together, hemodynamic studies indicate that Broca's area participates in relatively dissimilar aspects of processing, such as phonological processing, word grouping, morphosyntactic operations and working memory. Therefore, it seems clear that Broca's area does not process only grammar and that not all grammar is processed in this area (Rogalsky et al., 2008; Rogalsky & Hickok, 2011). While most neuroimaging research on syntactic processing has focused on Broca's area, in recent years studies have started linking parts of the temporal lobes (previously associated with lexical processing) with syntax (Pallier et al., 2011). Anterior portions of the temporal cortex show greater bilateral activation when participants read or listen to sentences than when they process lists of words (Humphries et al., 2006; Stowe et al., 1998).
The posterior middle temporal gyrus (pMTG) has also received attention in the literature. Different studies with healthy people and people with left-hemisphere lesions have shown that, in addition to being involved in single-word processing, the pMTG could also support the computation of syntactic hierarchy (Dronkers et al., 2004; Fridriksson et al., 2018), that is, structure building during comprehension and production (but not the linearization and morphosyntactic mechanisms supported by frontal regions). In short, the profile of language areas that emerges from the literature consistently indicates that there are quite specific regions that are highly sensitive to grammatical properties and hierarchical structure (Embick et al., 2000; Pallier et al., 2011; see Matchin & Hickok, 2020 for a review). Consensus is weaker regarding the syntactic operations that build structural representations. While some authors adopt what has been called “the syntactotopic conjecture” and attempt to localize the formal mechanisms proposed by theory (such as Merge or Move) in certain brain areas, mainly Broca's region (Grodzinsky & Amunts, 2006; Santi & Grodzinsky, 2010), others argue that these areas are dedicated not to grammatical operations per se but to the processing mechanisms linked to them, including working memory (Rogalsky & Hickok, 2011; Stowe et al., 1998). Finally, cross-linguistic research reveals that the areas involved cannot be identified with, or restricted to, broad regions such as Broca's or Wernicke's areas. Instead, neuroimaging data seem to favor a more distributed and fine-grained network that houses functionally differentiated aspects of language processing.
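The poor temporal resolution of hemodynamic methods follows from the sluggishness of the hemodynamic response itself. As an illustration, the sketch below evaluates a double-gamma hemodynamic response function of the kind used as the default (“canonical”) HRF in SPM; its parameters (a gamma density with shape 6 minus an undershoot gamma density with shape 16, scaled by 1/6) are assumed here rather than taken from this chapter. Even for a brief event at time 0, the modeled response builds up slowly and peaks around 5 s later.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density function (0 for t <= 0)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def canonical_hrf(t):
    """Double-gamma HRF with SPM-style default parameters (assumed): a main
    response whose gamma component peaks at 5 s, minus a smaller, later undershoot."""
    return gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6


dt = 0.1  # seconds
times = [i * dt for i in range(320)]  # 0 to 31.9 s after a brief event
hrf = [canonical_hrf(t) for t in times]
peak_time = times[hrf.index(max(hrf))]
print(f"response to a brief event at t = 0 peaks at about {peak_time:.1f} s")
print(f"response 1 s after the event is {canonical_hrf(1.0) / max(hrf):.1%} of the peak")
```

In fMRI analysis, such a function is convolved with the timing of experimental events to predict the BOLD signal, which is why neural events separated by a few hundred milliseconds are hard to tell apart with these methods.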

4.4.6

Neuropsychological data (people with aphasia)

Another way to explore how the language processing system works and how it is represented in the brain is to study system breakdowns, that is, language deficits. Studies on aphasia, an acquired language deficit resulting from brain injury, provide relevant information in this respect. With regard to syntax, sentence-processing difficulty in people with aphasia (PWA) has traditionally been referred to as agrammatism. This clinical phenomenon has aroused much interest because its definition encompasses a set of very diverse difficulties in language production, comprehension, or both, which seem difficult to account for through a single functional explanation. Hence, it has generated extensive research involving both theorists and clinicians, who try to understand how syntax is processed from different perspectives and areas of expertise. Agrammatism is usually related to anterior lesions of the left hemisphere and has classically been associated with a particular type of non-fluent clinical profile, Broca's aphasia. For this reason, agrammatic aphasia has largely been used to investigate the role of Broca's area in language processing. However, in recent years, other types of aphasia, such as Wernicke's and conduction aphasia (both fluent aphasias), have also been incorporated into the discussion to shed light on the functional organization of syntax in the brain. Although impaired production is the most salient feature of agrammatic aphasia, specific problems in sentence comprehension have also been found since Caramazza and Zurif's (1976) seminal work. Studies of production highlight difficulties with some linguistic phenomena and report differential patterns. These include inadequate production of closed-class words (Goodglass et al., 1972) and of inflectional morphology, especially the dissociation between impaired tense and intact agreement (Friedmann & Grodzinsky, 1997); a preference for producing sentences in canonical order (Bates et al., 1988); and deficits in the production of verbs with complex argument structure (Thompson, 2003). Regarding sentence comprehension, the classic pattern is a deficit with sentences in non-canonical word order (Caramazza & Zurif, 1976). In general, good comprehension of active sentences has been observed, in contrast with poor comprehension of passive sentences. Deficits have also been observed in the comprehension of object-extracted wh-questions, sentences with ORs, and pronouns (Friedmann, 2008).
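In this literature, comprehension of a given sentence type is often classified as above chance or at chance, for instance in a two-choice sentence-picture matching task where guessing yields 50% accuracy. A minimal exact binomial test is sketched below; the accuracy counts for one hypothetical participant are invented to mirror the active/passive contrast described above, and the two-sided rule used (summing all outcomes no more probable than the observed one) is one standard convention among several.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes that
    are no more likely than the observed k (a small tolerance guards rounding)."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))


# Hypothetical accuracy for one participant, 20 trials per sentence type.
results = {"active": (19, 20), "passive": (11, 20)}
for sentence_type, (correct, trials) in results.items():
    p_value = binomial_two_sided(correct, trials)
    verdict = "differs from chance" if p_value < 0.05 else "not distinguishable from chance"
    print(f"{sentence_type}: {correct}/{trials} correct, p = {p_value:.4f} ({verdict})")
```

Note that failing to differ from chance does not by itself establish guessing; with few trials, the test has limited power to detect moderate departures from 50%.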
Different approaches have been proposed to account for this diversity of symptoms. Broadly, they can be grouped into two families of theories. On the one hand, representational theories attribute the deficit to the impairment of some representational component of syntactic processing; most were formulated within successive versions of Generative Grammar (e.g., the Tree Pruning Hypothesis (Friedmann & Grodzinsky, 1997), the Trace Deletion Hypothesis (Grodzinsky, 1995, and its revisions), or the Argument Structure Complexity Hypothesis (Thompson, 2003)). On the other hand, processing theories attribute these phenomena to impairments in different mechanisms: memory limitations or temporal restrictions (Kolk, 1995), slowed prediction of syntactic dependencies (Zurif et al., 1993), or working memory deficits (Just & Carpenter, 1992; Pettigrew & Hillis, 2014). More recently, proposals have emerged that connect both types of explanation (Garraffa & Grillo, 2008), integrating reduced processing resources into a framework of well-defined syntactic properties and operations. Among studies linking agrammatic comprehension to brain injury, some large-scale lesion-deficit mapping studies have reported an association between damage to Broca’s area and impaired comprehension of non-canonical sentence structures (Fridriksson et al., 2018; Grodzinsky & Amunts, 2006; Matchin & Rogalsky, 2017). Other studies, however, point to an important role for the posterior temporal lobe (e.g., Dronkers et al., 2004).

4.5

Future directions

Research on sentence comprehension and sentence production has historically followed largely separate paths. Experimental research on parsing processes has been by far the most fruitful and the most directly connected to the construction and discussion of syntactic theory. One obvious reason for this is that the mechanisms involved in generation are harder to access experimentally; it is also the case, however, that sentence production studies have focused more on processes than on representations. A notable exception is perhaps the study of syntactic priming, which for reasons of space we have not included in the last section of this chapter. A complete review of the findings and a theoretical proposal can be found in Branigan and Pickering (2017). Connected to this point, a major issue is the relationship between parsing and generation. Although the two have been studied separately, it is natural to assume that they share at least the representations they operate on and, possibly, many processes and their neural substrates, except, obviously, those that correspond to the specific aspects of each modality. Attempts to propose a unified structure-building model for comprehension and production take two approaches: the interactionist perspective (Pickering & Garrod, 2013) and the single-mechanism account (Kempen, 2014; Momma & Phillips, 2018). As Momma and Phillips (2018) remark, how this relationship is conceived is central to a broader and fundamental issue: developing an explicit theory that links theories of grammar to behavior and the brain. As we pointed out, in recent decades experimental studies have benefited from cross-linguistic research through the incorporation of work on non-European languages, especially Asian and Semitic languages. There remains, however, a gap in the study of less-explored languages, probably because their speakers, mostly from disadvantaged populations, have little or no access to academia and laboratories. Democratization has barely begun in another area as well: access to different techniques, especially neuroimaging.

Yamila Sevilla and María Elina Sánchez

Acknowledgements
Partially supported by Universidad de Buenos Aires, Grant UBACyT 20020190100187BA, and by Consejo Nacional de Investigaciones Científicas y Técnicas, Grant PIP 11220200101289CO.

Further reading
Embick, D., & Poeppel, D. (2006). Mapping syntax using imaging: Problems and prospects for the study of neurolinguistic computation. In K. Brown (Ed.), Encyclopedia of Language & Linguistics (Second Edition) (pp. 484–486). Elsevier.
Lewis, S., & Phillips, C. (2015). Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research, 44(1), 27–46.
Matchin, W., & Hickok, G. (2020). The cortical organization of syntax. Cerebral Cortex, 30(3), 1481–1498.
Schindler, S., Drożdżowicz, A., & Brøcker, K. (Eds.). (2020). A user’s view of the validity of acceptability judgments as evidence for syntactic theories. In Linguistic Intuitions: Evidence and Method (pp. 215–232). Oxford University Press.

Related topics
Analyzing spoken language comprehension with eye-tracking; analyzing language comprehension using ERP; analyzing language using brain imaging; new directions in statistical analysis for experimental linguistics; experimental methods to study disorders of language production in adults; experimental methods to study bilinguals.

References
Aguilar, M., & Grillo, N. (2021). Spanish is not different: On the universality of minimal structure and locality principles. Glossa: A Journal of General Linguistics, 6(1), 1–22.
Arantzeta, M., Bastiaanse, R., Burchert, F., Wieling, M., Martinez-Zabaleta, M., & Laka, I. (2017). Eye-tracking the effect of word order in sentence comprehension in aphasia: Evidence from Basque, a free word order ergative language. Language, Cognition and Neuroscience, 32(10), 1320–1343.


Baccino, T., De Vincenzi, M., & Job, R. (2000). Cross-linguistic studies of the late closure strategy: French and Italian. In M. De Vincenzi & V. Lombardo (Eds.), Cross-Linguistic Perspectives on Language Processing. Studies in Theoretical Psycholinguistics (pp. 89–118). Springer, Dordrecht.
Badecker, W., & Kuminiak, F. (2007). Morphology, agreement and working memory retrieval in sentence production: Evidence from gender and case in Slovak. Journal of Memory and Language, 56(1), 65–85.
Bates, E.A., Friederici, A.D., Wulfeck, B.B., & Juarez, L.A. (1988). On the preservation of word order in aphasia: Cross-linguistic evidence. Brain and Language, 33(2), 323–364.
Beatty, J., & Lucero-Wagoner, B. (2000). The pupillary system. Handbook of Psychophysiology, 2, 142–162.
Belletti, A., & Contemori, C. (2010). Intervention and attraction. On the production of subject and object relatives by Italian (young) children and adults. In J. Costa, A. Castro, M. Lobo & F. Pratas (Eds.), Language Acquisition and Development (pp. 39–52). Cambridge Scholars Publishing.
Belletti, A., et al. (2012). Does gender make a difference? Comparing the effect of gender on children’s comprehension of relative clauses in Hebrew and Italian. Lingua, 122(10), 1053–1069.
Belletti, A., & Rizzi, L. (2013). Intervention in grammar and processing. In I. Caponigro & C. Cecchetto (Eds.), From Grammar to Meaning: The Spontaneous Logicality of Language (pp. 294–311). Cambridge: Cambridge University Press.
Ben-Shachar, M., Palti, D., & Grodzinsky, Y. (2004). Neural correlates of syntactic movement: Converging evidence from two fMRI experiments. NeuroImage, 21(4), 1320–1336.
Berko Gleason, J., & Bernstein Ratner, N. (1993). Psycholinguistics. Wadsworth Publishing.
Bock, K., & Miller, C.A. (1991). Broken agreement. Cognitive Psychology, 23(1), 45–93.
Borghini, G., & Hazan, V. (2018). Listening effort during sentence processing is increased for non-native listeners: A pupillometry study. Frontiers in Neuroscience, 12, 152.
Bornkessel, I., & Schlesewsky, M. (2006). The extended argument dependency model: A neurocognitive approach to sentence comprehension across languages. Psychological Review, 113(4), 787.
Bornkessel-Schlesewsky, I., Schlesewsky, M., & von Cramon, D.Y. (2009). Word order and Broca’s region: Evidence for a supra-syntactic perspective. Brain and Language, 111(3), 125–139.
Branigan, H.P., & Pickering, M.J. (2017). An experimental approach to linguistic representation. Behavioral and Brain Sciences, 40, e282.
Bresnan, J. (2001). Explaining morphosyntactic competition. In M. Baltin & C. Collins (Eds.), Handbook of Contemporary Syntactic Theory (pp. 11–44). Blackwell Publishers Ltd.
Caplan, D., & Waters, G. (2013). Memory mechanisms supporting syntactic comprehension. Psychonomic Bulletin & Review, 20(2), 243–268.
Caplan, D., Alpert, N., & Waters, G. (1999). PET studies of syntactic processing with auditory sentence presentation. NeuroImage, 9(3), 343–351.
Caramazza, A., & Zurif, E.B. (1976). Dissociation of algorithmic and heuristic processes in language comprehension: Evidence from aphasia. Brain and Language, 3(4), 572–582.
Carreiras, M., & Clifton Jr, C. (1993). Relative clause interpretation preferences in Spanish and English. Language and Speech, 36(4), 353–372.
Carreiras, M., & Meseguer, E. (1999). Procesamiento de oraciones ambiguas. In M. De Vega & F. Cuetos (Eds.), Psicolingüística del español (pp. 163–203). Trotta.
Carreiras, M., Duñabeitia, J.A., Vergara, M., De La Cruz-Pavía, I., & Laka, I. (2010). Subject relative clauses are not universally easier to process: Evidence from Basque. Cognition, 115(1), 79–92.
Chapman, L.R., & Hallowell, B. (2015). A novel pupillometric method for indexing word difficulty in individuals with and without aphasia. Journal of Speech, Language, and Hearing Research, 58(5), 1508–1520.
Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.
Chomsky, N. (2001). Derivation by phase. In M. Kenstowicz (Ed.), Ken Hale: A Life in Language (pp. 1–52). MIT Press.
Clifton, C., & Frazier, L. (1989). Comprehending sentences with long-distance dependencies. In G. Carlson & M. Tanenhaus (Eds.), Linguistic Structure in Language Processing (pp. 273–317). Kluwer Academic Publishers.
Cuetos, F., & Mitchell, D.C. (1988). Cross-linguistic differences in parsing: Restrictions on the use of the Late Closure strategy in Spanish. Cognition, 30(1), 73–105.
Culicover, P.W., & Jackendoff, R. (2005). Simpler Syntax. Oxford University Press.
Cunnings, I., & Fujita, H. (2021). Similarity-based interference and relative clauses in second language processing. Second Language Research, 37, 178–190.


Cunnings, I., & Fujita, H. (2023). Similarity-based interference and relative clauses in second language processing. Second Language Research, 39(2), 539–563.
Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11(1), e0146194.
Dillon, B., Mishler, A., Sloggett, S., & Phillips, C. (2013). Contrasting intrusion profiles for agreement and anaphora: Experimental and modeling evidence. Journal of Memory and Language, 69, 85–103.
Dronkers, N.F., Wilkins, D.P., Van Valin Jr, R.D., Redfern, B.B., & Jaeger, J.J. (2004). Lesion analysis of the brain areas involved in language comprehension. Cognition, 92(1–2), 145–177.
Egurtzegi, A., Blasi, D.E., Bornkessel-Schlesewsky, I., Laka, I., Meyer, M., Bickel, B., & Sauppe, S. (2022). Cross-linguistic differences in case marking shape neural power dynamics and gaze behavior during sentence planning. Brain and Language, 230, 105127.
Embick, D., Marantz, A., Miyashita, Y., O’Neil, W., & Sakai, K.L. (2000). A syntactic specialization for Broca’s area. Proceedings of the National Academy of Sciences, 97(11), 6150–6154.
Engelhardt, P.E., Ferreira, F., & Patsenko, E.G. (2010). Pupillometry reveals processing load during spoken language comprehension. Quarterly Journal of Experimental Psychology, 63(4), 639–645.
Fedorenko, E., Gibson, E., & Rohde, D. (2007). The nature of working memory in linguistic, arithmetic and spatial integration processes. Journal of Memory and Language, 56(2), 246–269.
Fodor, J.A., Bever, T.G., & Garrett, M.F. (1974). The Psychology of Language: An Introduction to Psycholinguistics and Generative Grammar. New York: McGraw-Hill.
Frazier, L. (1978). On comprehending sentences: Syntactic parsing strategies. Unpublished doctoral dissertation. Storrs.
Frazier, L., & Fodor, J.D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6(4), 291–325.
Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14(2), 178–210.
Fridriksson, J., den Ouden, D.B., Hillis, A.E., Hickok, G., Rorden, C., Basilakos, A., Yourganov, G., & Bonilha, L. (2018). Anatomy of aphasia revisited. Brain, 141(3), 848–862.
Friederici, A.D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6, 78–84.
Friederici, A.D., Meyer, M., & von Cramon, D.Y. (2000). Auditory language comprehension: An event-related fMRI study on the processing of syntactic and lexical information. Brain and Language, 74(2), 289–300.
Friedmann, N. (2008). Traceless relatives: Agrammatic comprehension of relative clauses with resumptive pronouns. Journal of Neurolinguistics, 21(2), 138–149.
Friedmann, N., Belletti, A., & Rizzi, L. (2009). Relativized relatives: Types of intervention in the acquisition of A-bar dependencies. Lingua, 119(1), 67–88.
Friedmann, N., & Grodzinsky, Y. (1997). Tense and agreement in agrammatic production: Pruning the syntactic tree. Brain and Language, 56(3), 397–425.
Friedmann, N., Rizzi, L., & Belletti, A. (2017). No case for Case in locality: Case does not help interpretation when intervention blocks A-bar chains. Glossa: A Journal of General Linguistics, 2(1), 33.
Garraffa, M., & Grillo, N. (2008). Canonicity effects as grammatical phenomena. Journal of Neurolinguistics, 21(2), 177–197.
Gattei, C.A., Sevilla, Y., Tabullo, A., Wainselboim, A., París, L., & Shalom, D. (2017). Prominence in Spanish sentence comprehension: An eye-tracking study. Language, Cognition and Neuroscience, 33, 587–607.
Gennari, S.P., & MacDonald, M.C. (2008). Semantic indeterminacy in object relative clauses. Journal of Memory and Language, 58(2), 161–187.
Gibson, E. (2000). The dependency locality theory: A distance-based theory of linguistic complexity. Image, Language, Brain, 95–126.
Gibson, E., & Fedorenko, E. (2013). The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes, 28(1–2), 88–124.
Goodglass, H., Gleason, J.B., Bernholtz, N.A., & Hyde, M.R. (1972). Some linguistic structures in the speech of a Broca’s aphasic. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, 8(2), 191–212.
Gordon, P.C., Hendrick, R., & Johnson, M. (2001). Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6), 1411–1423.
Gordon, P.C., Hendrick, R., & Johnson, M. (2004). Effects of noun phrase type on sentence complexity. Journal of Memory and Language, 51(1), 97–114.


Griffin, Z.M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science, 11(4), 274–279.
Grodzinsky, Y. (1995). Trace deletion, theta-roles, and cognitive strategies. Brain and Language, 51, 467–497.
Grodzinsky, Y., & Amunts, K. (Eds.). (2006). Broca’s Region. Oxford University Press.
Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive Sciences, 9(9), 416–423.
Hahne, A., & Friederici, A.D. (1999). Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuroscience, 11(2), 194–205.
Hanne, S., Sekerina, I.A., Vasishth, S., Burchert, F., & De Bleser, R. (2011). Chance in agrammatic sentence comprehension: What does it really mean? Evidence from eye movements of German agrammatic aphasic patients. Aphasiology, 25(2), 221–244.
Hofmeister, P., & Sag, I. (2010). Cognitive constraints and island effects. Language, 86(2), 366–415.
Humphries, C., Binder, J.R., Medler, D.A., & Liebenthal, E. (2006). Syntactic and semantic modulation of neural activity during auditory sentence comprehension. Journal of Cognitive Neuroscience, 18(4), 665–679.
Just, M.A., & Carpenter, P.A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1), 122.
Just, M.A., & Carpenter, P.A. (1993). The intensity dimension of thought: Pupillometric indices of sentence processing. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 47(2), 310.
Kaan, E. (2002). Investigating the effects of distance and number interference in processing subject-verb dependencies: An ERP study. Journal of Psycholinguistic Research, 31(2), 165–193.
Kaan, E., & Swaab, T.Y. (2002). The brain circuitry of syntactic comprehension. Trends in Cognitive Sciences, 6(8), 350–356.
Kempen, G. (2014). Prolegomena to a neurocomputational architecture for human grammatical encoding and decoding. Neuroinformatics, 12, 111–142.
Kluender, R., & Kutas, M. (1993). Subjacency as a processing phenomenon. Language and Cognitive Processes, 8(4), 573–633.
Kolk, H. (1995). A time-based approach to agrammatic production. Brain and Language, 50(3), 282–303.
Konieczny, L. (2000). Locality and parsing complexity. Journal of Psycholinguistic Research, 29, 627–645.
Kuperberg, G.R. (2007). Neural mechanisms of language comprehension: Challenges to syntax. Brain Research, 1146, 23–49.
Lago, S., Shalom, D.E., Sigman, M., Lau, E.F., & Phillips, C. (2015). Agreement attraction in Spanish comprehension. Journal of Memory and Language, 82, 133–149.
Levelt, W.J.M. (2008). Formal Grammars in Linguistics and Psycholinguistics (Volume 3: Psycholinguistic Applications). John Benjamins Publishing.
Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106(3), 1126–1177.
Lewis, R.L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375–419.
Lewis, S., & Phillips, C. (2015). Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research, 44(1), 27–46.
Liu, X., & Wang, W. (2019). The effect of distance on sentence processing by older adults. Frontiers in Psychology, 10, 2455.
MacDonald, M.C., Pearlmutter, N.J., & Seidenberg, M.S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101(4), 676–703.
Maldonado, M., Sevilla, Y., & Shalóm, D.E. (2013). La incrementalidad jerárquica en la codificación gramatical. Un estudio de movimientos oculares. In A. García, V. Orellano de Marra, V. Jaichenco, & A. Wainselboim (Eds.), Lenguaje, cognición y cerebro. Sociedad Argentina de Lingüística (pp. 169–188). Editorial FFyL-UNCuyo y SAL.
Matchin, W., & Hickok, G. (2020). The cortical organization of syntax. Cerebral Cortex, 30(3), 1481–1498.
Matchin, W., & Rogalsky, C. (2017). Aphasia & syntax. In J. Sprouse (Ed.), Oxford Handbook of Experimental Syntax. Oxford University Press.
McKinnon, R., & Osterhout, L. (1996). Constraints on movement phenomena in sentence processing: Evidence from event-related brain potentials. Language and Cognitive Processes, 11(5), 495–524.
McMahon, L.E. (1965). Grammatical Analysis as Part of Understanding a Sentence. PhD dissertation, Harvard University.


Meyer, A.S. (2004). The use of eye tracking in studies of sentence generation. In J.M. Henderson & F. Ferreira (Eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual World (pp. 191–212). Psychology Press.
Miller, G.A., & Chomsky, N. (1963). Finitary models of language users. In R.D. Luce, R.R. Bush & E. Galanter (Eds.), Handbook of Mathematical Psychology (pp. 419–491). New York.
Momma, S., & Phillips, C. (2018). The relationship between parsing and generation. Annual Review of Linguistics, 4, 233–256.
Myers, J. (2017). Acceptability judgments. In M. Aronoff (Ed.), Oxford Research Encyclopedia of Linguistics. Oxford University Press.
Norcliffe, E., & Konopka, A.E. (2015). Vision and language in cross-linguistic research on sentence production. In Attention and Vision in Language Processing (pp. 77–96). Springer.
Norcliffe, E., Konopka, A.E., Brown, P., & Levinson, S.C. (2015). Word order affects the time course of sentence formulation in Tzeltal. Language, Cognition and Neuroscience, 30(9), 1187–1208.
Osterhout, L., & Holcomb, P.J. (1992). Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31(6), 785–806.
Pallier, C., Devauchelle, A.D., & Dehaene, S. (2011). Cortical representation of the constituent structure of sentences. Proceedings of the National Academy of Sciences, 108(6), 2522–2527.
Parker, D., & Phillips, C. (2017). Reflexive attraction in comprehension is selective. Journal of Memory and Language, 94, 272–290.
Paterson, K.B., Liversedge, S.P., Filik, R., Juhasz, B.J., White, S.J., & Rayner, K. (2007). Focus identification during sentence comprehension: Evidence from eye movements. Quarterly Journal of Experimental Psychology, 60(10), 1423–1445.
Pettigrew, C., & Hillis, A.E. (2014). Role for memory capacity in sentence comprehension: Evidence from acute stroke. Aphasiology, 28(10), 1258–1280.
Phillips, C. (2006). The real-time status of island phenomena. Language, 82(4), 795–823.
Phillips, C., Kazanina, N., & Abada, S.H. (2005). ERP effects of the processing of syntactic long-distance dependencies. Cognitive Brain Research, 22(3), 407–428.
Pickering, M., & Garrod, S. (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences, 36(4), 329–347.
Reali, F., & Christiansen, M.H. (2007). Processing of relative clauses is made easier by frequency of occurrence. Journal of Memory and Language, 57(1), 1–23.
Rizzi, L. (1990). Relativized Minimality. MIT Press.
Rogalsky, C., & Hickok, G. (2011). The role of Broca’s area in sentence comprehension. Journal of Cognitive Neuroscience, 23(7), 1664–1680.
Rogalsky, C., Matchin, W., & Hickok, G. (2008). Broca’s area, sentence comprehension, and working memory: An fMRI study. Frontiers in Human Neuroscience, 2(14), 1–13.
Roland, D., Dick, F., & Elman, J.L. (2007). Frequency of basic English grammatical structures: A corpus analysis. Journal of Memory and Language, 57, 348–379.
Ross, J.R. (1967). Constraints on variables in syntax. PhD thesis, Massachusetts Institute of Technology, Cambridge.
Santi, A., & Grodzinsky, Y. (2010). fMRI adaptation dissociates syntactic complexity dimensions. NeuroImage, 51(4), 1285–1293.
Sanz, M., Laka, I., & Tanenhaus, M. (2013). Sentence comprehension before and after 1970: Topics, debates and techniques. In M. Sanz, I. Laka & M. Tanenhaus (Eds.), Language Down the Garden Path: The Cognitive and Biological Bases for Linguistic Structure (pp. 81–110). Oxford University Press.
Sauppe, S., Norcliffe, E., Konopka, A.E., Van Valin, R.D., & Levinson, S.C. (2013). Dependencies first: Eye tracking evidence from sentence production in Tagalog. In Proceedings of the Annual Meeting of the Cognitive Science Society (pp. 1265–1270). Berlin.
Savin, H., & Perchonock, E. (1965). Grammatical structure and the immediate recall of English sentences. Journal of Verbal Learning and Verbal Behavior, 4, 348–353.
Scheepers, C., & Crocker, M.W. (2004). Constituent order priming from reading to listening: A visual-world study. In M. Carreiras & C. Clifton (Eds.), The On-Line Study of Sentence Comprehension: Eyetracking, ERP, and Beyond (pp. 167–185). Psychology Press.
Schluroff, M. (1982). Pupil responses to grammatical complexity of sentences. Brain and Language, 17(1), 133–145.


Schluroff, M., Zimmermann, T.E., Freeman Jr, R.B., Hofmeister, K., Lorscheid, T., & Weber, A. (1986). Pupillary responses to syntactic ambiguity of sentences. Brain and Language, 27(2), 322–344.
Schmidtke, J. (2018). Pupillometry in linguistic research: An introduction and review for second language researchers. Studies in Second Language Acquisition, 40(3), 529–549.
Schütze, C.T., & Sprouse, J. (2013). Judgment data. In R. Podesva & D. Sharma (Eds.), Research Methods in Linguistics (pp. 27–50). Cambridge University Press.
Sevilla, Y., Maldonado, M., & Shalóm, D.E. (2014). Pupillary dynamics reveal computational cost in sentence planning. Quarterly Journal of Experimental Psychology, 67(6), 1041–1052.
Slobin, D.I. (1966). Grammatical transformations and sentence comprehension in childhood and adulthood. Journal of Verbal Learning & Verbal Behavior, 5(3), 219–227.
Snijders, T.M., Vosse, T., Kempen, G., Van Berkum, J.J., Petersson, K.M., & Hagoort, P. (2009). Retrieval and unification of syntactic structure in sentence comprehension: An fMRI study using word-category ambiguity. Cerebral Cortex, 19(7), 1493–1503.
Sprouse, J., & Lau, E.F. (2013). Syntax and the brain. In The Cambridge Handbook of Generative Syntax (pp. 971–1005).
Sprouse, J., & Hornstein, N. (Eds.). (2013). Experimental Syntax and Island Effects. Cambridge University Press.
Sprouse, J., Schütze, C., & Almeida, D. (2013). A comparison of informal and formal acceptability judgements using a random sample from Linguistic Inquiry 2001–2010. Lingua, 134, 219–248.
Sprouse, J., & Villata, S. (2021). Island effects. In G. Goodall (Ed.), The Cambridge Handbook of Experimental Syntax (pp. 227–257). Cambridge: Cambridge University Press.
Staub, A. (2010). Eye movements and processing difficulty in object relative clauses. Cognition, 116(1), 71–86.
Staub, A., Clifton Jr, C., & Frazier, L. (2006). Heavy NP shift is the parser’s last resort: Evidence from eye movements. Journal of Memory and Language, 54(3), 389–406.
Staub, A., White, S.J., Drieghe, D., Hollway, E.C., & Rayner, K. (2010). Distributional effects of word frequency on eye fixation durations. Journal of Experimental Psychology: Human Perception and Performance, 36(5), 1280–1293.
Stowe, L.A., Broere, C.A., Paans, A.M., Wijers, A.A., Mulder, G., Vaalburg, W., & Zwarts, F. (1998). Localizing components of a complex task: Sentence processing and working memory. NeuroReport, 9(13), 2995–2999.
Sturt, P. (2003). The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language, 48(3), 542–562.
Sussman, R.S., & Sedivy, J. (2003). The time-course of processing syntactic dependencies: Evidence from eye movements. Language and Cognitive Processes, 18, 143–163.
Tamaoka, K., Asano, M., Miyaoka, Y., & Yokosawa, K. (2014). Pre- and post-head processing for single- and double-scrambled sentences of a head-final language as measured by the eye tracking method. Journal of Psycholinguistic Research, 43(2), 167–185.
Thompson, C.K. (2003). Unaccusative verb production in agrammatic aphasia: The argument structure complexity hypothesis. Journal of Neurolinguistics, 16(2–3), 151–167.
Traxler, M.J., & Pickering, M.J. (1996). Case-marking in the parsing of complement sentences: Evidence from eye movements. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 49A(4), 991–1004.
Trueswell, J., & Tanenhaus, M. (1994). Toward a lexicalist framework for constraint-based syntactic ambiguity resolution. In C. Clifton, L. Frazier & K. Rayner (Eds.), Perspectives on Sentence Processing (pp. 155–179). Erlbaum.
Tucker, M., Idrissi, A., & Almeida, D. (2015). Attraction errors for gender in Modern Standard Arabic reading. Poster presented at the Architectures and Mechanisms for Language Processing (AMLaP) 2015 Conference. Malta: University of Valletta.
Tyler, L.K., & Marslen-Wilson, W.D. (1977). The on-line effects of semantic context on syntactic processing. Journal of Verbal Learning & Verbal Behavior, 16(6), 683–692.
Van Petten, C., & Luka, B.J. (2012). Prediction during language comprehension: Benefits, costs, and ERP components. International Journal of Psychophysiology, 83(2), 176–190.
Vasishth, S., & Lewis, R.L. (2006). Argument-head distance and processing complexity: Explaining both locality and antilocality effects. Language, 82(4), 767–794.
Vigliocco, G., & Hartsuiker, R.J. (2002). The interplay of meaning, sound, and syntax in sentence production. Psychological Bulletin, 128(3), 442.


Wagers, M., & Phillips, C. (2009). Multiple dependencies and the role of the grammar in real-time comprehension. Journal of Linguistics, 45(2), 395–433.
Wagers, M.W., Lau, E.F., & Phillips, C. (2009). Agreement attraction in comprehension: Representations and processes. Journal of Memory and Language, 61(2), 206–237.
Watt, W.C. (1970). On two hypotheses concerning psycholinguistics. In J.R. Hayes (Ed.), Cognition and the Development of Language (pp. 137–220). Wiley.
Wells, J.B., Christiansen, M.H., Race, D.S., Acheson, D.J., & MacDonald, M.C. (2009). Experience and sentence processing: Statistical learning and relative clause comprehension. Cognitive Psychology, 58(2), 250–271.
Zurif, E., Swinney, D., Prather, P., Solomon, J., & Bushell, C. (1993). An on-line analysis of syntactic processing in Broca’s and Wernicke’s aphasia. Brain and Language, 45(3), 448–464.


5 EXPERIMENTAL SEMANTICS
Stephanie Solt

5.1

Introduction and definitions

Research into meaning in language has become an increasingly experimental affair, as a glance through the recent programmes of leading international conferences in semantics and pragmatics, or through the tables of contents of the major journals in these areas, will confirm. Experimental pragmatics in particular has become an established sub-discipline in the study of meaning, with its own research networks and venues, an active community of scholars, and a shared set of core topics and methodologies. Experimental semantics, by contrast, was slower to establish itself and has a less well-defined profile. Yet experimental approaches have come to play a central role in the development and testing of semantic theories. This chapter surveys current trends in experimental research in semantics. Through a series of case studies, we explore the types of questions that have been investigated experimentally and the variety of methodologies used to address them, and highlight some challenges that arise in conducting experimental research in semantics. More broadly, we will ask where experimental techniques have the most to offer, and where they perhaps cloud rather than clarify the empirical picture.

Before proceeding, some remarks about the scope of this chapter are in order. Most centrally, what precisely falls under the label of experimental semantics? That is, what counts as “experimental” semantics, as opposed to any other sort of research in semantics? And what do we mean by experimental “semantics”, as opposed to other varieties of experimental linguistics? Regarding the first question, it has been suggested that the distinction often drawn between “experimental” and “traditional” or “introspection-based” methods in semantics is not a meaningful one, in that both follow the scientific method of generating hypotheses and systematically gathering data to test them (Jacobson, 2018). This is a valid point, to which we will return below.
For purposes of the present chapter, however, I will continue to follow the common practice of using “experimental” to refer to studies involving large samples of non-linguistically trained participants, the results of which are reported numerically with the aid of inferential statistics. Regarding the second question, I will focus on semantics as the study of the meaning encoded in linguistic expressions and the principles by which such meanings combine – that is, that aspect of speakers’ grammatical knowledge that relates to meaning. I will exclude work investigating the


DOI: 10.4324/9781003392972-7

Stephanie Solt

use of language in context, the domain of pragmatics – though we will see that one fruitful use of experimental approaches has been in establishing the boundary between the two. The scope of this chapter is by necessity limited in other ways. I focus on research carried out within the framework of truth-conditional semantics, largely ignoring work in other traditions (see Section 5.8 for a complementary reference), and on better-studied languages such as English. And for reasons of space, I can give only limited attention to research into the acquisition of semantics (a topic that merits its own chapter) or the processing of semantic content, except where such findings have been applied to questions in theoretical semantics. Even with these limitations, there is far more interesting work in experimental semantics than could possibly be covered here, so the discussion is necessarily selective.

5.2

Historical perspectives

There is a long history of experimental research on meaning in the fields of psychology and psycholinguistics (e.g. Hörmann, 2012 on quantification; Rips & Turnbull, 1980 on adjective meaning), though for many years there was limited interaction between these research traditions and work in theoretical linguistics. This changed around the turn of the 21st century, when a loose group of scholars began to apply psycholinguistic methods to questions in theoretical pragmatics, focusing on topics including reference, metaphor and especially scalar implicature. In a seminal volume by Ira Noveck and Dan Sperber (2004), this emerging approach was christened “Experimental Pragmatics”. In the following years, work in experimental pragmatics was promoted via coordinated research networks and funding schemes (especially the EURO-XPRAG network and subsequent XPRAG.de Priority Program) and a series of targeted conferences, leading to the development of novel experimental methodologies, theoretical advances, and active and productive dialogue between the psycholinguistic and theoretical perspectives. Experimental pragmatics is now widely recognized as a sub-discipline of linguistics in its own right. The emergence of an experimental tradition in theoretical semantics has followed a slower and less easily traced path. Indeed, while the use of experimentation to address theoretical questions in semantics is now quite common, it is not at all clear that there is such a subfield as “Experimental Semantics”. If anything, the most recent trend has been to fold more semantically oriented work into the same venues that highlight research in experimental pragmatics, examples being the recent Oxford Handbook of Experimental Semantics and Pragmatics (Cummins & Katsos, 2019) and the new conference Experiments in Linguistic Meaning.
One reason for this divergence is perhaps that there has been no unifying core topic in semantics that has played the catalysing role that scalar implicature did for pragmatics, namely to bring together researchers working in different frameworks with an interest in experimentally testing the predictions of competing theoretical accounts. Also relevant are the challenges inherent to conducting experimental research into linguistic representation as opposed to language use, an issue that will be discussed further below. Some of the earliest examples of experimentally sourced data playing a role in semantic theory building come from outside formal linguistics. A nice example involves the phenomenon known as complement anaphora. It had been an accepted view among semanticists that pronouns cannot readily refer anaphorically to an antecedent that is not explicitly represented. For example, in (1), it cannot refer to the ball not in the bag, although the existence of the latter can be easily inferred. But in a series of papers originating out of a program of research into the psychology of quantifier interpretation, psychologists Linda Moxey, Anthony Sanford and colleagues demonstrated via a variety of psycholinguistic techniques that in the case of negative quantifiers such as not all, few

Experimental semantics

and hardly any, but not positive quantifiers such as most and a few, anaphoric reference to the complement set (the members of the restrictor set that do not satisfy the predicate) is robustly available. For example, in (2), they can be understood to refer to the fans who didn’t go to the game (Moxey & Sanford, 1993; Sanford et al., 1994).

(1) Nine of the ten balls are in the bag. # It’s under the couch.

(2) Few fans went to the game. They watched it on TV instead.

These findings were taken up by theoretical semanticists, and a productive dialogue ensued, in which the conditions allowing complement anaphora were more precisely characterised via a combination of experimental and introspection-based approaches, and formal theories were developed and tested against these carefully elicited data (e.g., Nouwen, 2003). More recently, it has been theoretical semanticists themselves who have been turning to experimental approaches to source the linguistic data needed to develop and test theoretical proposals; indeed, it is today hardly appropriate to distinguish between “theoreticians” and “experimentalists”, given that many leading scholars are experts in both areas.

5.3

Critical issues and topics

In this section, I briefly introduce two fundamental issues faced by the researcher considering taking an experimental approach to theoretical questions in semantics. These will serve as a framework for the discussion of specific domains of research in Section 5.4. The first and most basic question is how, or even whether, one can design an experiment that probes semantic meaning. In the framework that I assume in the present chapter – and which underlies most of the work discussed in subsequent sections – a sentence of natural language has a hardcoded logical meaning that is derived in the grammar on the basis of the lexical meaning of the words and morphemes it contains and the compositional rules by which such meanings combine. When that sentence is uttered in context, however, this core semantic meaning is enriched via a variety of pragmatic processes: ambiguities are resolved, referents specified, and implicatures added via reasoning about the speaker’s choice to use this particular sentence in this particular context (Grice, 1975). When we as language users interpret a sentence encountered ‘in the wild’, we necessarily factor in both its semantics and the pragmatics of the context. The same is true of participants who encounter a sentence (or other unit of language) in the context of an experimental task. A participant might rate a sentence as acceptable (or unacceptable) or judge it to be an appropriate (or inappropriate) description of a given situation, but this cannot be interpreted as a direct window into its grammaticality or truth conditions. Of course, semanticists sourcing data via introspection or small-scale elicitation face the very same issue, but the challenge is magnified in the case of an experimental participant who has not been trained on the distinction between entailment and implicature.
Considering this basic fact about linguistic interpretation, it is the task of the researcher to design experiments that nonetheless can yield insight into the underlying semantic representation. Doing so requires establishing some linking hypothesis that connects participants’ behaviour on an experimental task to the semantic construct of interest. In the next section, we will see various ways this has been done. A second issue relates to the role of formal experiments vis-à-vis other types of empirical research in semantics. On this point, it must be noted that the recent trend towards experimentation has not met with universal approval. Jacobson (2018), in particular, expresses a more sceptical view. The distinction between “experimental” and “armchair” or “introspection-based” methods is a false dichotomy, she argues: research based on data systematically elicited from a small number of informants (or even the semanticist herself as informant) is no less experimental than that based on multi-subject studies whose results are reported numerically, in that both follow the scientific method of generating hypotheses and gathering data to test them, varying some parameter(s) of interest while controlling for possible confounds. Indeed, “experimental” semantics can be viewed as the extension of the basic practice of eliciting linguistic judgements that semanticists have been following all along. And some of the same factors – such as the need to carefully control contexts – are crucial to well-designed research of both types. Jacobson expresses concern about the (implicit or explicit) assumption that “experimental” research – with its numbers and tests of statistical significance – is somehow more scientific and valuable than that based on “traditional” methods in semantics; she cautions that pressure to experimentally “validate” even undisputed judgements will slow progress in the field. One might attempt to reply to these concerns in various ways. Most scholars working in this area would, I think, take the position that formal experiments are a complement to – rather than a replacement for or improvement on – other empirical approaches in semantics. And the topics where experiments have most commonly been employed are precisely those where theoreticians themselves have found their intuitions to be unclear, or where judgements are disputed. Jacobson herself stresses that experimental approaches can be valuable, and perhaps even necessary to address some sorts of questions. But this raises the obvious question of where experimental techniques have the most to offer to theoretical semantics. We will try to get insight into this question in what follows.

5.4

Current contributions and research

Experimental methods have been applied in a wide range of empirical domains in semantics. In this section, I focus on several areas in which there is a critical mass of work and thus where distinct approaches can be highlighted and contrasted. A particular use of experimental research has been to establish whether some aspect of the meaning communicated by an expression is part of its semantics, or whether instead it is derived via pragmatic enrichment. As noted in the previous section, no experimental methodology can directly access semantic meaning independently of pragmatics; thus, this question must be addressed indirectly. A typical approach has been to compare the behavioural or processing profile of the item in question to that of an accepted case of pragmatic enrichment, such as the scalar implicature, triggered by an utterance of some, that not all obtains.

5.4.1

Cardinal numerals

A case in point of research at the semantics/pragmatics interface involves cardinal numerals. By way of example, a sentence such as (3) allows two readings: most prominently, it can be interpreted to mean that Sue has exactly four mugs in her office (the ‘exact’ reading), but in an appropriate context – say, when the question at hand is who can lend me four mugs – it can also be interpreted to mean she has four or more mugs (the ‘at least’ reading).

(3) Sue has four mugs in her office.

There are two prominent views on the relationship between the two readings: either the pattern exemplified here reflects a semantic ambiguity, or the ‘at least’ reading is taken to be the basic one, with the ‘exact’ reading derived from it via scalar implicature, in (3) the implicature that Sue doesn’t have five mugs (see Spector, 2013, for more in-depth discussion of the different theoretical positions). Introspection-based data can be put forward in favour of both positions; this has prompted interest in using findings from behavioural, processing, and acquisition studies to attempt to resolve the matter. The early work of this sort produced somewhat mixed results. Some of it seemed to support the view that the ‘exact’ reading is indeed semantic in nature rather than generated via implicature: Papafragou and Musolino (2003) demonstrate that children who do not consistently derive scalar implicatures for weak scalar items such as some nonetheless prefer the exact interpretation of number words; Huang and Snedeker (2009) provide evidence from a visual world eye-tracking paradigm that the lower-bounded semantic meaning of some (‘some and possibly all’) is accessed before the double-bounded enriched meaning (‘some but not all’), but that this is not the case for number words; Huang et al. (2013) show that in implicature-cancelling contexts, adults and children tend to prefer the lower-bounded interpretation of some but the exact reading of number words; and Marty et al. (2013) show that when performing under processing load, participants are less likely to access the double-bounded enriched meaning for some, but more likely to access the double-bounded ‘exact’ interpretation of number words. Other studies, however, are more consistent with the implicature view; in particular, Panizza et al. (2009) find a lower rate of ‘exact’ readings in downward- versus upward-entailing contexts, a similar pattern to what is observed for expressions that give rise to scalar implicatures. More recent work tends to support a role for scalar implicatures in deriving the ‘exact’ reading of number words, while also demonstrating that number words behave differently from other scalar terms.
In particular, Bott and Chemla (2016) and Meyer and Feiman (2021) show via a covered box selection task that the enriched meaning of some primes the ‘exact’ reading of number words, and vice versa; that is, when participants access the ‘some but not all’ reading of some, they are subsequently more likely than they otherwise would be to select the ‘exact’ reading of a number word, and likewise in the opposite direction. This can be taken as evidence that a common mechanism is involved in the two cases. But consistent with earlier work, the latter study also finds that the ‘exact’ meaning is more strongly preferred for number words than the double-bounded reading is for some. Thus, looking across studies, a consensus picture seems to be emerging, according to which the ‘exact’ reading of number words involves a scalar implicature, but one that is more automatic and readily available than in the case of other scalar items, perhaps because the alternatives themselves are more accessible (see Meyer and Feiman, 2021, for discussion).
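The implicature account of number words can be made concrete with a small sketch. Assuming a toy model in which a state is just a count (say, of Sue’s mugs), the ‘at least’ meaning of four is the set of states with four or more, and the ‘exact’ reading results from negating the stronger alternative five; the same function applied to some with the alternative all yields ‘some but not all’. The function names and the hand-supplied alternatives are illustrative assumptions, not drawn from any of the studies cited.

```python
# Toy model: 'exact' reading = 'at least' semantics + negated stronger alternative.
# A state is simply a count; all names here are hypothetical and illustrative.
from typing import Callable, Iterable

Prop = Callable[[int], bool]


def at_least(n: int) -> Prop:
    """Lower-bounded ('at least n') meaning of a numeral."""
    return lambda k: k >= n


def some() -> Prop:
    """Lower-bounded meaning of 'some' ('some and possibly all')."""
    return lambda k: k >= 1


def all_of(total: int) -> Prop:
    """Meaning of 'all' relative to a fixed total."""
    return lambda k: k == total


def exhaustify(weak: Prop, stronger_alternatives: Iterable[Prop]) -> Prop:
    """Strengthen `weak` by negating each hand-supplied stronger alternative."""
    alts = list(stronger_alternatives)
    return lambda k: weak(k) and not any(alt(k) for alt in alts)


states = range(8)  # hypothetical: Sue has between 0 and 7 mugs

four = at_least(4)
exact_four = exhaustify(four, [at_least(5)])  # implicature: not (at least) five
print([k for k in states if four(k)])        # [4, 5, 6, 7]
print([k for k in states if exact_four(k)])  # [4]

# The same mechanism for 'some' with the alternative 'all', over 5 items
some_not_all = exhaustify(some(), [all_of(5)])
print([k for k in range(6) if some_not_all(k)])  # [1, 2, 3, 4]
```

The sketch encodes only the parallel that the implicature view posits; it says nothing about the processing differences between numerals and some discussed above.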

5.4.2

Other topics at the semantics/pragmatics interface

A number of other topics at the semantics/pragmatics interface have been subject to a similar approach, examples including the multiplicity inference of plurals (e.g. Tieu et al., 2014) and the exhaustivity inference of cleft constructions (e.g. De Veaugh-Geiss et al., 2018). A nice recent example of such work is Marty and Romoli (2021), which deals with free choice inferences as in (4) and negative free choice inferences as in (5):

(4) It is permitted that Mia buys apples or bananas.
    ⇝ It is permitted that Mia buys apples and it is permitted that Mia buys bananas.

(5) It is not required that Mia buys apples and bananas.
    ⇝ It is not required that Mia buys apples and it is not required that Mia buys bananas.

The former inference type is well documented, but theories differ as to whether it is an implicature or an entailment. The latter type has been less studied, and there is debate in the literature as to whether it in fact exists. Marty and colleagues report on three experiments employing a sentence-picture acceptability task, in which participants see a visual representation of which options are allowed and required, and judge sentences such as (4) and (5) as “good” or “not good” descriptions of it. What stands out about this work is the inclusion of multiple baseline and control conditions – including unambiguously true and false sentences, as well as controls using the same sentences and others using the same visuals – and the use of follow-up experiments to rule out possible confounds. This allows the authors to conclude more definitively that any effects found in the target conditions are due to the presence or absence of the inferences in question, rather than to some other aspect of the experimental setup or materials. Results show that both free choice and negative free choice inferences are available, but while the former are generated at rates comparable to entailed content, the latter arise less frequently, and are more dependent on the specific design of the experimental items. Drawing on previous findings that implicatures are less robustly available than entailments, the authors argue that this evidence favours a hybrid account on which free choice inferences are entailments, whereas negative free choice inferences are implicatures. Interestingly, these results from Marty et al. are consistent with a further result from the experiments of Meyer and Feiman (2021), who do not find priming between free choice inferences of disjunction and the upper-bounded readings of numerals and some, suggesting that different mechanisms are involved in the two cases. Thus here, experimental work in very different paradigms converges on the conclusion that free choice inferences cannot be treated as straightforward cases of (scalar) implicature.

5.4.3

Quantificational expressions

Turning to topics within semantics proper, quantificational expressions were an early area of interest in experimental semantics. A particular question where such methods have been productively applied is which among multiple distinct but truth-conditionally equivalent representations should be assigned to a given quantifier. Here, intuitions as to the situations in which the quantifier can be truthfully applied – whether the semanticist’s own intuitions or judgements elicited from experimental participants – cannot resolve the matter, leading researchers to turn to processing experiments. The assumption underlying such approaches is that the logical representation of a quantifier biases some preferred verification strategy, with the result that verification will be facilitated when that procedure can be followed. As an example, one might think that each of the sentences in (6) could have either of the two logically equivalent representations in (7):

(6) a. Most of the dots are blue.
    b. More than half of the dots are blue.

(7) a. $|\text{blue dots}| > |\text{non-blue dots}|$
    b. $|\text{blue dots}| > \tfrac{1}{2}\,|\text{dots}|$
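That (7a) and (7b) are truth-conditionally equivalent can be checked by brute force over two-colour arrays. The function names are hypothetical; exact rational arithmetic (Python’s Fraction) is used so that the comparison with ½ involves no rounding.

```python
# Brute-force check that (7a) and (7b) agree on every two-colour array
# with up to 49 dots of each kind (function names are illustrative).
from fractions import Fraction
from itertools import product


def most_7a(blue: int, non_blue: int) -> bool:
    """(7a): |blue dots| > |non-blue dots|"""
    return blue > non_blue


def more_than_half_7b(blue: int, non_blue: int) -> bool:
    """(7b): |blue dots| > 1/2 |dots|, using exact arithmetic."""
    return blue > Fraction(1, 2) * (blue + non_blue)


mismatches = [
    (b, nb)
    for b, nb in product(range(50), repeat=2)
    if most_7a(b, nb) != more_than_half_7b(b, nb)
]
print(mismatches)  # []
```

Because the two representations never disagree on truth value, only indirect evidence such as verification behaviour can distinguish them, which is the rationale behind the processing studies discussed next.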

Hackl (2009), however, argues on theoretical grounds that most and more than half have distinct logical forms that reflect their morphological composition: most expresses set cardinality comparison, deriving from an underlying superlative semantics, while more than half makes explicit reference to ‘half’. Hackl reports evidence from a self-paced verification task designed to favour a vote-counting strategy arguably compatible with the semantics of most but not more than half; reaction-time advantages are indeed found for most over more than half, supporting the proposal.


In related work, Pietroski et al. (2009) investigate the verification of most in a timed verification task, where participants see arrays of dots of two colours for 200 ms and judge the truth of a sentence of the form in (6a) relative to them. Participants’ accuracy on the task was found to depend on the ratio of the cardinalities of the two colours in a way characteristic of the operation of the Approximate Number System (Dehaene, 2011); however, the arrangement of the dots had limited effect. From this, the authors conclude that the semantics of most is based on set cardinality, as in (7a), rather than on a correspondence relation that does not reference cardinality. Subsequent work in the same paradigm by Lidz et al. (2011) using arrays with multiple dot colours, however, produced results suggesting that the semantics of most is not that in (7a) but rather one based on set subtraction (roughly ‘there are more blue dots than dots minus blue dots’). Knowlton et al. (2022) apply a somewhat similar approach to universal quantifiers, as in (8), addressing a question from the theoretical literature, namely whether their semantics should be stated in first-order terms (‘everything that is a big dot is blue’) or second-order terms (‘the set of big dots is a subset of the set of blue things’).

(8) Every/each/all (of the) big dot(s) is/are blue.

They report on a series of experiments in which universally quantified sentences were verified against dot arrays, and participants were subsequently asked to estimate how many things the predicate (here, big dots) applied to. Greater accuracy was observed when the quantifier was all or every than when it was each, from which the authors conclude that the former are understood in second-order terms that encourage group representation, while the latter is understood in first-order terms that encourage individual representation. Thus, processing tasks have shown the potential to shed light on semantic representations above and beyond truth conditions.
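The ratio-dependence reported by Pietroski et al. (2009) is what a standard psychophysical model of the Approximate Number System predicts. In the minimal sketch below, assumed for illustration rather than taken from that study, each numerosity is represented with noise proportional to its size (the Weber fraction of 0.3 is a hypothetical value), so the probability of a correct comparison depends only on the ratio of the two numerosities.

```python
# Gaussian model of approximate number comparison (an illustrative assumption):
# numerosity n ~ Normal(n, (weber * n)^2); the judgement compares two samples.
import math


def p_correct(n_a: int, n_b: int, weber: float = 0.3) -> float:
    """Probability of correctly judging which of two numerosities is larger."""
    if n_a == n_b:
        return 0.5  # no correct answer exists; performance is at chance
    diff = abs(n_a - n_b)
    sd = weber * math.sqrt(n_a ** 2 + n_b ** 2)  # sd of the difference of samples
    return 0.5 * (1 + math.erf(diff / (sd * math.sqrt(2))))  # normal CDF


for blue, other in [(10, 5), (20, 10), (10, 9), (20, 18)]:
    print(blue, other, round(p_correct(blue, other), 3))
```

Under this model, 10 versus 5 and 20 versus 10 yield the same predicted accuracy, while 10 versus 9 is much harder; the spatial arrangement of the dots plays no role.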
Other work on most and more than half, however, points to a more complex picture, illustrating the challenges in relating processing findings to questions in theoretical semantics. Kotek et al. (2015) find in a non-timed verification task that in the case of most but not more than half, some participants allow a so-called superlative reading, on which (6a) is true in a situation where fewer than half of the dots are blue, but there are more blue dots than dots of any other individual colour; they take this as evidence for a superlative semantics for most, as originally proposed by Hackl. Experimental evidence of a truth-conditional difference in threshold between most and more than half is also reported by Denić and Szymanik (2022). These findings are not entirely consistent with those of Pietroski et al. (2009) and Lidz et al. (2011). This may to some extent relate to methodological differences, such as that between timed and untimed tasks, but note also that the experiments reported by Pietroski, Lidz, and colleagues include only most, without comparison to other quantifiers, making it difficult to assess to what extent the observed patterns derive from the underlying semantics of most versus some other aspect of the experimental setup. Thus, the collective experimental findings have not yet provided conclusive evidence for one particular semantic treatment of most or more than half, but they have helped to prompt renewed theoretical interest in the topic (e.g. Solt, 2016). As a final and broader example in this area, van Tiel et al. (2021) carried out a large-scale experimental investigation of quantifiers such as some, most, and all, with the goal of adjudicating between two views on their semantics: generalized quantifier theory (GQT), which holds them to have strict truth conditions based on thresholds on the size of the intersection between two sets (cf. (7)), and prototype theory, according to which truth is inherently gradient and linguistic meaning is organized around prototypes. The study employed a production task in which participants were shown large arrays of dots of varying colours and answered the question “___ of the dots are red” by filling in the blank. On the surface, the findings appear compatible with the predictions of prototype theory, in that the ranges of values for which individual quantifiers were used lacked sharp boundaries and had a peaked shape. However, van Tiel and colleagues demonstrate that a computational model in which GQT semantics are augmented by a pragmatic module – which encodes probabilistic reasoning about listener interpretation as well as cognitive factors – explains the data as well as a model that encodes prototypes directly into the semantics of quantifiers. This undertaking exemplifies a promising recent approach of drawing on probabilistic pragmatic theories to derive linking hypotheses to connect semantic theories with experimental findings (see also Waldon & Degen, 2020).
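The idea of augmenting crisp GQT truth conditions with a probabilistic pragmatic module can be illustrated with a minimal speaker model in the style of the Rational Speech Act framework. This is a generic sketch under invented assumptions (a 10-dot array, four quantifiers, a rationality parameter alpha of 4), not the model fitted by van Tiel et al. (2021): each quantifier has sharp truth conditions, but the speaker prefers the most informative true utterance, so a quantifier that is literally true in a state may still be produced there with low probability, and the resulting usage profiles are not simply the truth conditions themselves.

```python
# Generic RSA-style speaker over GQT-style threshold semantics.
# Array size, utterance set and alpha are hypothetical choices.
N = 10                 # dots in the (hypothetical) array
STATES = range(N + 1)  # number of red dots

SEMANTICS = {          # crisp truth conditions, no prototypes
    "none": lambda n: n == 0,
    "some": lambda n: n >= 1,
    "most": lambda n: n > N / 2,
    "all": lambda n: n == N,
}


def literal_listener(utt: str) -> dict:
    """Uniform distribution over the states in which `utt` is literally true."""
    true_states = [s for s in STATES if SEMANTICS[utt](s)]
    return {s: (1 / len(true_states) if s in true_states else 0.0) for s in STATES}


def speaker(state: int, alpha: float = 4.0) -> dict:
    """Soft-max choice among utterances, favouring informative true ones."""
    scores = {u: literal_listener(u)[state] ** alpha for u in SEMANTICS}
    total = sum(scores.values())  # > 0: some utterance is true in every state
    return {u: score / total for u, score in scores.items()}


# Production probability of each quantifier across states
for u in SEMANTICS:
    print(u, [round(speaker(s)[u], 2) for s in STATES])
```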

5.4.4

Gradable adjectives

The semantics of gradable adjectives offers further examples of how experimental methods can take the researcher beyond what is possible via introspection, addressing the second of the two issues raised in the previous section. It is widely agreed that the truth or falsity of a sentence such as Sue is tall is dependent on the context, and in particular, on a comparison class C that provides a threshold for the application of the adjective; for example, Sue might count as tall in the context of gymnasts, but not in the context of adult women. But how exactly should the truth conditions be stated relative to C? For example, is Sue tall relative to C if she is above the mean height of members of C? If she is among the tallest n% of Cs? If she is sufficiently close in height to the tallest C? All of these possibilities have been suggested in the theoretical literature. To adjudicate between such alternative theories of adjective meaning, it is necessary to determine how the application of the adjective changes as the statistical properties of the comparison class are varied, as it is here that different approaches make distinct predictions. This is difficult to do via introspection alone: one might have clear intuitions about which of a given set of entities can be described as, say, tall; but it is much more challenging to intuit how one’s judgements change as the context does. Several authors have, therefore, approached the question experimentally (e.g., Schmidt et al., 2009; Solt & Gotzner, 2012). The general method employed in such studies is to show participants an array of items representing a comparison class – for example, a set of figures of varying heights – with the task being to indicate which of the items can be described by the adjective.
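The competing proposals can be stated as threshold rules over a comparison class. The sketch below implements the three possibilities just mentioned (above the mean, among the tallest n%, close to the tallest member) for a hypothetical set of heights; the percentage and tolerance values are arbitrary choices, and the function names are illustrative.

```python
# Three candidate threshold rules for 'tall' relative to a comparison class.
from statistics import mean
from typing import List

heights = [150, 155, 160, 162, 165, 170, 175, 185]  # hypothetical class, in cm


def tall_above_mean(xs: List[float]) -> List[float]:
    """Tall iff above the mean of the comparison class."""
    m = mean(xs)
    return [x for x in xs if x > m]


def tall_top_percent(xs: List[float], pct: float) -> List[float]:
    """Tall iff among the tallest pct% (a purely rank-based rule)."""
    k = max(1, round(len(xs) * pct / 100))
    return sorted(xs, reverse=True)[:k]


def tall_near_max(xs: List[float], tolerance: float) -> List[float]:
    """Tall iff within `tolerance` of the tallest member."""
    top = max(xs)
    return [x for x in xs if top - x <= tolerance]


print(tall_above_mean(heights))       # [170, 175, 185]
print(tall_top_percent(heights, 25))  # [185, 175]
print(tall_near_max(heights, 5))      # [185]
```

Adding a single very tall outlier to the class, for instance, raises the mean and so changes which individuals pass the first rule, whereas the rank-based second rule is unaffected by how far the outlier lies above the rest.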
Abstracting away from the specific research questions of individual studies, a general finding is that the truth conditions of gradable adjectives must reference statistical reasoning about the measures (e.g., heights) of individuals and cannot be stated more simply in terms of the rank ordering of individuals in the comparison class. Recently, this approach has been extended to investigate the difference between context-dependent relative gradable adjectives such as tall and absolute gradable adjectives such as straight and bent, whose thresholds appear to reference a scalar minimum or maximum point. There are two prominent views on the interpretive difference between the two classes: semantic theories according to which the threshold is determined by the structure of the measurement scale lexicalized by the adjective (Kennedy, 2007) and Bayesian pragmatic theories in which it is based on prior knowledge of the distribution of measures in the relevant comparison class, which is assumed to differ for relative versus absolute adjectives (Lassiter and Goodman, 2013; Qing and Franke, 2014). Xiang et al. (2022) test the predictions of these theories by investigating 21 adjectives of different classes, presented in the context of comparison classes, using three tasks: a priors task in which participants indicated which element of the comparison class was most likely (without linguistic material); a truth value task in which they indicated which elements could be described by the adjective; and a posterior judgement task in which they indicated which element was most likely to be meant by a use of the adjective. The priors elicited this way were used as input to the Bayesian models of Lassiter and Goodman and of Qing and Franke, and the model predictions were compared with the experimentally elicited judgements. The authors find that the Bayesian models perform well in predicting posterior judgements but less well in predicting judgements about truth conditions, where semantic theories have an advantage; they demonstrate that a hybrid model incorporating both semantic and pragmatic thresholds does the best at capturing the totality of the data. What is notable about this work is that it explicitly separates the semantic question (to which items can the adjective be truthfully applied?) from the pragmatic question (what is communicated by the use of the adjective?) and furthermore considers two explicit linking theories for connecting theoretical predictions with participant behaviour on the truth value judgement task, one based on semantic thresholds and a second based on production behaviour.
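The Bayesian pragmatic approach can be sketched as a compressed model in the spirit of Lassiter and Goodman (2013): a literal listener conditions a prior over degrees on the threshold semantics of tall, a speaker chooses between saying tall and saying nothing according to informativeness and cost, and a pragmatic listener infers the degree while marginalising over an uncertain threshold. The degree grid, the bell-shaped prior, the cost and the rationality parameter are invented, and this is not the model as implemented or fitted by Xiang et al. (2022).

```python
# Compressed threshold-inference sketch; grid, prior, cost and lam are invented.
import math

DEGREES = range(21)                # discretised height scale (hypothetical)
THRESHOLDS = range(21)             # uniform prior over thresholds
COST = {"tall": 2.0, "null": 0.0}  # 'null' = saying nothing (always true)


def prior(d: int, mu: float = 10.0, sigma: float = 3.0) -> float:
    """Unnormalised bell-shaped prior over degrees in the comparison class."""
    return math.exp(-((d - mu) ** 2) / (2 * sigma ** 2))


def true_of(u: str, d: int, theta: int) -> bool:
    return d > theta if u == "tall" else True


def literal_listener(d: int, u: str, theta: int) -> float:
    """L0(d | u, theta): prior renormalised over degrees where u is true."""
    if not true_of(u, d, theta):
        return 0.0
    z = sum(prior(x) for x in DEGREES if true_of(u, x, theta))
    return prior(d) / z


def speaker(u: str, d: int, theta: int, lam: float = 4.0) -> float:
    """S1(u | d, theta), proportional to exp(lam * (log L0 - cost))."""
    def score(v: str) -> float:
        p = literal_listener(d, v, theta)
        return math.exp(lam * (math.log(p) - COST[v])) if p > 0 else 0.0
    return score(u) / sum(score(v) for v in COST)  # 'null' keeps this > 0


def pragmatic_listener(u: str = "tall") -> dict:
    """L1(d | u): S1 times prior, summed over the uniform threshold prior."""
    weights = {d: prior(d) * sum(speaker(u, d, t) for t in THRESHOLDS) for d in DEGREES}
    z = sum(weights.values())
    return {d: w / z for d, w in weights.items()}


posterior = pragmatic_listener("tall")
prior_mean = sum(d * prior(d) for d in DEGREES) / sum(prior(d) for d in DEGREES)
post_mean = sum(d * p for d, p in posterior.items())
print(round(prior_mean, 2), round(post_mean, 2))
```

Hearing tall shifts the listener’s expectation upwards from the prior mean; replacing the invented prior with one elicited in a priors task is the step that links such a model to experimental data.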

5.4.5

Modal expressions

Modal expressions are another domain where the subtlety of the crucial judgements has led researchers to turn to formal experiments to supplement introspection-based data. Work in this area provides clear illustrations of the challenges and potential pitfalls inherent to experimental semantics, but also of the possibility of obtaining theoretically relevant data via quite simple methods. One topic of interest relates to the type of semantic theory required for epistemic modals such as might. It is generally accepted that the truth conditions of a sentence of the form might p must be stated with respect to some perspective or body of evidence K (Kratzer, 1981). For example, John might be in Boston is true if the prejacent John is in Boston is compatible with K, that is, if among the worlds compatible with K, there is at least one in which John is in Boston is true. But which body of evidence is relevant? On the more standard contextualist view originating with Kratzer, what counts is the perspective of the speaker and the context of utterance. On a more radical relativist view (MacFarlane, 2011), by contrast, it is the context of assessment that matters. In the debate between these two theoretical positions, evidence from so-called eavesdropping and retraction scenarios has played a central role. If a sentence of the form might p is uttered in a situation where p is compatible with the speaker’s knowledge at that time, but it is subsequently established that p is false, should the original might sentence be judged true (as predicted by the contextualist view) or false (as predicted by the relativist view)? Knobe and Yalcin (2014) observe there is little consensus in the literature as to how speakers actually do judge such scenarios and report a series of experiments aimed at clarifying the empirical picture.
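The contextualist/relativist contrast amounts to a choice of which body of evidence K the might sentence is evaluated against. A minimal sketch, assuming that worlds are labelled simply by Joe’s location and that information states are sets of such worlds (both hypothetical simplifications):

```python
# Kratzer-style 'might' relative to a body of evidence K, evaluated two ways.
from typing import Set

World = str  # hypothetical: a world is identified by where Joe is


def might(prejacent: Set[World], evidence: Set[World]) -> bool:
    """'might p' is true iff some world compatible with the evidence is a p-world."""
    return any(w in prejacent for w in evidence)


joe_in_boston = {"boston"}                   # worlds where the prejacent is true
sally_at_utterance = {"boston", "berkeley"}  # Sally's information when she speaks
george_after_email = {"berkeley"}            # assessor's information after the email

# Contextualist: evidence fixed by the context of utterance (the speaker's).
print(might(joe_in_boston, sally_at_utterance))  # True
# Relativist: evidence fixed by the context of assessment.
print(might(joe_in_boston, george_after_email))  # False
```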
For example, in one experiment participants saw the following scenario in either its modal or non-modal version:

(9) Knobe and Yalcin (2014; experiment 4):
Sally and George are talking about whether Joe is in Boston. Sally carefully considers all the information she has available and concludes that there is no way to know for sure. Sally says: “Joe is in Boston” [non-modal] / “Joe might be in Boston” [modal]. Just then, George gets an email from Joe. The email says that Joe is in Berkeley. So George says: “No, he isn’t in Boston. He is in Berkeley.”

Participants were asked to indicate their agreement with the sentence “What Sally said is false”. In the non-modal condition, there was a high level of agreement; this serves as an important control, because according to semantic theory Sally’s non-modal sentence is unquestionably false, meaning that any other sort of response pattern would suggest a methodological issue. In the crucial modal condition, by contrast, agreement was significantly lower. The authors take this as evidence that participants judged the truth of the modal sentence according to the context of utterance, supporting the contextualist position. Other studies employing a similar paradigm have found more mixed results. For example, Beddor and Egan (2018) found effects of the question under discussion (QUD) on participants’ judgements of the truth or falsity of might p, suggesting some aspects of the assessment context are in fact taken into consideration. Even more fundamentally, the works discussed above have required non-linguistically trained participants to judge whether a sentence is “true” or “false”, and it appears to be assumed that in doing so, they understand these terms in the same way that linguists and philosophers of language do. Recent work suggests this isn’t always the case. In a version of Knobe and Yalcin’s task using different scenarios, Reuter and Brun (2022) find that the majority of participants judge a non-modal sentence true in the relevant scenario, which they argue shows that “true” can be understood as correspondence with the available evidence rather than with the facts of the world. Ricciardi and Martin (2022) obtain a similar result with a modified version of Knobe and Yalcin’s scenario (9) in which Sally’s assertion of the non-modal sentence is better justified. Thus, in the end, the many experimental studies on this topic have not yet been able to resolve the contextualism/relativism debate, though the complex empirical picture they have revealed has served as impetus for the development of more nuanced theories of both kinds. Other work on modality has yielded results that more clearly constrain theory.
There is an ongoing debate about the semantics of epistemic must: does it express universal quantification over the most normal worlds, as proposed by Kratzer (1981), or is its semantics stronger (von Fintel & Gillies, 2010) or weaker (Lassiter, 2016) than this? Lassiter demonstrates in a simple experiment that in a situation of very high but non-maximal probability, must p is accepted at a higher rate than certain p and know p, supporting the ‘weak must’ view; he himself argues for a probabilistic, threshold-based semantics for must. Del Pinal and Waldon (2019) follow this up with a series of experiments on the acceptability of cases of ‘epistemic tension’, where competing theories make distinct predictions. Using two types of tasks, acceptability judgements on contradictions such as It must be raining but it’s possible that it isn’t and judgements of downplaying, they systematically vary the modal expression as well as other theoretically relevant factors; for example, to assess Lassiter’s threshold-based theory, they compare must p to it is 95% certain that p. Evaluating different theoretical proposals against the resulting rich set of data, they conclude that Kratzer’s original restricted-quantification account best accounts for the data as a whole. This work thus offers a clear demonstration that theoretically relevant findings can be obtained through the careful and systematic application of very simple methods.
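The contrast between the two kinds of proposal can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: the world labels, probabilities and threshold are hypothetical; the domain of worlds is assumed to be already restricted (in Kratzer’s terms, by the modal base and ordering source, which are not modelled here); and neither function is offered as any cited author’s actual formalisation. It shows why a scenario like it is 95% certain that p can separate the accounts: universal quantification fails as soon as one relevant world is a non-p world, whereas a threshold semantics can still come out true.

```python
from fractions import Fraction

# Hypothetical toy model: each world in the (already restricted) epistemic
# domain is mapped to its probability; p is the set of worlds where p holds.

def must_universal(domain, p):
    """Restricted-quantification style: 'must p' is true iff p holds in
    every world of the domain (probabilities play no role)."""
    return all(world in p for world in domain)

def must_threshold(domain, p, theta):
    """Threshold style: 'must p' is true iff the total probability of the
    p-worlds is at least theta (theta is a free, hypothetical parameter)."""
    mass = sum(prob for world, prob in domain.items() if world in p)
    return mass >= theta

# A scenario in which p is "95% certain": p fails only in the low-probability world w2.
domain = {"w1": Fraction(95, 100), "w2": Fraction(5, 100)}
p = {"w1"}

print(must_universal(domain, p))                     # False
print(must_threshold(domain, p, Fraction(9, 10)))    # True
print(must_threshold(domain, p, Fraction(99, 100)))  # False
```

Whether the threshold version predicts acceptance depends entirely on the value chosen for theta, which is one reason comparisons with explicit probability expressions can be informative.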

5.5

Main research methods

Semantics, broadly speaking, is concerned with the connection between language and extralinguistic reality. It is thus not surprising that the mainstay of experimental research in semantics involves what might be termed language-situation matching tasks, in which some linguistic expression is assessed for its compatibility with some state(s) of affairs. This includes in particular truth value judgement and verification tasks, where experimental participants are presented with a sentence and a visual and/or verbal description of some situation and asked to indicate whether the sentence is true or false in that situation, or whether it is a good/bad or correct/incorrect description of the situation. Also included under this heading are selection tasks, where the participant indicates which one of multiple visuals best corresponds to the meaning of a sentence, and categorisation tasks, where the participant selects all those items that can be described by an expression.

Experimental semantics

All of these have been implemented in many variations, some of which have proven particularly useful in investigating subtle semantic effects. For example, a covered box selection task asks participants to choose between one or more depicted situations and a “covered box” (a blank visual described as a situation hidden from view), with the instruction to select the covered box only if none of the visible situations is compatible with the linguistic expression; in this way, the existence of a dispreferred but potentially available interpretation can be probed. Other variants have been employed to good effect in some of the studies discussed above, for example, a verification task under processing load (Marty et al., 2013) and a selection task with priming (Meyer & Feiman, 2021).

Grammaticality and acceptability judgement tasks have also played some role in experimental semantics, though not to the extent they do in experimental work on syntax, where the well-formedness of strings of language is of primary interest. In semantics, a particular use has been to test sentences that are contradictions on one analysis but not on another; the underlying assumption is that contradictory status will lead to degraded acceptability on a par with what is observed for clear cases of contradiction. Metalinguistic tasks of various sorts are also common, including inferencing tasks, where participants are presented with a sentence and must indicate whether another sentence follows from it, as well as judgements as to whether an assertion is justified or a retraction is appropriate (see above). Importantly, all of the above methodologies are essentially formalizations of the same methods that semanticists have used introspectively, or on a smaller scale to elicit judgements from informants.
This takes us back to the earlier observation that there is no sharp dividing line between “experimental” and “non-experimental” research in semantics, and illustrates that even conceptually simple methods can yield theoretically relevant results. As was discussed above, paradigms investigating real-time online processing of linguistic stimuli, such as reaction time and eye-tracking studies, have also proven valuable in addressing questions of semantic theory; though as the data they yield are in a sense more distant from the semantic issue under investigation, their interpretation requires caution.
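The response logic of the covered box task described above can be sketched as an idealised decision rule. This is a minimal sketch under stated assumptions, not a model of real participants: the function name, the representation of situations as counts, and the two readings of “some” are hypothetical illustrations. What it shows is that a visible picture compatible only with a dispreferred reading is chosen over the covered box just in case that reading is available to the responder.

```python
from typing import Callable, Dict, List

Situation = Dict[str, int]             # hypothetical: a situation as named counts
Reading = Callable[[Situation], bool]  # a reading maps a situation to true/false

def covered_box_choice(visible: List[Situation], readings: List[Reading]) -> str:
    """Idealised responder: choose the first visible situation that is made
    true by at least one available reading; otherwise choose the covered box."""
    for index, situation in enumerate(visible):
        if any(reading(situation) for reading in readings):
            return f"visible_{index}"
    return "covered_box"

# Two hypothetical readings of "Some of the cakes were sold".
def literal_some(s: Situation) -> bool:
    return s["sold"] >= 1                 # at least one (possibly all)

def strengthened_some(s: Situation) -> bool:
    return 1 <= s["sold"] < s["total"]    # some but not all

all_sold = {"sold": 10, "total": 10}

# Only the strengthened reading available: the all-sold picture is rejected.
print(covered_box_choice([all_sold], [strengthened_some]))                # covered_box
# Literal reading also available: the visible picture is accepted.
print(covered_box_choice([all_sold], [strengthened_some, literal_some]))  # visible_0
```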

5.6

Recommendations for practice

Section 5.4 reviewed some of the many ways that experimental methodologies have been successfully employed in theoretical semantics, and from this discussion, some broad guidelines for practice can be derived. Most basically, experimental approaches have proven valuable in cases where competing semantic/pragmatic theories make distinct predictions, and where introspectively sourced data has not been sufficient to resolve the matter. The reasons for “not sufficient” are varied: the relevant judgements might be disputed in the literature, or the researcher’s own intuitions might be shaky or variable; other empirical tests might yield conflicting results; the issue at hand might require multiple judgements on subtly different stimuli, where it is difficult for a single individual to evaluate each independently; or the issue might be one not readily accessible to introspection. It certainly cannot be recommended to conduct experiments when such factors are not present, that is, simply to “validate” otherwise clear and undisputed judgements.

Considering the characteristics of the most robust studies discussed in the previous section, some more specific guidelines can be suggested. First and most centrally, since no experimental methodology can provide a direct window into underlying semantic representations independent of pragmatic processes and methodological factors, it is crucial to consider from the start how the experimentally generated data will be related to the theoretical issue, that is, to establish a suitable linking hypothesis. Relatedly, the data generated through experiments in semantics is almost always gradient in nature; that is, we might learn that a construction receives a mean acceptability rating of 4.9 out of 7, or is accepted at a rate of 72% in a given situation. Such results can never be interpreted in isolation as indications that the construction is or is not well formed or felicitous (see Jacobson [2018] for discussion). Rather, conclusions can only be drawn by comparing such values with others. Thus, it is essential that experiments include appropriate control and baseline conditions to provide a framework for the interpretation of the critical results. Also important is ensuring that the experimental task is comprehensible to participants and that the stimuli are natural. To this end, small-scale pilot testing can be valuable in identifying aspects of an experimental design that are potentially confusing or misleading, as these might otherwise escape the notice of a researcher who is already deeply familiar with the phenomenon under investigation.

Finally, as the case studies discussed above have shown, it is rare that a single experiment, or even a set of experiments, fully resolves a theoretical issue, though of course the same is true of data sourced via any other empirical method. Rather, what we typically find is that experimentation yields a deeper and more extensive understanding of the empirical landscape, which is often more complex than previously recognized; this can in turn provide impetus for further theoretical advances. And in many cases, it is only once a body of studies employing different methodologies is available that the empirical picture becomes clearer, since this is what is required to rule out the effects of extraneous factors and methodological confounds. It is advisable to approach an experimental undertaking with this mindset, and to use caution in drawing conclusions from any individual study.
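As a minimal illustration of why a critical result needs a baseline, the sketch below compares an acceptance rate in a critical condition with one in a control condition using a pooled two-proportion z-test. The counts are invented for illustration (the 72% figure echoes the example in the text; the control rate is hypothetical). The test also treats every response as independent, an assumption that experiments with repeated measures from the same participants and items do not satisfy; mixed-effects regression is the more usual choice in practice.

```python
import math

def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for k1/n1 versus k2/n2."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 72 of 100 acceptances in the critical condition,
# 95 of 100 in a control condition where the sentence is clearly true.
z = two_proportion_z(72, 100, 95, 100)
print(f"z = {z:.2f}, p = {two_sided_p(z):.2g}")
```

A 72% acceptance rate on its own says little; it is the contrast with the control condition that carries the information.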

5.7

Future directions

There is every reason to think that experimental methods will continue to be an important source of empirical data in theoretical semantics. As to how the subfield will develop, two current trends can be highlighted. The first is an increasing focus on transparency, achieved by making experimental materials, full data sets, and statistical code publicly available; by explicitly stating linking hypotheses; and by preregistering studies, thereby committing upfront to the predictions to be tested. Such measures can only serve to raise the quality of experimental work. A second trend is the increasing combination of experimental methods and computational modelling (e.g., van Tiel et al., 2021) as a means to relate the resulting data to theoretical constructs such as truth conditions. It is not yet clear how widespread such combined approaches will become, but existing work suggests they offer a particularly powerful way to test the predictions of semantic theories.
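To give a flavour of what relating data to truth conditions can involve, the sketch below fits a deliberately simple linking function for truth value judgement data: on each trial the participant reports the truth value predicted by a semantic theory, except that with probability epsilon they respond at random. This is a hypothetical illustration, not the model of van Tiel et al. (2021) or of any other cited work; the function names, the grid search and the data are assumptions made for the example.

```python
import math

def p_accept(is_true: bool, epsilon: float) -> float:
    """Hypothetical noisy linking function: with probability 1 - epsilon the
    response follows the predicted truth value; with probability epsilon
    the participant answers at random (50/50)."""
    return (1 - epsilon) * (1.0 if is_true else 0.0) + epsilon * 0.5

def log_likelihood(data, epsilon: float) -> float:
    """data: list of (predicted_true, accepted) pairs, one per trial."""
    total = 0.0
    for is_true, accepted in data:
        p = p_accept(is_true, epsilon)
        total += math.log(p if accepted else 1 - p)
    return total

def fit_epsilon(data, grid_size: int = 99) -> float:
    """Maximum-likelihood estimate of epsilon over a grid strictly inside (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda e: log_likelihood(data, e))

# Hypothetical data: (theory predicts true?, participant accepted?) per trial.
data = [(True, True)] * 9 + [(True, False)] + [(False, False)] * 9 + [(False, True)]
print(fit_epsilon(data))  # 0.2
```

In this toy case two of twenty responses deviate from the theory’s prediction, and since each trial deviates with probability epsilon/2 under the model, the estimate is 0.2. Richer models can replace the all-or-nothing truth value with graded, pragmatically enriched predictions.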

Further reading

Cummins, C., & Katsos, N. (Eds.). (2019). The Oxford Handbook of Experimental Semantics and Pragmatics. Oxford University Press.
Matlock, T., & Winter, B. (2015). Experimental semantics. In B. Heine & H. Narrog (Eds.), The Oxford Handbook of Linguistic Analysis (pp. 771–790). Oxford University Press.

Related topics

Experimental pragmatics; experimental syntax; contrasting online and offline measures in experimental linguistics; new directions in statistical analysis for experimental linguistics

References

Beddor, B., & Egan, A. (2018). Might do better: Flexible relativism and the QUD. Semantics and Pragmatics, 11, 7.
Bott, L., & Chemla, E. (2016). Shared and distinct mechanisms in deriving linguistic enrichment. Journal of Memory and Language, 91, 117–140.
Cummins, C., & Katsos, N. (Eds.). (2019). The Oxford Handbook of Experimental Semantics and Pragmatics. Oxford University Press.
De Veaugh-Geiss, J. P., Tönnis, S., Onea, E., & Zimmermann, M. (2018). That’s not quite it: An experimental investigation of (non-)exhaustivity in clefts. Semantics and Pragmatics, 11, 3.
Dehaene, S. (2011). The Number Sense: How the Mind Creates Mathematics (2nd ed.). Oxford University Press.
Del Pinal, G., & Waldon, B. (2019). Modals under epistemic tension. Natural Language Semantics, 27(2), 135–188.
Denić, M., & Szymanik, J. (2022). Are most and more than half truth-conditionally equivalent? Journal of Semantics, 39, 261–294.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and Semantics, Vol. III (pp. 41–58). Academic Press.
Hackl, M. (2009). On the grammar and processing of proportional quantifiers: Most versus more than half. Natural Language Semantics, 17(1), 63–98.
Hörmann, H. (2012). The calculating listener, or: How many are einige, mehrere and ein paar (some, several, and a few)? In R. Bäuerle, C. Schwarze, & A. von Stechow (Eds.), Meaning, Use, and Interpretation of Language (pp. 221–234). de Gruyter.
Huang, Y. T., & Snedeker, J. (2009). Online interpretation of scalar quantifiers: Insight into the semantics–pragmatics interface. Cognitive Psychology, 58, 376–415.
Huang, Y. T., Spelke, E., & Snedeker, J. (2013). What exactly do numbers mean? Language Learning and Development, 9, 105–129.
Jacobson, P. (2018). What is—or, for that matter, isn’t—‘experimental’ semantics? In D. Ball & B. Rabern (Eds.), The Science of Meaning: Essays on the Metatheory of Natural Language Semantics (pp. 46–72). Oxford University Press.
Kennedy, C. (2007). Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and Philosophy, 30, 1–45.
Knobe, J., & Yalcin, S. (2014). Epistemic modals and context: Experimental data. Semantics and Pragmatics, 7, 1–21.
Knowlton, T., Pietroski, P., Halberda, J., & Lidz, J. (2022). The mental representation of universal quantifiers. Linguistics and Philosophy, 45, 911–941.
Kotek, H., Sudo, Y., & Hackl, M. (2015). Experimental investigations of ambiguity: The case of most. Natural Language Semantics, 23(2), 119–156.
Kratzer, A. (1981). The notional category of modality. In H. Eikmeyer & H. Rieser (Eds.), Words, Worlds, and Contexts: New Approaches in Word Semantics (pp. 38–74). de Gruyter.
Lassiter, D. (2016). Must, knowledge, and (in)directness. Natural Language Semantics, 24, 117–163.
Lassiter, D., & Goodman, N. D. (2013). Context, scale structure, and statistics in the interpretation of positive-form adjectives. Proceedings of Semantics and Linguistic Theory, 23, 587–610.
Lidz, J., Pietroski, P., Halberda, J., & Hunter, T. (2011). Interface transparency and the psychosemantics of most. Natural Language Semantics, 19(3), 227–256.
MacFarlane, J. (2011). Epistemic modals are assessment-sensitive. In A. Egan & B. Weatherson (Eds.), Epistemic Modality (pp. 144–179). Oxford University Press.
Marty, P., Chemla, E., & Spector, B. (2013). Interpreting numerals and scalar items under memory load. Lingua, 133, 152–163.
Marty, P., & Romoli, J. (2021). Presupposed free choice and the theory of scalar implicatures. Linguistics and Philosophy, 45, 91–152.
Meyer, M. C., & Feiman, R. (2021). Priming reveals similarities and differences between three purported cases of implicature: Some, number and free choice disjunctions. Journal of Memory and Language, 120, 104206.
Moxey, L. M., & Sanford, A. J. (1993). Communicating Quantities: A Psychological Perspective. Lawrence Erlbaum Associates.
Nouwen, R. (2003). Complement anaphora and interpretation. Journal of Semantics, 20, 73–113.
Noveck, I. A., & Sperber, D. (Eds.). (2004). Experimental Pragmatics. Palgrave Studies in Pragmatics, Language and Cognition. Palgrave Macmillan.
Panizza, D., Chierchia, G., & Clifton Jr, C. (2009). On the role of entailment patterns and scalar implicatures in the processing of numerals. Journal of Memory and Language, 61(4), 503–518.
Papafragou, A., & Musolino, J. (2003). Scalar implicatures: Experiments at the semantics–pragmatics interface. Cognition, 86(3), 253–282.
Pietroski, P., Lidz, J., Hunter, T., & Halberda, J. (2009). The meaning of ‘most’: Semantics, numerosity and psychology. Mind & Language, 24(5), 554–585.
Qing, C., & Franke, M. (2014). Gradable adjectives, vagueness, and optimal language use: A speaker-oriented model. Proceedings of Semantics and Linguistic Theory, 24, 23–41.
Reuter, K., & Brun, G. (2022). Empirical studies on truth and the project of re-engineering truth. Pacific Philosophical Quarterly, 103(3), 1–25.
Ricciardi, G., & Martin, J. (2022). Accounting for variability in the truth-evaluation of bare epistemic possibility statement. Proceedings of the Linguistic Society of America, 7(1), 5249.
Rips, L. J., & Turnbull, W. (1980). How big is big? Relative and absolute properties in memory. Cognition, 8(2), 145–174.
Sanford, A. J., Moxey, L. M., & Paterson, K. (1994). Psychological studies of quantifiers. Journal of Semantics, 11, 153–170.
Schmidt, L. A., Goodman, N. D., Barner, D., & Tenenbaum, J. B. (2009). How tall is tall? Compositionality, statistics, and gradable adjectives. In Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 2759–2764). Amsterdam.
Solt, S. (2016). On measurement and quantification: The case of most and more than half. Language, 92, 65–100.
Solt, S., & Gotzner, N. (2012). Experimenting with degree. Proceedings of SALT, 22, 166–187.
Spector, B. (2013). Bare numerals and scalar implicatures. Language and Linguistics Compass, 7, 273–294.
Tieu, L., Bill, C., Romoli, J., & Crain, S. (2014). Plurality inferences are scalar implicatures: Evidence from acquisition. Proceedings of Semantics and Linguistic Theory, 24, 122–136.
van Tiel, B., Franke, M., & Sauerland, U. (2021). Probabilistic pragmatics explains gradience and focality in natural language quantification. Proceedings of the National Academy of Sciences, 118(9), e2005453118.
Von Fintel, K., & Gillies, A. S. (2010). Must... stay... strong! Natural Language Semantics, 18(4), 351–383.
Waldon, B., & Degen, J. (2020). Modeling behavior in truth value judgment task experiments. Proceedings of the Society for Computation in Linguistics, 3(1), 10–19.
Xiang, M., Kennedy, C., Xu, W., & Leffel, T. (2022). Pragmatic reasoning and semantic convention: A case study on gradable adjectives. Semantics and Pragmatics, 15, Article 9, 1–63.

6 EXPERIMENTAL PRAGMATICS Tomoko Matsui

6.1

Introduction

Since the publication of Grice’s (1975) seminal work on utterance meaning, the ultimate goal of pragmatics has been to describe and explain how the hearer understands the speaker’s intentions in verbal communication. For the hearer to understand the speaker’s intentions, decoding the linguistic information in the utterance is merely a starting point. This is because, most often, only a small part of what the speaker intends to communicate to the hearer is expressed linguistically; typically, the most important information is left unsaid. The hearer is required to infer, or “work out” in Grice’s terms, what the speaker intended to communicate, based on verbal and non-verbal clues such as facial expression, gesture and tone of voice, together with background knowledge (i.e., contextual information). In this way, according to Grice, sentence meaning (what is said) differs from speaker’s meaning (what is communicated). The speaker’s meaning includes not only what is said (linguistic meaning) but also what is implicated, which Grice named “implicature”.

This inferential model of communication introduced by Grice has been further developed by contemporary pragmatic theories, of which two contrasting approaches are dominant in experimental pragmatics: neo-Gricean (Horn, 1984; Levinson, 2000) and post-Gricean approaches (Sperber & Wilson, 1986, 1995). The most crucial difference between neo-Gricean and post-Gricean approaches lies in their claims as to how conversational implicatures arise. For example, neo-Griceans claim that scalar implicatures (e.g., “some” implicates “not all”, as in “Some of the students passed the exam”) are generated on the basis of lexical scales (e.g., “some–all”), while post-Griceans propose that implicatures, including scalar implicatures, are generated by context-based inferences and deny any role for lexical scales in the process.
In experimental pragmatics, one of the basic assumptions is that, in utterance comprehension, both decoding the linguistic information in an utterance and generating inferences based on contextual clues incur processing costs. In particular, it is assumed that the context-based generation of an implicature requires extra processing cost in utterance interpretation. In a typical psychological experiment on sentence comprehension, such extra cost is indicated by an increase in reading or reaction time. As for processing costs, neo-Gricean approaches predict that the generation of scalar implicatures based on lexical scales is automatic and default, and hence does not require any extra processing cost. Post-Gricean approaches, by contrast, predict that the generation of scalar implicatures is as costly as that of any other implicature generated on the basis of contextual information. By measuring and analysing the time taken to read a target sentence containing “some”, for example, an experimental study of scalar implicatures may be able to indicate which prediction the results support more strongly.

DOI: 10.4324/9781003392972-8

Experimental methods are also effective in investigating the development of pragmatic abilities. One post-Gricean approach, relevance theory, hypothesises that the pragmatic interpretation of utterances is an exercise in mind-reading aimed at inferentially attributing intentions to the speaker, and rejects any role for the code-based or default-based interpretation suggested by, for example, neo-Gricean approaches (Sperber & Wilson, 2002). Accordingly, relevance theory assumes that pragmatic ability is based on a mind-reading ability dedicated to attributing speakers’ intentions, which is likely to be functionally distinct from, but also closely related to, the mind-reading ability used in non-communicative domains (such as the inferential attribution of an actor’s mental states, as in false belief tasks). Since the early studies of the 1980s, most developmental studies on pragmatics have investigated children’s pragmatic abilities in relation to their theory of mind development (i.e., mind-reading ability in non-communicative domains). These studies suggest robust correlations between the level of sophistication in pragmatic interpretation and the order of complexity in mind-reading ability. For example, children’s second-order (false) belief reasoning was found to correlate with their understanding of the speaker’s beliefs and intentions in deception (Winner & Leekam, 1991).
However, the same study also reports that no such correlation was found between children’s second-order (false) belief reasoning and their understanding of the speaker’s intentions and attitudes in verbal irony. In a similar vein, Peterson et al. (2012) suggest that interpreting the speaker’s intentions and attitudes in verbal irony is more difficult than second-order false belief understanding. The relevance-theoretic hypothesis that the mind-reading ability used to infer speakers’ intentions in utterance interpretation is distinct from the mind-reading ability used in non-communicative domains has not yet been tested in experimental pragmatics. The findings of the studies mentioned above, however, suggest that utterance interpretation is likely to require something more than the higher-order belief reasoning ability tested, for example, in false belief tasks. In this chapter, after touching upon historical perspectives, I will consider some critical issues in experimental pragmatics through the discussion of three concrete topics: (a) the generation of conversational implicature, (b) utterance interpretation and mind-reading, and (c) development and disorder of pragmatic interpretation of utterances. This discussion leads naturally into the later review of current contributions and research, which also covers the main research methods.

6.2

Historical perspectives

Experimental pragmatics has grown out of experimental studies on the generation of pragmatic inferences in text comprehension (Graesser et al., 1994; McKoon & Ratcliff, 1992) and theoretical accounts of the pragmatic interpretation of utterances (Grice, 1989; Horn, 1984; Searle, 1969, 1975; Sperber & Wilson, 1986, 1995). In the 1970s and 1980s, many experimental studies on text comprehension focused on the processes of pragmatic inference generation in different aspects of utterance interpretation, including indirect speech acts (Clark, 1979; Clark & Lucy, 1975; Gibbs, 1979, 1981), reference assignment (Clark, 1977; Garrod & Sanford, 1982), and figurative language (Gildea & Glucksberg, 1983; Glucksberg et al., 1982). Investigation of the interpretation of quantifiers in everyday language also began in the 1980s (Moxey & Sanford, 1986).

Experimental pragmatics

The main questions addressed in early experimental studies on inference generation in text comprehension included the following: Does the generation of context-based pragmatic inferences require extra processing costs? Are literal interpretations of utterances processed first, by default, before non-literal interpretations are derived? The results of reading time experiments on the comprehension of indirect requests and metaphorical expressions in the 1970s and 1980s, for example, suggest that, given appropriate contextual information, the non-literal interpretation is derived as quickly as the literal interpretation of the same utterance (e.g., Gibbs, 1983). Findings like these were taken to count against the assumptions (a) that the literal interpretation of an utterance is processed first by default and (b) that context-based pragmatic inferences, such as those involved in the non-literal interpretation of utterances, require extra processing costs.

As for experimental studies on the development of pragmatic abilities, many important studies on children’s understanding of mental states were published in the 1980s. In the early 1980s, some of the first studies on children’s understanding of first-order (false) beliefs appeared (e.g., Wimmer & Perner, 1983), prompting many further studies on the development of theory of mind. The relation between language and theory of mind development has also been a central topic of investigation from early on, and studies on children’s use of mental state terms began around the same time (Bretherton & Beeghly, 1982; Shatz, Wellman & Silber, 1983). Meanwhile, the publication of Grice’s account of everyday conversation (1975) prompted early investigation of young children’s pragmatic abilities.
For example, some studies tested children’s understanding of the distinction between saying and meaning (Beal & Flavell, 1984; Mitchell & Russell, 1989) and of verbal ambiguities (Beal & Flavell, 1982; Robinson & Robinson, 1982; Robinson & Whittaker, 1985; Sodian, 1988). Children’s understanding of non-literal uses of language was also investigated: comprehension of indirect requests (Bernicot & Legros, 1987), metaphors (Dent, 1987; Winner et al., 1980; Vosniadou & Ortony, 1983), ironical utterances (Ackerman, 1983; Demorest et al., 1984), and deception (Lewis, Stanger & Sullivan, 1989), to name a few. Investigation of the atypical development of pragmatic abilities in children with autism spectrum disorder (ASD) also started in the 1980s and increased in the 1990s (Baron-Cohen, 1988). Verbal children with ASD who had intact syntactic and semantic abilities were found to struggle with the pragmatic interpretation of utterances (Tager-Flusberg, 1981). Non-verbal communication, for example, pointing and eye contact, was also found to be problematic for children with ASD (Mundy et al., 1986). Their difficulty with the pragmatic interpretation of utterances has been linked to, among other factors, impaired inferential attribution of mental states to the speaker (Baron-Cohen et al., 1985; Happé, 1993) and weak central coherence (Frith, 1989; Happé, 1997).

6.3

Critical issues and topics

Among the many specific issues that have been investigated in experimental pragmatics so far, as mentioned in the Introduction, I will focus on the following main issues in this chapter:

(a) The generation of conversational implicature: How are conversational implicatures generated in utterance interpretation, and what cognitive resources does their generation require?
(b) Utterance interpretation and mind-reading ability: In what way is the attribution of speaker intentions in utterance interpretation related to general mind-reading abilities?
(c) Development and disorders of pragmatic abilities: How does the pragmatic interpretation of utterances in children and adolescents differ from that of adults? How does it relate to children’s linguistic and cognitive abilities? What are the characteristics of so-called pragmatic disorder?

In the following section, I will discuss the main issues above in detail under three separate topics. The first topic, understanding what is implicated by an utterance, directly addresses the first issue. The second topic, verbal irony, discusses the relation between irony comprehension and mind-reading abilities, and hence relates to the second and third issues. The third topic, ‘early pragmatic development and disorder’, also relates to the third issue and discusses pragmatic interpretation that may not require functional mind-reading abilities.

6.3.1

Understanding what is implicated by the utterance

Grice (1975) proposed that when interpreting utterances, the hearer needs to understand not only the explicit content of the utterance (“what is said”) but also the implicit content (“what is implicated”). Grice named the implicit content of the utterance “implicature”. In the following conversation between a mother and her child, the child states that he has been having stomach-ache, but what he wants to tell his mother in this context is that he hasn’t finished his homework. In other words, the child implicates that he hasn’t finished his homework because he has had stomach-ache, which prevented him from finishing it.

(1) Mother: Have you finished homework?
    Child: I have been having stomach-ache.

According to Grice, implicatures are of two types. The first, “particularised conversational implicature”, is derived from a set of contextual information that the speaker expects the hearer to use in interpreting the particular utterance. The implicature of the child’s utterance in (1) is an example of particularised conversational implicature. In (1), to generate the implicature that the child hasn’t finished his homework, the mother uses the general knowledge that having stomach-ache makes it difficult or impossible to do homework. The same utterance, “I have been having stomach-ache”, may communicate a radically different particularised conversational implicature in a different context. In (2), Tim asks his sister Hilary about the dinner the previous evening, and Hilary’s answer implicates that the dinner was not good at all, because it is the likely cause of the stomach-ache she is having. Here, for Tim to work out what Hilary’s utterance implicates, a different set of contextual information, including general knowledge about dinner and stomach-ache, needs to be used.

(2) Tim: How was the dinner at the new restaurant last evening?
    Hilary: I have been having stomach-ache.
Unlike the first type, the second type of implicature, “generalised conversational implicature”, is derived on the basis of certain linguistic items used in an utterance. For example, by using the word “some”, the utterance in (3) implicates that Jackie did not sell all the vegan cakes at the fundraising event.

(3) Jackie sold some vegan cakes in a fundraising event.

During the last decade, the question of how generalised conversational implicature is derived has been one of the central topics of experimental pragmatics. A particular type of generalised conversational implicature called “scalar implicature”, such as that triggered by “some” in the examples above, is the most frequently discussed in the literature. For example, use of the word “some” in an utterance implicates “some but not all”, as in (3). Logically, however, “some” is entailed by “all”, and therefore the first and second utterances in (4) do not contradict each other:

(4) Jackie sold some vegan cakes in the fundraising event. In fact, she sold all of them.

For the same reason, the utterance in (5) is logically true, as it does not contradict (6):

(5) Some elephants are mammals.
(6) All elephants are mammals.

The fact that “all” entails “some” indicates that “all” is a stronger term than “some”. Such a lexical relation is considered to form a linguistic scale such as <all, some>, where the stronger item entails the weaker item, and use of the weaker item implicates the negation of the stronger item. According to neo-Gricean approaches to pragmatics (Horn, 1972; Levinson, 2000), however, the utterance in (5) is pragmatically infelicitous, because the scalar implicature generated by the use of “some” (“Not all elephants are mammals”) contradicts (6). The stronger item negated by a scalar implicature, such as “all” in (6), is typically called an “alternative”. Because “some” is entailed by the stronger “all”, “all” is taken to be automatically considered as a default alternative to “some” and negated in the implicature. Levinson proposes that the scalar implicature (the negation of the stronger term of the lexical scale) is generated first; only if it is judged infelicitous and cancelled does the semantic (literal) interpretation of “some” (“some and possibly all”) become available as a possible interpretation. According to Levinson, both lexical scales such as <all, some> and scalar implicatures are automatically accessed in the interpretation of “some”, and hence no extra processing cost is required for the generation of scalar implicatures.
Post-Gricean accounts such as relevance theory (Carston, 1997; Sperber & Wilson, 1986, 1995), by contrast, reject the neo-Gricean proposal that scalar inference is automatically generated from lexical triggers such as “some” to yield a default interpretation. Instead, relevance theory suggests that the interpretation of lexical triggers of scalar inference, such as “some” or “or”, is carried out by a general pragmatic process of lexical adjustment based on contextual information. Lexical adjustment is considered to contribute to the explicit, rather than the implicit, content of an utterance. According to relevance theory, the linguistic information in an utterance needs to be adjusted inferentially to yield the meaning intended by the speaker. So, unlike Grice, who proposed that the implicit content of an utterance needs to be inferentially generated, relevance theory claims that the explicit content of an utterance also needs to be inferentially adjusted to yield the intended interpretation. Any lexical item typically has more than one potential meaning, and lexical adjustment based on contextual information is needed to single out the one intended by the speaker. Lexical adjustment, therefore, is a general pragmatic process applied to any lexical item in utterance interpretation, rather than a specific process applicable only to items that form part of lexical scales. Unlike the neo-Gricean hypothesis that scalar inference is an automatic, default process and hence incurs no extra processing cost, the post-Gricean approach assumes that inferential adjustment of lexical meaning is a costly process.

Tomoko Matsui

6.3.2 Verbal irony and mind-reading ability: Understanding tacitly communicated intentions and attitudes

Traditionally, verbal irony has been regarded as a figure of speech or rhetorical device that means the opposite of what is actually said (the literal meaning). Through verbal irony, the speaker tacitly expresses a mocking or critical attitude towards what is referred to in the utterance. The goal of experimental pragmatics here is to explain how verbal irony is understood by investigating what types of abilities are required, and when and how those abilities are involved in the online comprehension of ironical utterances. Comprehension of ironical utterances is characterised by the recognition of a substantial gap between what the speaker said and what the speaker intends to communicate by the utterance. Typically, the speaker of an ironical utterance implicitly highlights the contrast between the two, using a particular tone of voice, facial expressions, and gestures such as head shaking (Bryant & Fox Tree, 2005; Utsumi, 2000). Once the hearer has recognised the incongruity between what he expected to hear from the speaker and what he actually heard, the next phase of irony comprehension, geared towards filling the gap, begins. Pragmatic accounts of irony comprehension are built on how the hearer’s recognition of incongruity during online comprehension yields an ultimate understanding of the speaker’s attitude and intentions (Kreuz & Glucksberg, 1989; Kumon-Nakamura et al., 1995; Wilson & Sperber, 1992). Early studies on adults’ online comprehension of ironical utterances showed that contextual information facilitates the interpretation of irony. For example, given appropriate contextual information, the reading time for ironical utterances was comparable to that for similar literal counterparts (Gibbs, 1986). Later studies, on the other hand, have demonstrated that relatively conventional, or familiar, ironies are comprehended as easily as similar literal utterances and so require less processing effort than unconventional, less familiar ironies (Fein et al., 2015; Filik et al., 2014; Giora et al., 2007).
Existing studies on children’s comprehension of irony suggest that adult-like understanding is achieved relatively late in development, sometime between middle childhood and adolescence. Developmental studies explain children’s comprehension of irony in terms of their higher-order mental state understanding. Irony comprehension is considered to require not a single psychological process but several different processes, each of which draws on different socio-cognitive abilities. For example, Hancock et al. (2000) propose a two-stage model of irony comprehension based on Ackerman’s proposal (Ackerman, 1983). In the first stage, the hearer understands the speaker’s belief about the situation under discussion. In the second stage, the hearer infers the speaker’s pragmatic intent or attitude. The authors suggest that the first process requires first-order mind-reading ability and the second its second-order equivalent. Existing research on the development of mind-reading ability (or “theory of mind”) has revealed that by five years of age, children acquire the concept of belief and can think and talk about not only true beliefs but also false ones (Bartsch & Wellman, 1995). A sequential order is assumed in the development of mind-reading ability: children come to understand first-order false belief around five years of age, and second-order false belief sometime between six and eight years (Perner & Wimmer, 1985). Hancock and colleagues claim that understanding of the speaker’s beliefs, intentions, and attitudes behind ironical utterances follows a similar sequential order. They also suggest a causal relation between the two processes: accurate understanding of the speaker’s belief about the situation enables the hearer to infer the speaker’s intent or attitude.
Recent studies confirm the suggested relation between first-order mind-reading ability and understanding of the speaker’s belief about the situation in ironical utterances. By seven years of age, children understand that the speaker of an ironical utterance does not believe what the utterance describes (Filippova & Astington, 2008). Children’s understanding of a speaker’s belief precedes understanding of the speaker’s intent in ironical utterances (Pexman & Glenwright, 2007). The relation between second-order mind-reading ability and understanding of a speaker’s intent or attitude communicated by ironical utterances has also been demonstrated (Happe, 1993; Sullivan et al., 1995; Winner & Leekam, 1991). A few studies report, however, that children over nine years old tend to interpret ironical utterances as lies (Demorest et al., 1984). Winner and Leekam (1991) suggest that children find the speaker’s second-order intention behind lies (i.e., the speaker intends that the hearer believes P) easier to understand than the second-order intention behind irony (i.e., the speaker intends that the hearer does not believe P). The question of why this is the case, however, has not yet been fully addressed.

6.3.3 Early pragmatic development and disorder

Utterance interpretation requires mental state reasoning, such as inferring the speaker’s intentions, beliefs, and attitudes. As mentioned in the previous section, the mental state reasoning required for irony comprehension is particularly complicated: young children typically fail to understand the intended message, and even adults’ interpretation is not infallible. Research on the development of mental state reasoning suggests that adult-like, sophisticated mental state reasoning takes many years to mature. If so, the ability to interpret utterances with adult-like sophistication may emerge only after higher-order mental state reasoning is fully developed, around adolescence. This does not mean, however, that children’s pragmatic abilities are not functional before adolescence. Here, let us first look at early pragmatic abilities in children. Then we will discuss a developmental disorder that causes difficulty in the pragmatic interpretation of utterances. The earliest pragmatic ability that appears to function during infancy is recognition of the speaker’s communicative intention (Csibra, 2010; Csibra & Gergely, 2009). According to Senju and Csibra (2008), by month months, infants show sensitivity to the speaker’s communicative intentions expressed by eye gaze or tone of voice. This sensitivity is crucial for infants’ early word learning: infants are willing to learn from speakers who have shown ostensive communicative intentions. Between 9 and 12 months, infants become capable of recognising where or what the speaker wants them to attend to, as indicated by the speaker’s eye gaze or pointing. This is called “joint attention” in developmental psychology, and it is considered to be a precursor to the ability for mental state reasoning (theory of mind).
Joint attention promotes infants’ word learning and leads to a sudden increase in vocabulary, the so-called vocabulary spurt, which occurs around 18 months. By 24 months, children are capable of taking others’ visual perspectives, and possibly their intentions (Moll & Tomasello, 2006). Developing perspective-taking ability is essential for children’s referential communication. Two-year-old children are capable of identifying the referent of potentially ambiguous pronouns on the basis of discourse prominence (Song & Fisher, 2005, 2007). By three years of age, children are able to suppress their own egocentric perspective and take the speaker’s visual perspective into consideration in order to identify the referent of ambiguous referring expressions (Nilsen & Graham, 2009). Children’s developing executive function, particularly inhibitory control, was crucial in suppressing the egocentric perspective. As for school-aged children, Nadig and Sedivy (2002) report that six-year-olds were able to take the visual perspective of the speaker to identify the referent of a complex noun phrase. During the preschool period, young children demonstrate a variety of early pragmatic abilities. For example, three-year-olds can understand what is communicated in indirect answers to questions (Schulze et al., 2013). During the preschool years, children also demonstrate an inchoate capability for comprehending scalar implicatures (Kampa & Papafragou, 2020; Papafragou & Musolino, 2003; Pouscoulous et al., 2007) and metaphorical expressions (Özçalışkan, 2007; Pouscoulous & Tomasello, 2020; Rubio-Fernández & Grassmann, 2016). It is important to note, however, that preschoolers’ pragmatic interpretation of utterances depends on their developing conceptual, cognitive, and linguistic abilities as well as their mental state reasoning capability (Zufferey, 2015). It takes many more years to achieve an adult-like, sophisticated understanding of the speaker’s intentions and attitudes communicated through utterances.

Now let us consider cases in which difficulties in the pragmatic interpretation of utterances persist throughout a lifetime. ASD is a developmental disorder whose primary symptoms include difficulties with social communication. During infancy, children with ASD do not seem to understand the speaker’s communicative intentions expressed by eye gaze or tone of voice. They are typically not interested in social communication and are not motivated to look at others’ faces or to listen to others’ voices. Studies have shown that children with ASD do not acquire joint attention skills, and that this is one of the causes of their delay in language development. Existing research has shown that verbal children with ASD pass false belief tasks around nine years of age, much later than typically developing (TD) children, who pass them by five years (Happe, 1995). For both TD children and children with ASD, developing mental state reasoning has been found to correlate with language ability. In the case of children with ASD, delay in language development may be one of the reasons why they pass the false belief test relatively late. Highly verbal children with ASD are known to face a variety of communication problems in their daily lives.
Their difficulty in understanding the speaker’s intentions and attitudes stems from difficulty in mental state reasoning as well as in its precursors, such as joint attention and visual perspective taking. Their difficulty with inferring the speaker’s mental states leads to further problems: they cannot work out what the intended context is and so cannot use it to interpret an utterance. Even the fairly simple process of interpreting referring expressions requires contextual information to single out the intended referent among potential candidates. Verbal children with ASD typically have difficulty in understanding what a pronoun or a definite noun phrase refers to. Their difficulty in identifying the intended referent has been found to be related to failure in taking others’ visual perspective and to poor executive function skills (Nilsen & Graham, 2009). They also have difficulty in understanding the intended meaning of homonyms/homographs as well as metaphors (Happe, 1997). Their difficulty in understanding the intended meaning of potentially ambiguous lexical items or non-literal language is explained by poor mental state reasoning and weak central coherence (Happe & Frith, 2006). On the other hand, studies report that verbal adolescents with ASD understand scalar implicature as well as their typically developing peers do (Chevallier et al., 2011). Furthermore, verbal children with ASD are reported to be capable of understanding indirect requests (Kissine et al., 2015). Adults with ASD are also reported to understand indirect requests as well as their neurotypical counterparts, although comprehension of ironical utterances remained difficult for them (Deliens et al., 2018). These findings suggest that verbal children and adults with ASD have some pragmatic abilities to infer intended interpretations.
Recent studies, however, indicate that successful interpretation of scalar implicatures and indirect requests by verbal children with ASD is unlikely to be related to Gricean mental state reasoning (Hochstein et al., 2018; Marocchini et al., 2022). In other words, these studies raise the possibility (a) that some pragmatic processes in utterance interpretation do not require full-blown mental state reasoning and (b) that verbal children with ASD are able to work out the intended interpretation in those specific pragmatic processes without attributing any mental states to the speaker.


6.4 Current contribution and research

6.4.1 Scalar implicature

During the last two decades, experimental research on scalar implicature has blossomed. Most of these experiments have been conducted with adults, but some have tested typically developing children as well as children with ASD, as mentioned in Section 6.3.3. Here, I will discuss experiments with adults to illustrate different experimental methods.

6.4.1.1 Reaction time experiment

With a series of reaction time experiments, Bott and Noveck (2004) tested two contrasting pragmatic accounts of scalar implicature. One is the default account proposed by Levinson (2000), which predicts that scalar inference is produced automatically by default and is cancelled only when it is judged infelicitous during later pragmatic processing. The other is a context-based account built on relevance theory, which predicts that scalar implicature is produced only when the context warrants it to make the interpretation relevant (Sperber & Wilson, 1995). In terms of processing cost, the default account assumes that generating a scalar inference requires no extra cost, whereas the context-based account suggests that producing a scalar inference is costly. One of the experiments in Bott and Noveck (2004) measured the time taken to verify categorical statements such as (5). In one condition, participants were instructed to interpret “some” in a statement such as (5) to mean “some and possibly all”. In another condition, they were asked to interpret “some” to mean “some but not all” (scalar inference). The default model would predict that, when asked to interpret “some” as “some but not all”, participants would need less time to verify the statement than when asked to interpret it as “some and possibly all”, since the latter reading involves the time-consuming cancellation of the scalar inference. The results showed that participants took significantly longer to verify categorical statements such as (5) when they were asked to interpret “some” to mean “some but not all”. The finding did not support Levinson’s proposal that scalar inference is produced automatically by default, but instead supported the context-based account based on relevance theory.

6.4.1.2 Reading time experiment

Breheny et al. (2006) investigated the role of context in the generation of scalar implicature by measuring reading times for texts including “or”, a trigger of scalar inference (the scalar inference being “either A or B but not both”). In their first self-paced reading experiment, participants read short texts such as (7) (English translations of the Greek texts used in the task) segment by segment (segments are indicated by slashes):

(7a) John was taking a university course/and working at the same time./For the exams/he had to study/from short and comprehensive sources./ Depending on the course, /he decided to read/the class notes or the summary./
(7b) John heard that/the textbook for Geophysics/was very advanced./Nobody understood it properly./He heard that/if he wanted to pass the course/he should read/the class notes or the summary./

In (7a), the preferred interpretation of “or” is exclusive or upper-bounded (“either A or B but not both”), which requires the generation of scalar implicature; by contrast, in (7b), it is inclusive or


Figure 6.1 Examples of visual-world displays for (A) some/two trials and (B) all/three trials in Experiment 1 from Huang and Snedeker (2009).


a lot of practice (the experimenter places a blank card next to the boy on the lower left and three soccer balls next to the girl on the lower right of the monitor screen).

(9) Point to the girl that has some/all/two/three of the socks.

The authors point out that instructions such as (9) contain an initial period of ambiguity in which the semantics of “some” is compatible with two of the characters on display. The question was whether this ambiguity would be resolved immediately or only at a later phase by a pragmatic implicature. The time-course of participants’ eye gaze in their first experiment (see Figure 6.2) suggested that fixation of the correct target was substantially delayed when the instruction contained “some”. The authors explained that the delay was caused by the generation of scalar implicature at a later stage of online comprehension, and suggested that there is a lag between the immediate semantic, or literal, interpretation and the later pragmatic, or enriched, interpretation. This finding, however, was challenged more recently by Grodner et al. (2010) and Degen and Tanenhaus (2016), who also used the visual-world paradigm to investigate online comprehension of sentences containing lexical triggers of scalar implicatures. Contrary to Huang and Snedeker (2009), the results of their eye-tracking experiments indicate that scalar implicature is generated at the earliest moment of interpretation, without any delay relative to the literal interpretation of similar quantifiers. They argue that the timing of scalar implicature generation, as well as the speed of pragmatic interpretation of a sentence, is determined by the relative accessibility of relevant contextual information and the relative ease of integrating linguistic and contextual information when processing the sentence.
They suggest that if a sentence containing a lexical trigger of scalar implicature takes longer to process, it is not because generating the scalar implicature itself is time-consuming, but because integrating its meaning with relevant contextual information requires additional time.

Figure 6.2 The time-course of looks to Target for the four trial types (Experiment 1) from Huang and Snedeker (2009).
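Time-course curves of the kind shown in Figure 6.2 are commonly built by dividing each trial into time bins and computing the proportion of gaze samples that fall on a region of interest. The Python sketch below is a simplified illustration, not Huang and Snedeker’s analysis pipeline. It assumes a hypothetical data format (one list of `(time_ms, region)` samples per trial, time-locked to the onset of the quantifier), uses a hypothetical helper `proportion_of_looks`, and simply pools samples across trials. Real analyses typically also handle track loss and average within participants before averaging across them.

```python
from collections import defaultdict

# Hypothetical data format (an assumption for illustration): each trial is a list of
# (time_ms, region) gaze samples, time-locked to the onset of the quantifier.
Trial = list[tuple[int, str]]


def proportion_of_looks(trials: list[Trial], region: str = "target", bin_ms: int = 100) -> dict[int, float]:
    """Pool samples across trials into fixed-width time bins and return, for each
    bin start (ms), the proportion of samples that fall on `region`."""
    hits: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for trial in trials:
        for time_ms, looked_at in trial:
            bin_start = (time_ms // bin_ms) * bin_ms  # e.g. 0-99 ms -> bin 0
            totals[bin_start] += 1
            if looked_at == region:
                hits[bin_start] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Two toy trials (invented for illustration, not data from any study).
toy_trials = [
    [(0, "competitor"), (50, "competitor"), (100, "target"), (150, "target")],
    [(0, "competitor"), (50, "target"), (100, "target"), (150, "target")],
]
print(proportion_of_looks(toy_trials))  # {0: 0.25, 100: 1.0}
```

A delayed rise in the target curve for “some” trials relative to “two” trials is the kind of pattern that Huang and Snedeker interpreted as a lag before the pragmatic interpretation.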


6.4.2 Neuropsychological approaches to irony comprehension

The last two decades have witnessed an unprecedented increase in research on the neural underpinning of irony comprehension. Here I discuss recent neuropsychological studies of adult irony comprehension to illustrate how neurological approaches to pragmatics shed light on mechanisms of irony comprehension.

6.4.2.1 Functional MRI study of irony comprehension

Unlike children, whose mind-reading capabilities are still limited, adults with sophisticated higher-order metarepresentational/mind-reading abilities can use them to understand tacitly communicated intentions and attitudes in ironical utterances. Recent functional MRI (fMRI) studies of irony comprehension have confirmed that mind-reading ability is involved in adults’ online interpretation of ironical utterances. Studies using fMRI to localise the psychological functions of brain regions demonstrated activation of the medial prefrontal cortex (MPFC), known to be part of the neural basis of mind-reading, during the comprehension of irony (Shibata et al., 2010; Uchiyama et al., 2006; Wang et al., 2006). Moreover, a more recent fMRI study reports extensive activation of a higher-order mind-reading network during irony comprehension: the right and left temporoparietal junction (rTPJ, lTPJ), the MPFC, and the precuneus (PC) (Spotorno et al., 2012). These findings, as well as meta-analyses (Bohrn et al., 2012; Rapp et al., 2012; Reyes-Aguilar et al., 2018), confirm the hypothesis that comprehension of irony relies on the hearer’s ability to infer the speaker’s mental states, such as intentions, beliefs, and attitudes. In addition, fMRI evidence indicates that the left inferior frontal gyrus (lIFG), including Brodmann’s area (BA) 47, might be the brain region where mentalising and language processes interact during irony comprehension (Uchiyama et al., 2006). Only a few studies have investigated the neural basis of integrating prosodic information during the comprehension of irony. Matsui, Nakamura, Utsumi et al. (2016) conducted an fMRI experiment to investigate the neural underpinnings of irony comprehension in the auditory modality. The behavioural data revealed that affective prosody (either positive or negative) facilitated the interpretation of irony.
A significant interaction between context and prosody was found: irony perception was enhanced when positive prosody was used in the context of a bad deed or, vice versa, when negative prosody was used in the context of a good deed. The corresponding interaction effect was observed in the rostro-ventral portion of the left inferior frontal gyrus, corresponding to Brodmann’s area (BA) 47. One recent fMRI study investigated how context–content incongruity and content–prosody incongruity in sarcasm are integrated during online comprehension. Nakamura, Matsui, Utsumi et al. (2022) report that the context–content incongruity effect, representing what is uttered in a particular context, was found in the mentalising network, including the anterior rostral zone of the medial prefrontal cortex (arMPFC), the right temporal pole (TP), and the cerebellum. The content–prosody incongruity effect, on the other hand, was related to activation in the bilateral amygdala. This finding is consistent with previous neuroimaging studies reporting the involvement of the amygdala in sarcasm-specific processes (Akimoto et al., 2014; Uchiyama et al., 2012). The interaction between these incongruity effects was found in the bilateral dorsolateral prefrontal cortex (DLPFC), extending to the inferior frontal gyrus (IFG), and in the salience network, including the anterior insular cortex and the caudal part of the dorsomedial prefrontal cortex.


6.4.2.2 Electroencephalographic (EEG) study on irony processing

Processing of ironical utterances has also been examined in electroencephalographic (EEG) studies measuring event-related brain potentials (ERPs). Existing EEG studies suggest that two ERP components, the N400 and the P600, are commonly elicited during language comprehension. The N400 is a negative component typically observed between 400 and 500 ms after stimulus onset. The P600, on the other hand, is a positive component that tends to occur in the 500–900 ms time window. The functional interpretation of these components remains a matter of debate, but recent studies suggest that the N400 reflects context-sensitive lexical retrieval, while the P600 reflects integration processes in interpretation (Caillies et al., 2019; Delogu et al., 2019). The findings of existing EEG studies on irony processing are inconsistent regarding the N400 component. Some studies found greater N400 amplitude for ironical than for sincere (or literal) utterances (Katz, Blasko, & Kazmerski, 2004). Other studies, however, did not find such differences (Regel, Gunter, & Friederici, 2011; Spotorno et al., 2013). Findings for the P600 component are more consistent. Spotorno et al. (2013) report that ironical utterances induced greater P600 amplitude than sincere ones. Regel et al. (2011) observed a similar P600 effect for irony processing regardless of modality (visual or auditory) and task (comprehension or passive reading). These findings indicate that the greater P600 amplitude reflects more effortful pragmatic processes of integrating linguistic stimuli with contextual information in irony comprehension.
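ERP effects such as “greater P600 amplitude for irony” are usually quantified as differences in mean amplitude between conditions within a component’s time window. As a hedged illustration, the Python sketch below uses the windows given in the text (400–500 ms for the N400 and 500–900 ms for the P600). It treats each window as start-inclusive and end-exclusive, which is an assumption, and applies hypothetical helpers (`mean_amplitude`, `window_effect`) to invented single-channel waveforms. Real ERP pipelines also involve baseline correction, artefact rejection, averaging over many trials and electrodes, and statistical testing, none of which is shown here.

```python
# Time windows as described in the text (ms after stimulus onset); treating the
# start as inclusive and the end as exclusive is an assumption of this sketch.
N400_WINDOW = (400, 500)
P600_WINDOW = (500, 900)


def mean_amplitude(times_ms, amplitudes_uv, window):
    """Average the amplitude (microvolts) of the samples that fall inside `window`."""
    start, end = window
    inside = [a for t, a in zip(times_ms, amplitudes_uv) if start <= t < end]
    if not inside:
        raise ValueError(f"no samples between {start} and {end} ms")
    return sum(inside) / len(inside)


def window_effect(times_ms, ironic_uv, literal_uv, window):
    """Ironic-minus-literal difference in mean amplitude within one window.
    Negative: more negative-going for irony (N400-like); positive: more positive-going (P600-like)."""
    return mean_amplitude(times_ms, ironic_uv, window) - mean_amplitude(times_ms, literal_uv, window)


# Invented waveforms sampled every 100 ms (not data from any study).
times = list(range(0, 1000, 100))
ironic = [0, 0, 0, 0, -2, 3, 3, 3, 3, 0]
literal = [0] * 10
print(window_effect(times, ironic, literal, N400_WINDOW))  # -2.0
print(window_effect(times, ironic, literal, P600_WINDOW))  # 3.0
```

In these terms, the inconsistent N400 findings correspond to studies disagreeing on whether the N400-window difference reliably departs from zero, while the P600-window difference has been more consistently positive for irony.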

6.4.3 Development of epistemic vigilance

In everyday conversation, we as hearers expect the speaker to be telling the truth. This expectation, however, is not always fulfilled in reality, so it is vitally important to be able to avoid being misinformed. In their seminal paper, Sperber et al. (2010) proposed that both children and adults are equipped with a capacity for epistemic vigilance, i.e., a capacity to assess the speaker’s trustworthiness in order to avoid being misinformed. Over the past decade or so, however, research has shown that epistemic vigilance towards the source of information takes time to develop. Children show early sensitivity to signs of the speaker’s trustworthiness by three years of age. They are sensitive to behavioural signs of the speaker’s knowledge states from early on, trusting knowledgeable or accurate informants and dismissing claims made by ignorant or inaccurate ones (e.g., Clément et al., 2004; Koenig et al., 2004). Children also become sensitive at an early age to the speaker’s attitude of certainty about the propositional content of the utterance (Matsui et al., 2009). For example, two-year-olds differentiate certain and uncertain speakers by non-verbal signs and imitate certain speakers more often than uncertain ones (Birch et al., 2010; Brosseau-Liard & Poulin-Dubois, 2014). Their ability to assess the trustworthiness of an informant based on non-verbal indications of the speaker’s knowledge states becomes more successful and consistent as they get older (Koenig & Harris, 2005; Pasquini et al., 2007). Children are also sensitive to linguistic clues that indicate the trustworthiness of the source of information. For example, they can assess the speaker’s attitude of certainty based on linguistic clues (Matsui et al., 2016).

Matsui et al. (2006) tested preschoolers (aged three to six years) with hidden object tasks in which children had to make decisions based on two conflicting utterances, each marked with an expression of a different degree of speaker certainty and evidentiality. Children comprehended certainty contrasts better than evidentiality contrasts. In addition, they understood the speaker’s knowledge states better when these were conveyed by particles than by verbs. Even three-year-olds had a reasonably good understanding of the particles of speaker certainty yo (certainty) and kana (uncertainty), but their understanding of the equivalent verbs (“I think” and “I know”) remained poor. Interestingly, children’s understanding of epistemic particles did not correlate with their false belief understanding, whereas their understanding of epistemic verbs was significantly related to whether they passed false belief tasks.

More recently, Courtney (2015) used an experimental paradigm similar to that of Matsui et al. (2006) to test Quechua-speaking children’s assessment of the relative reliability of contrasting statements. The study examined adults’ and children’s (three to six years) understanding of the reliability of information using Quechua epistemic and evidential markers. The morpheme -chá, which encodes both the epistemic meaning of uncertainty and the evidential meaning of reasoning as the information source, was contrasted with the epistemic expressions -puni (certainty) or -mi (certainty and direct evidence). It was predicted that, if children understand the epistemic meaning of each morpheme, they should judge statements with -chá as less reliable than statements with either -puni or -mi. Children’s understanding of contrasting evidential expressions was also tested with a pair of statements: one combining -mi and -ra (experienced past) and the other combining -si and -sqa (non-experienced past). Overall, adults and older children (five to six years old) performed better than younger children (three to four years old). However, detailed analyses demonstrated that, unlike adults, children struggled to assess the reliability of statements with contrasting evidential markers. As in the study of Japanese children in Matsui et al. (2006), Quechua-speaking children became able to assess the reliability of statements on the basis of what is encoded by epistemic expressions earlier than on the basis of what is encoded by evidential expressions.

Studies reviewed here suggest that the timing of children’s functional understanding of epistemic expressions varies according to the linguistic form of each expression (e.g., particles versus verbs). Children’s ability to make use of evidential expressions to assess the trustworthiness of information develops much later than their ability to make use of epistemic expressions. Furthermore, children’s understanding of linguistic markers of speaker certainty and evidentiality appears to be connected to their theory of mind development. Exactly how they are connected, however, remains far from clear.

Some recent studies have investigated the connection between the use of evidential markers and the speaker’s knowledge states. Aksu-Koç et al. (2009) tested the prediction that children’s ability to use evidentials boosts their ability to make use of the source of knowledge held in memory. Turkish four-year-olds were asked to describe, comment on, or retell events using appropriate evidentials (direct visual evidence, indirect hearsay evidence, and inference). A week later, they were asked to identify the person from whom they had acquired the information. Children’s ability to use -(i)mis (indirect hearsay evidence) predicted their ability to remember the source of information. The authors concluded that speaking a language with evidentials boosts the development of memory for the knowledge source. The influence of exposure to a language with obligatory evidentials on children’s assessment of the speaker’s knowledge states and on theory of mind was investigated by Lucas et al. (2013), who compared Turkish, Chinese, and English children’s ability to assess the trustworthiness of the speaker and their false belief understanding. The results showed that Turkish three- and four-year-olds, who were exposed to an evidential language, outperformed same-aged Chinese and English children both in assessing the speaker’s trustworthiness and in false belief understanding. The authors suggest that exposure to an evidential language promoted Turkish children’s higher sensitivity to the speaker’s knowledge states, and that this sensitivity in turn fostered their accurate assessment of the trustworthiness of information and their first-order belief reasoning.

Tomoko Matsui

6.5

Future directions

One of the main goals of experimental pragmatics is to investigate the cognitive and social factors required for the successful generation of pragmatic inferences and to examine how and when they are used in the actual interpretation of utterances. As described in this chapter, experimental investigations of adults’ online processing of utterances are an effective way to test hypotheses about how and when pragmatic inferences are generated online, as well as about factors that may affect the timing of their generation. On the other hand, neuropsychological experiments on adults’ interpretation of figurative language such as verbal irony, which the chapter also covers, are probably the best approach currently available for testing hypotheses about the cognitive and social abilities involved in interpreting the speaker’s beliefs, attitudes, and intentions. The chapter has also demonstrated that experimental methods are very fruitful for investigating the development of pragmatic abilities, including epistemic vigilance. By testing and comparing the pragmatic interpretation of utterances across different groups of children, for example, by age, by language ability, or by the presence or absence of developmental disorders, different stages of pragmatic development based on children’s cognitive and social capabilities can be identified. As this chapter shows, quite a few questions remain to be answered in future research in experimental pragmatics. Here, to suggest a possible future direction of research, I would like to revisit one of the old questions of experimental pragmatics touched upon in the introduction: is the pragmatic interpretation of utterances an exercise in mind-reading through and through, or is there room for a linguistically oriented, or code-like, inferential system (such as lexical scales) to contribute?
As mentioned in Section 6.3.3, studies investigating pragmatic abilities in high-functioning children and adolescents with ASD report that they can generate certain pragmatic inferences, for instance, scalar inferences and those required to interpret indirect requests. These findings indicate that the difficulties in pragmatic interpretation experienced by high-functioning children and adolescents with ASD may not be due to a uniform pragmatic impairment, and they have prompted renewed discussion of the possibility that the pragmatic inferences and competencies required for utterance interpretation may not be of a uniform type after all. New hypotheses have already been proposed regarding possible distinct types of pragmatic competencies and impairments. Andrés-Roqueta and Katsos (2020), for example, propose two distinct types of pragmatic inferences: (a) inferences generated locally, based on the interaction between certain linguistic items and a set of contexts, and (b) inferences generated globally, based on a search for the intended interpretation of the utterance as a whole. They claim that the first type is based on linguistic competence (linguistic pragmatics) and the second on social competences such as theory of mind (social pragmatics). Linguistic-pragmatic inferences rely on an understanding of linguistic and social conventions, whereas social-pragmatic inferences rely on mind-reading ability geared to identifying the speaker’s beliefs, attitudes, and intentions. They classify scalar inferences as the first type and the inferences required to interpret ironical utterances as the second, based on experimental results showing that high-functioning children with ASD could generate appropriate scalar inferences while struggling to interpret ironical utterances. Deliens et al.
(2018), on the other hand, propose a different set of distinct pragmatic competences: pragmatic inferences based on an understanding of the speaker’s mental states versus those based on egocentric perspectives. They report that high-functioning adults with ASD, who struggled with pragmatic inferences requiring an understanding of the speaker’s mental states, as in irony comprehension, nonetheless demonstrated intact pragmatic competence in comprehending indirect requests.

Experimental pragmatics

Since the publication of Grice’s seminal paper in 1975, theoretical discussion of the existence of distinct types of pragmatic inferences has continued to the present day. Experimental approaches to pragmatics, developed over the last two decades, are playing an important role in shedding new light on the debate. The question of what the precise cognitive underpinnings of these possibly distinct types of pragmatic inference are, however, remains an important topic for further research.

Further reading

Cummins, C., & Katsos, N. (Eds.). (2019). The Oxford Handbook of Experimental Semantics and Pragmatics. Oxford University Press.
Noveck, I. (2018). Experimental Pragmatics: The Making of a Cognitive Science. Cambridge University Press.

Related topics

Experimental studies in discourse, analysing the time course of language comprehension, analysing spoken language comprehension with eye-tracking, analysing language comprehension using ERP, analysing language using brain imaging, experimental methods to study child language, experimental methods to study atypical language development

References

Ackerman, B. (1983). Form and function in children’s understanding of ironic utterances. Journal of Experimental Child Psychology, 35, 487–508.
Akimoto, Y., Sugiura, M., Yomogida, Y., Miyauchi, C. M., Miyazawa, S., & Kawashima, R. (2014). Irony comprehension: Social conceptual knowledge and emotional response. Human Brain Mapping, 35(4), 1167–1178.
Aksu-Koç, A., Ögel-Balaban, H., & Alp, I. E. (2009). Evidentials and source knowledge in Turkish. New Directions for Child and Adolescent Development, 2009(125), 13–28.
Andrés-Roqueta, C., & Katsos, N. (2020). A distinction between linguistic and social pragmatics helps the precise characterization of pragmatic challenges in children with autism spectrum disorders and developmental language disorder. Journal of Speech, Language, and Hearing Research, 63(5), 1494–1508.
Bartsch, K., & Wellman, H. M. (1995). Children Talk about the Mind. Oxford University Press.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a ‘theory of mind’? Cognition, 21, 37–46.
Baron-Cohen, S. (1988). Social and pragmatic deficits in autism: Cognitive or affective? Journal of Autism and Developmental Disorders, 18, 379–402.
Beal, C. R., & Flavell, J. H. (1982). Effect of increasing the salience of message ambiguities on kindergartners’ evaluations of communicative success and message adequacy. Developmental Psychology, 18(1), 43.
Beal, C. R., & Flavell, J. H. (1984). Development of the ability to distinguish communicative intention and literal message meaning. Child Development, 920–928.
Bernicot, J., & Legros, S. (1987). Direct and indirect directives: What do young children understand? Journal of Experimental Child Psychology, 43(3), 346–358.
Birch, S. A., Akmal, N., & Frampton, K. L. (2010). Two-year-olds are vigilant of others’ non-verbal cues to credibility. Developmental Science, 13(2), 363–369.
Bohrn, I. C., Altmann, U., & Jacobs, A. M. (2012). Looking at the brains behind figurative language: A quantitative meta-analysis of neuroimaging studies on metaphor, idiom, and irony processing. Neuropsychologia, 50(11), 2669–2683.
Bott, L., & Noveck, I. A. (2004). Some utterances are underinformative: The onset and time course of scalar inferences. Journal of Memory and Language, 51(3), 437–457.
Breheny, R., Katsos, N., & Williams, J. (2006). Are generalised scalar implicatures generated by default? An on-line investigation into the role of context in generating pragmatic inferences. Cognition, 100(3), 434–463.


Bretherton, I., & Beeghly, M. (1982). Talking about internal states: The acquisition of an explicit theory of mind. Developmental Psychology, 18(6), 906.
Brosseau-Liard, P. E., & Poulin-Dubois, D. (2014). Sensitivity to confidence cues increases during the second year of life. Infancy, 19(5), 461–475.
Bryant, G. A., & Fox Tree, J. E. (2005). Is there an ironic tone of voice? Language and Speech, 48(3), 257–277.
Caillies, S., Gobin, P., Obert, A., Terrien, S., Coutté, A., Iakimova, G., & Besche-Richard, C. (2019). Asymmetry of affect in verbal irony understanding: What about the N400 and P600 components? Journal of Neurolinguistics, 51, 268–277.
Carston, R. (1997). Enrichment and loosening: Complementary processes in deriving the proposition expressed? In Pragmatik (pp. 103–127). VS Verlag für Sozialwissenschaften, Wiesbaden.
Chevallier, C., Noveck, I., Happé, F., & Wilson, D. (2011). What’s in a voice? Prosody as a test case for the theory of mind account of autism. Neuropsychologia, 49(3), 507–517.
Clément, F., Koenig, M., & Harris, P. (2004). The ontogenesis of trust. Mind & Language, 19(4), 360–379.
Courtney, E. H. (2015). Child acquisition of Quechua evidentiality and deictic meaning. In M. Manley & A. Muntendam (Eds.), Quechua Expressions of Stance and Deixis (pp. 101–144). Brill.
Csibra, G. (2010). Recognizing communicative intentions in infancy. Mind & Language, 25(2), 141–168.
Csibra, G., & Gergely, G. (2009). Natural pedagogy. Trends in Cognitive Sciences, 13(4), 148–153.
Clark, H. H. (1977). Bridging. In P. N. Johnson-Laird & P. C. Wason (Eds.), Thinking: Readings in Cognitive Science (pp. 411–420). Cambridge University Press.
Clark, H. H. (1979). Responding to indirect speech acts. Cognitive Psychology, 11(4), 430–477.
Clark, H. H., & Lucy, P. (1975). Understanding what is meant from what is said: A study in conversationally conveyed requests. Journal of Verbal Learning and Verbal Behavior, 14(1), 56–72.
De Carvalho, A., Reboul, A. C., Van der Henst, J. B., Cheylus, A., & Nazir, T. (2016). Scalar implicatures: The psychological reality of scales. Frontiers in Psychology, 7, 1500.
Degen, J., & Tanenhaus, M. K. (2016). Availability of alternatives and the processing of scalar implicatures: A visual world eye-tracking study. Cognitive Science, 40(1), 172–201.
Deliens, G., Papastamou, F., Ruytenbeek, N., Geelhand, P., & Kissine, M. (2018). Selective pragmatic impairment in autism spectrum disorder: Indirect requests versus irony. Journal of Autism and Developmental Disorders, 48(9), 2938–2952.
Delogu, F., Brouwer, H., & Crocker, M. W. (2019). Event-related potentials index lexical retrieval (N400) and integration (P600) during language comprehension. Brain and Cognition, 135, 103569.
Demorest, A., Meyer, C., Phelps, E., Gardner, H., & Winner, E. (1984). Words speak louder than actions: Understanding deliberately false remarks. Child Development, 55, 1527–1534.
Dent, C. H. (1987). Developmental studies of perception and metaphor: The twain shall meet. Metaphor and Symbol, 2(1), 53–71.
Fein, O., Yeari, M., & Giora, R. (2015). On the priority of salience-based interpretations: The case of sarcastic irony. Intercultural Pragmatics, 12(1), 1–32.
Filik, R., Leuthold, H., Wallington, K., & Page, J. (2014). Testing theories of irony processing using eye-tracking and ERPs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(3), 811.
Filippova, E., & Astington, J. W. (2008). Further development in social reasoning revealed in discourse irony understanding. Child Development, 79(1), 126–138.
Frith, U. (1989). Autism: Explaining the Enigma. Blackwell.
Garrod, S. C., & Sanford, A. J. (1982). The mental representation of discourse in a focused memory system: Implications for the interpretation of anaphoric noun phrases. Journal of Semantics, 1(1), 21–41.
Gibbs Jr, R. W. (1979). Contextual effects in understanding indirect requests. Discourse Processes, 2(1), 1–10.
Gibbs Jr, R. W. (1981). Your wish is my command: Convention and context in interpreting indirect requests. Journal of Verbal Learning and Verbal Behavior, 20(4), 431–444.
Gibbs, R. W. (1983). Do people always process the literal meanings of indirect requests? Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(3), 524.
Gibbs, R. W. (1986). On the psycholinguistics of sarcasm. Journal of Experimental Psychology: General, 115(1), 3.
Gildea, P., & Glucksberg, S. (1983). On understanding metaphor: The role of context. Journal of Verbal Learning and Verbal Behavior, 22(5), 577–590.


Giora, R., Fein, O., Laadan, D., Wolfson, J., Zeituny, M., Kidron, R., ... & Shaham, R. (2007). Expecting irony: Context versus salience-based effects. Metaphor and Symbol, 22(2), 119–146.
Glucksberg, S., Gildea, P., & Bookin, H. B. (1982). On understanding nonliteral speech: Can people ignore metaphors? Journal of Verbal Learning and Verbal Behavior, 21(1), 85–98.
Graesser, A. C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101(3), 371–395.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and Semantics, Volume 3 (pp. 41–58). Academic Press.
Grice, H. P. (1989). Studies in the Way of Words. Harvard University Press.
Grodner, D. J., Klein, N. M., Carbary, K. M., & Tanenhaus, M. K. (2010). “Some,” and possibly all, scalar inferences are not delayed: Evidence for immediate pragmatic enrichment. Cognition, 116(1), 42–55.
Hancock, J. T., Dunham, P. J., & Purdy, K. (2000). Children’s comprehension of critical and complimentary forms of verbal irony. Journal of Cognition and Development, 1(2), 227–248.
Happé, F. G. (1993). Communicative competence and theory of mind in autism: A test of relevance theory. Cognition, 48(2), 101–119.
Happé, F. G. (1995). The role of age and verbal ability in the theory of mind task performance of subjects with autism. Child Development, 66(3), 843–855.
Happé, F. G. (1997). Central coherence and theory of mind in autism: Reading homographs in context. British Journal of Developmental Psychology, 15(1), 1–12.
Happé, F., & Frith, U. (2006). The weak coherence account: Detail-focused cognitive style in autism spectrum disorders. Journal of Autism and Developmental Disorders, 36(1), 5–25.
Hochstein, L., Bale, A., & Barner, D. (2018). Scalar implicature in absence of epistemic reasoning? The case of autism spectrum disorder. Language Learning and Development, 14(3), 224–240.
Horn, L. R. (1972). On the semantic properties of the logical operators in English (Unpublished doctoral dissertation). UCLA.
Horn, L. R. (1984). Toward a new taxonomy for pragmatic inference: Q-based and R-based implicature. In D. Schiffrin (Ed.), Meaning, Form, and Use in Context: Linguistic Applications (pp. 11–42). Georgetown University Press.
Huang, Y. T., & Snedeker, J. (2009). Online interpretation of scalar quantifiers: Insight into the semantics–pragmatics interface. Cognitive Psychology, 58(3), 376–415.
Kampa, A., & Papafragou, A. (2020). Four-year-olds incorporate speaker knowledge into pragmatic inferences. Developmental Science, 23(3), e12920.
Katz, A. N., Blasko, D. G., & Kazmerski, V. A. (2004). Saying what you don’t mean: Social influences on sarcastic language processing. Current Directions in Psychological Science, 13(5), 186–189.
Kissine, M., Cano-Chervel, J., Carlier, S., De Brabanter, P., Ducenne, L., Pairon, M. C., ... & Leybaert, J. (2015). Children with autism understand indirect speech acts: Evidence from a semi-structured act-out task. PLoS One, 10(11), e0142191.
Koenig, M. A., Clément, F., & Harris, P. L. (2004). Trust in testimony: Children’s use of true and false statements. Psychological Science, 15(10), 694–698.
Koenig, M. A., & Harris, P. L. (2005). Preschoolers mistrust ignorant and inaccurate speakers. Child Development, 76(6), 1261–1277.
Kreuz, R. J., & Glucksberg, S. (1989). How to be sarcastic: The echoic reminder theory of verbal irony. Journal of Experimental Psychology: General, 118(4), 374.
Kumon-Nakamura, S., Glucksberg, S., & Brown, M. (1995). How about another piece of pie: The allusional pretense theory of discourse irony. Journal of Experimental Psychology: General, 124(1), 3.
Levinson, S. C. (2000). Presumptive Meanings: The Theory of Generalized Conversational Implicature. MIT Press.
Lewis, M., Stanger, C., & Sullivan, M. W. (1989). Deception in 3-year-olds. Developmental Psychology, 25(3), 439.
Lucas, A. J., Lewis, C., Pala, F. C., Wong, K., & Berridge, D. (2013). Social-cognitive processes in preschoolers’ selective trust: Three cultures compared. Developmental Psychology, 49(3), 579.
Marocchini, E., Di Paola, S., Mazzaggio, G., & Domaneschi, F. (2022). Understanding indirect requests for information in high-functioning autism. Cognitive Processing, 23(1), 129–153.
Matsui, T., Yamamoto, T., & McCagg, P. (2006). On the role of language in children’s early understanding of others as epistemic beings. Cognitive Development, 21(2), 158–173.


Matsui, T., Rakoczy, H., Miura, Y., & Tomasello, M. (2009). Understanding of speaker certainty and false-belief reasoning: A comparison of Japanese and German preschoolers. Developmental Science, 12(4), 602–613.
Matsui, T., Nakamura, T., Utsumi, A., Sasaki, A. T., Koike, T., Yoshida, Y., ... & Sadato, N. (2016). The role of prosody and context in sarcasm comprehension: Behavioral and fMRI evidence. Neuropsychologia, 87, 74–84.
Matsui, T., Yamamoto, T., Miura, Y., & McCagg, P. (2016). Young children’s early sensitivity to linguistic indications of speaker certainty in their selective word learning. Lingua, 175, 83–96.
McKoon, G., & Ratcliff, R. (1992). Inference during reading. Psychological Review, 99(3), 440–466.
Mitchell, P., & Russell, J. (1989). Young children’s understanding of the say-mean distinction in referential speech. Journal of Experimental Child Psychology, 47(3), 467–490.
Moll, H., & Tomasello, M. (2006). Level 1 perspective-taking at 24 months of age. British Journal of Developmental Psychology, 24(3), 603–613.
Moxey, L. M., & Sanford, A. J. (1986). Quantifiers and focus. Journal of Semantics, 5(3), 189–206.
Mundy, P., Sigman, M., Ungerer, J., & Sherman, T. (1986). Defining the social deficits of autism: The contribution of non-verbal communication measures. Journal of Child Psychology and Psychiatry, 27, 657–666.
Nadig, A. S., & Sedivy, J. C. (2002). Evidence of perspective-taking constraints in children’s on-line reference resolution. Psychological Science, 13(4), 329–336.
Nilsen, E. S., & Graham, S. A. (2009). The relations between children’s communicative perspective-taking and executive functioning. Cognitive Psychology, 58(2), 220–249.
Nakamura, T., Matsui, T., Utsumi, A., Sumiya, M., Nakagawa, E., & Sadato, N. (2022). Context-prosody interaction in sarcasm comprehension: A functional magnetic resonance imaging study. Neuropsychologia, 170, 108213.
Özçalışkan, Ş. (2007). Metaphors we move by: Children’s developing understanding of metaphorical motion in typologically distinct languages. Metaphor and Symbol, 22(2), 147–168.
Papafragou, A., & Musolino, J. (2003). Scalar implicatures: Experiments at the semantics–pragmatics interface. Cognition, 86(3), 253–282.
Pasquini, E. S., Corriveau, K. H., Koenig, M., & Harris, P. L. (2007). Preschoolers monitor the relative accuracy of informants. Developmental Psychology, 43(5), 1216.
Perner, J., & Wimmer, H. (1985). “John thinks that Mary thinks that…”: Attribution of second-order beliefs by 5- to 10-year-old children. Journal of Experimental Child Psychology, 39(3), 437–471.
Peterson, C. C., Wellman, H. M., & Slaughter, V. (2012). The mind behind the message: Advancing theory-of-mind scales for typically developing children, and those with deafness, autism, or Asperger syndrome. Child Development, 83, 469–485.
Pexman, P. M., & Glenwright, M. (2007). How do typically developing children grasp the meaning of verbal irony? Journal of Neurolinguistics, 20(2), 178–196.
Pouscoulous, N., Noveck, I. A., Politzer, G., & Bastide, A. (2007). A developmental investigation of processing costs in implicature production. Language Acquisition, 14(4), 347–375.
Pouscoulous, N., & Tomasello, M. (2020). Early birds: Metaphor understanding in 3-year-olds. Journal of Pragmatics, 156, 160–167.
Rapp, A. M., Mutschler, D. E., & Erb, M. (2012). Where in the brain is nonliteral language? A coordinate-based meta-analysis of functional magnetic resonance imaging studies. NeuroImage, 63(1), 600–610.
Regel, S., Gunter, T. C., & Friederici, A. D. (2011). Isn’t it ironic? An electrophysiological exploration of figurative language processing. Journal of Cognitive Neuroscience, 23(2), 277–293.
Reyes-Aguilar, A., Valles-Capetillo, E., & Giordano, M. (2018). A quantitative meta-analysis of neuroimaging studies of pragmatic language comprehension: In search of a universal neural substrate. Neuroscience, 395, 60–88.
Robinson, E. J., & Robinson, W. P. (1982). Knowing when you don’t know enough: Children’s judgements about ambiguous information. Cognition, 12(3), 267–280.
Robinson, E. J., & Whittaker, S. J. (1985). Children’s responses to ambiguous messages and their understanding of ambiguity. Developmental Psychology, 21(3), 446.
Rubio-Fernández, P., & Grassmann, S. (2016). Metaphors as second labels: Difficult for preschool children? Journal of Psycholinguistic Research, 45(4), 931–944.
Schulze, C., Grassmann, S., & Tomasello, M. (2013). 3-year-old children make relevance inferences in indirect verbal communication. Child Development, 84(6), 2079–2093.
Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press.


Searle, J. R. (1975). Indirect speech acts. In P. Cole & J. L. Morgan (Eds.), Speech Acts (pp. 59–82). Academic Press.
Senju, A., & Csibra, G. (2008). Gaze following in human infants depends on communicative signals. Current Biology, 18(9), 668–671.
Shatz, M., Wellman, H. M., & Silber, S. (1983). The acquisition of mental verbs: A systematic investigation of the first reference to mental state. Cognition, 14(3), 301–321.
Shibata, M., Toyomura, A., Itoh, H., & Abe, J. I. (2010). Neural substrates of irony comprehension: A functional MRI study. Brain Research, 1308, 114–123.
Sodian, B. (1988). Children’s attributions of knowledge to the listener in a referential communication task. Child Development, 59(2), 378–385.
Song, H. J., & Fisher, C. (2005). Who’s “she”? Discourse prominence influences preschoolers’ comprehension of pronouns. Journal of Memory and Language, 52(1), 29–57.
Song, H. J., & Fisher, C. (2007). Discourse prominence effects on 2.5-year-old children’s interpretation of pronouns. Lingua, 117(11), 1959–1987.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and Cognition. Basil Blackwell.
Sperber, D., & Wilson, D. (2002). Pragmatics, modularity and mind-reading. Mind & Language, 17(1–2), 3–23.
Sperber, D., Clément, F., Heintz, C., Mascaro, O., Mercier, H., Origgi, G., & Wilson, D. (2010). Epistemic vigilance. Mind & Language, 25(4), 359–393.
Spotorno, N., Koun, E., Prado, J., Van Der Henst, J. B., & Noveck, I. A. (2012). Neural evidence that utterance-processing entails mentalizing: The case of irony. NeuroImage, 63(1), 25–39.
Spotorno, N., Cheylus, A., Van Der Henst, J. B., & Noveck, I. A. (2013). What’s behind a P600? Integration operations during irony processing. PLoS One, 8(6), e66839.
Sullivan, K., Winner, E., & Hopfield, N. (1995). How children tell a lie from a joke: The role of second-order mental state attributions. British Journal of Developmental Psychology, 13, 191–204.
Tager-Flusberg, H. (1981). On the nature of linguistic functioning in early infantile autism. Journal of Autism and Developmental Disorders, 11, 45–56.
Uchiyama, H., Seki, A., Kageyama, H., Saito, D. N., Koeda, T., Ohno, K., & Sadato, N. (2006). Neural substrates of sarcasm: A functional magnetic-resonance imaging study. Brain Research, 1124(1), 100–110.
Uchiyama, H. T., Saito, D. N., Tanabe, H. C., Harada, T., Seki, A., Ohno, K., ... & Sadato, N. (2012). Distinction between the literal and intended meanings of sentences: A functional magnetic resonance imaging study of metaphor and sarcasm. Cortex, 48(5), 563–583.
Utsumi, A. (2000). Verbal irony as implicit display of ironic environment: Distinguishing ironic utterances from nonirony. Journal of Pragmatics, 32(12), 1777–1806.
Vosniadou, S., & Ortony, A. (1983). The emergence of the literal-metaphorical-anomalous distinction in young children. Child Development, 54(1), 154–161.
Wang, A. T., Lee, S. S., Sigman, M., & Dapretto, M. (2006). Neural basis of irony comprehension in children with autism: The role of prosody and context. Brain, 129(4), 932–943.
Wilson, D., & Sperber, D. (1992). On verbal irony. Lingua, 87(1), 53–76.
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13(1), 103–128.
Winner, E., Engel, M., & Gardner, H. (1980). Misunderstanding metaphor: What’s the problem? Journal of Experimental Child Psychology, 30(1), 22–32.
Winner, E., & Leekam, S. R. (1991). Distinguishing irony from deception: Understanding the speaker’s second-order intention. British Journal of Developmental Psychology, 9, 257–270.
Zufferey, S. (2015). Acquiring Pragmatics: Social and Cognitive Perspectives. Routledge.


7 EXPERIMENTAL SOCIOLINGUISTICS

Erez Levon

7.1

Introduction and definitions

Sociolinguistics is dedicated to understanding the relationship between language and the social world. The contemporary field of sociolinguistics coalesced in the 1960s, largely in reaction to the emergence of Chomsky’s generativist approach (Chomsky, 1965) and its insistence on the separation between competence (an individual’s knowledge of a language) and performance (the observable patterns of language use). For Chomsky (1965, p. 3), only competence was important to the development of linguistic theory, with actual performance seen as “grammatically irrelevant.” Scholars such as John Gumperz, Dell Hymes, and William Labov took issue with Chomsky’s position, arguing that it offered an impoverished view of language that artificially separated analyses of a language’s grammatical “correctness” from the rules governing its everyday use (Hymes, 1966). Instead, they argued for the need to treat language as both a cognitive and a social phenomenon, where the object of study is not only which utterances count as “well-formed” but also how those utterances are used. Gumperz, Hymes, and Labov approached this task differently. For Gumperz, the focus was on how individuals use language strategically in interaction to achieve specific social and interactional goals (by, for example, switching to a different code when giving an order or making a request; see Gumperz, 1982). Hymes (1974) focused on language as part of a broader cultural ecology, considering the ways that language contributes to the more general social life of a community. Labov remained closest to Chomsky’s goal of developing a theory of language itself as a system, though with the understanding that doing so required two innovations. The first is that language is heterogeneous (Weinreich et al., 1968): competence does not entail homogeneity. Instead, there are “alternate ways of saying the ‘same thing’” (Labov, 1972). Second, this heterogeneity is ordered.
Its appearance is not haphazard or “free”. Rather, variability in language is systematically distributed across social divisions, such that certain ways of speaking (“variants”) are associated with some social groups or situations, and other ways with others. For Labov, the goal of (socio)linguistics is, therefore, to map these associations between variable linguistic behaviour and social structure. To do so, Labov and colleagues (Labov, 1965; Weinreich et al., 1968) identified a number of key “problems” that a field of sociolinguistics needs to address. Two of these are relevant to the discussion in this chapter. The first is the embedding problem, or the need to determine the degree DOI: 10.4324/9781003392972-9

Experimental sociolinguistics

of correlation that exists between the use of a linguistic variant and a given set of social and/or situational factors (e.g., gender, ethnicity, social class, formality). The second is the evaluation problem, or the need to establish the subjective correlates (i.e., attitudinal judgements) of specific variants. Identifying a variant’s social embedding allows us to predict when it will (or will not) be used, while knowing how it is evaluated gives us insight into why those usage patterns exist. From very early on, research in this tradition has relied on experimental methods to provide insight into embedding and evaluation, ultimately helping us to better understand the social meanings of language, or the “set of inferences that can be drawn on the basis of how language is used in a specific interaction” (Hall-Lew et al., 2021). In this chapter, I provide an overview of the main experimental approaches that have been adopted, summarizing both historical perspectives and current directions in the field. By the end of the chapter, readers will have a general understanding of the primary experimental paradigms used in sociolinguistic research and an idea of how experimental methods can be used to address key questions of sociolinguistic theory.

7.2

Historical perspectives

In his “Some Principles of Linguistic Methodology”, Labov (1971) elaborates a methodological approach to the collection of language data for sociolinguistic enquiry. Much of this approach involves the use of various speech elicitation techniques (e.g., sociolinguistic interviews, word lists, reading passages) designed to enable the collection of large samples of language production from numerous speakers across a variety of contexts, which could subsequently be correlated with relevant social categories. Yet from the outset, Labov also promoted experimental methods as a way of addressing issues of sociolinguistic evaluation and embedding. In his well-known study of New York City speech, Labov (1966) asked participants to listen to a selection of audio recordings and to decide which profession they felt the speaker of each recording would be most suitable for (e.g., television personality, receptionist, factory worker). Listeners were unaware that they were hearing multiple recordings from the same individual, which varied in the extent to which they contained certain sociolinguistic variables (for example, the presence or absence of post-vocalic /r/, as in the word car: [kaː] or [kaɹ]). Labov showed that listeners systematically judged speakers as more suitable for employment in a prestigious profession (such as television personality) when post-vocalic /r/ was present than when it was not. These results complemented the findings of Labov’s production studies (which showed, for example, that speakers in formal contexts and from higher social class backgrounds used more post-vocalic /r/) and provided further, confirmatory evidence that post-vocalic /r/ is associated with prestige in New York City speech.
Labov’s use of subjective reaction tests was inspired by research at the time in social psychology on listeners’ subjective evaluations of language, and particularly Lambert and colleagues’ (Lambert et al., 1960) pioneering matched-guise paradigm. Similar to Labov, Lambert and colleagues presented bilingual French-English listeners in Montreal, Canada, with a series of recordings, half in French and half in English, and asked them to judge the speakers on a variety of evaluative scales that measured the speakers’ status (e.g., intelligence, attractiveness) and social desirability (e.g., likeability, kindness). Listeners were unaware that they evaluated the same speakers twice, once in French and once in English. Lambert and colleagues found that listeners consistently rated the English guises more favourably, for both status and social desirability. By using “matched guises” for the experiment (i.e., the same speaker using two different codes), the authors were able to reliably conclude that the difference in ratings is based on an evaluation of

Erez Levon

the language itself (French versus English), rather than a broader judgement about a given speaker. Thus, in addition to providing valuable information about the ways that French and English are judged in Montreal society at the time, Lambert and colleagues developed a methodological blueprint for the controlled examination of the social meanings associated with language. While more than 60 years old, the blueprint developed by Lambert et al. (1960) and adapted by Labov (1966) is still very much in use today. There have, of course, been various modifications made in the intervening decades. While Lambert et al.’s (1960) matched-guise paradigm offered the benefit of full experimental control across stimuli, it is often difficult to locate speakers who can accurately produce the linguistic variation required (e.g., speak in both French and English or with or without post-vocalic /r/) and do so convincingly. The alternative verbal guise technique was thus developed (Cooper, 1975), in which different speakers produce different linguistic guises. While less controlled than matched-guise studies, the ability to more easily construct stimuli and their potential for higher external validity (i.e., the likelihood that the speakers will use an “authentic” version of the variety in question) have made verbal guise studies one of the predominant methods used today (Dragojevic & Goatley-Soan, 2022). For instance, Kristiansen (2009) describes a large-scale verbal guise study across five locations in Denmark, designed to elicit changing attitudes to local (regional) versus standard (Copenhagen) varieties of Danish. Similarly, Grondelaers and van Gent (2019) used the verbal guise paradigm to investigate attitudes to different regional and ethnic varieties of Dutch, including, in a move reminiscent of Labov’s (1966) experiments, whether particular varieties are perceived as more appropriate for certain professions than for others. 
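The logic of the matched-guise comparison can be made concrete with a short Python sketch. Because each speaker contributes a rating in both guises, stable speaker-level properties (voice quality, perceived age and so on) cancel out of each per-speaker difference score, which is what licenses attributing the difference to the code itself. The speaker IDs, the 7-point ratings and the function name below are hypothetical, and a contemporary analysis would more likely fit a mixed-effects model with listeners and speakers as random effects; this is an illustration under those assumptions, not the analysis reported by Lambert et al. (1960).

```python
from math import sqrt
from statistics import mean, stdev


def paired_guise_effect(ratings):
    """Compare two guises produced by the same speakers.

    `ratings` is a hypothetical mapping: speaker id -> (mean rating in
    guise A, mean rating in guise B). Returns the mean within-speaker
    difference (A - B) and the paired t statistic for that difference.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two speakers to estimate variability")
    # One difference score per speaker: speaker-specific traits cancel out here.
    diffs = [a - b for a, b in ratings.values()]
    mean_diff = mean(diffs)
    # Standard error of the mean difference (sample SD divided by sqrt(n)).
    se = stdev(diffs) / sqrt(len(diffs))
    if se == 0:
        raise ValueError("all difference scores are identical; t is undefined")
    return mean_diff, mean_diff / se


# Invented ratings on a 7-point "intelligence" scale:
# guise A = English, guise B = French, for the same four bilingual speakers.
example = {"s1": (6, 4), "s2": (5, 4), "s3": (6, 3), "s4": (5, 5)}
effect, t_stat = paired_guise_effect(example)
print(f"mean difference = {effect:.2f}, paired t = {t_stat:.2f}")
```

A positive mean difference here would indicate that the same voices were rated more favourably in guise A than in guise B.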
Accent attitudes and employment are also the topic of Levon et al.’s (2021) analysis of potential bias in professional recruiting in England, where the authors also use a verbal guise study to evaluate listeners’ reactions to different regional and ethnic varieties of British English in a professional workplace context. While verbal guise studies were primarily designed to promote external experimental validity (i.e., to help guarantee the accuracy and “authenticity” of the stimuli evaluated), other adaptations of the matched-guise method have focused on internal validity, or the certainty with which specific linguistic parameters can be correlated with a given subjective reaction. In a traditional matched-guise paradigm, with the same speaker producing multiple stimuli in different codes (e.g., French and English), there is the possibility that the way the speaker produces each code varies in some way that presents a potential confound for analysis. It is possible, for example, that a bilingual French/English speaker has a higher mean pitch level in French than in English, or speaks more quickly in French than in English. If we wish to isolate the effect of French versus English specifically, to the exclusion of these other potential complicating factors, additional controls are needed. With the advent of accessible audio resynthesis and manipulation software (such as Praat, see Boersma & Weenink, 2022), it became possible to incorporate such experimental control. Campbell-Kibler (2007), for example, set out to examine attitudes to the ING variable in the United States (that is, the alternation between velar and alveolar realizations of the -ing suffix in English, in words such as walking or talking). To do so, Campbell-Kibler digitally manipulated recordings of various speakers, such that all that varied between the recordings was the specific variant of ING that was used.
All other information in the speech signal – pitch, intonation, speech rate, the realization of all other phonological segments – was identical. This allowed Campbell-Kibler to more reliably correlate differences in attitude ratings across stimuli with specific variants of ING, hence increasing the internal validity of her arguments. Levon (2006, 2007) followed a similar procedure in an examination of attitudes toward pitch and sibilant duration in US English. Since then, this type of modified matched-guise approach has been used in a variety of settings to study multiple linguistic variables, including ING, /s/-fronting

Experimental sociolinguistics

(the acoustic realization of /s/ as higher or lower in frequency) and TH-fronting (labiodental realization of the interdental fricatives in English, such as fink for think) in the UK (e.g., Levon 2014; Levon & Fox 2014; Levon & Buchstaller 2015), /s/-fronting and /t/-affrication in Denmark (Pharao et al., 2014; Pharao & Maegaard, 2017), and /t/ and /d/ affrication in Brazilian Portuguese (Freitag, 2020), among many others. Both the verbal-guise and the modified matched-guise paradigms were proposed to help enhance the validity (external and internal) of experimental results, and thus provide a more accurate picture of the social meanings of variation. Over the past 20 years, scholars have also argued that sociolinguistics needs to go beyond reliance on subjective evaluation testing alone, to address certain critical issues with these methods. The issues raised and the solutions that have been suggested are the topic of the following section.

7.3

Critical issues and topics

In the shift to focus on providing more accurate descriptions of the social meanings of variation, experimental research has concentrated on two main critical issues. The first is whether traditional methods, namely matched- and verbal-guise studies, allow us to reliably identify participants’ subjective evaluations. The issue is the extent to which such methods, by overtly asking respondents to judge speakers and speech samples, run into problems of social desirability bias (Nederhof, 1985), or the tendency for experimental participants to respond in a way that is socially desirable or expected rather than responding in ways that reflect how they actually feel. There have been long debates in the literature about whether such bias is a problem for matched- and verbal-guise studies (see, e.g., Pantos, 2019; Pharao & Kristiansen, 2019) and a variety of new, more indirect methods for collecting subjective evaluations have been proposed to overcome any such bias that may exist. The best known of these more indirect methods is the Implicit Association Test (IAT; Greenwald et al., 1998), originally developed in social psychology. The IAT is a reaction time-based task that measures the speed with which respondents associate a given trait or attribute (“good”) with a given social category (“migrant”). By comparing reaction times across countervailing associations (e.g., the speed with which a respondent associates the category “migrant” with the attribute “good” versus with the attribute “bad”), IATs purport to provide a measure of the implicit link between categories and traits and, hence, a way to identify potential (biased) reactions that is resistant to social desirability effects. Campbell-Kibler (2012) was the first to demonstrate the utility of IATs for sociolinguistic investigations.
In a first experiment, Campbell-Kibler used written stimuli to show that an identifiable implicit association exists between variants of the ING variable in US English (e.g., running versus runnin’) and geographical location (Northern versus Southern US states), social class differences (operationalized as traditionally working-class versus middle-class professions) and prestige (operationalized as a difference between country music singers and broadcast news presenters). Participants were faster to associate alveolar realizations (e.g., runnin’) with Southern states, working-class jobs, and country music singers. In a second experiment, Campbell-Kibler replicated this finding with audio stimuli, demonstrating that speech triggers similar associational patterns to written prompts. Finally, in a third experiment, Campbell-Kibler showed that the results of an IAT are significantly different from the results of direct questioning and a traditional matched-guise task, thus illustrating the added value of IATs for sociolinguistics while also providing support for the idea that individuals process social information in both more automatic and more deliberative ways (i.e., Dual Process models; Brewer, 1988; see also Evans, 2008).
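The reaction-time comparison at the heart of an IAT can be sketched in a few lines of Python. The example below computes a simplified D-score: the difference between mean latencies in the “incompatible” and “compatible” pairing blocks, divided by the standard deviation of all latencies from both blocks pooled together. The block labels and latencies are invented for illustration (loosely modelled on pairing runnin’ with “Southern” versus “Northern”), and the sketch deliberately omits several steps of standard scoring procedures, such as excluding very slow trials, handling errors, and scoring practice and test blocks separately.

```python
from statistics import mean, stdev


def simplified_d_score(compatible_ms, incompatible_ms):
    """Simplified IAT D-score (no trial exclusion or error penalties).

    compatible_ms: reaction times (ms) when the hypothesized associated
        pair shares a response key, e.g. runnin' + "Southern" (hypothetical).
    incompatible_ms: reaction times (ms) for the reversed pairing,
        e.g. runnin' + "Northern" (hypothetical).
    A positive score means responses were faster in the compatible block,
    i.e. evidence consistent with the hypothesized implicit association.
    """
    pooled = list(compatible_ms) + list(incompatible_ms)
    # Inclusive SD: computed over all trials from both blocks together.
    pooled_sd = stdev(pooled)
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd


# Invented latencies (ms) for a single respondent.
compatible = [600, 650, 620, 640]
incompatible = [700, 720, 690, 710]
print(round(simplified_d_score(compatible, incompatible), 2))
```

Dividing by the pooled standard deviation expresses the latency difference relative to each respondent’s overall variability, so that generally slow and generally fast respondents can be compared on the same scale.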

Since Campbell-Kibler’s proof of concept, IATs have been successfully used in sociolinguistic research to investigate attitudes to foreign-accented English in the US (Pantos & Perkins, 2013), the use of Welsh versus English in classrooms in Wales (Lee, 2015), Afrikaans-accented versus standard varieties of English in South Africa (Álvarez-Mosquera, 2017), Northern versus Southern accents of English in Britain (McKenzie & Carrie, 2018), and the social meanings of vowel variation in Swedish (Nilsson et al., 2019), among many others. Together, these studies have provided strong support for the existence of a distinct set of implicit attitudes and associations that differ from those found with more direct methods of attitude elicitation, such as matched-guise tasks. Nevertheless, questions have also arisen as to the actual types of associations IATs measure. While it is clear that IATs provide access to more automatic associations between social categories and trait attributes, some psychologists have questioned whether those associations are personal (i.e., associations that an individual actually has) or extra-personal (i.e., reflecting broader stereotypes in society) (see De Houwer et al., 2009). In other words, when we conduct an IAT, are we getting information about what respondents actually think or are we instead getting information about the ideological structures of society and the normative beliefs that circulate about social groups? In an attempt to address this issue, social psychologists developed a “personalized” version of the IAT, the P-IAT (Olson & Fazio, 2004). In a P-IAT, rather than associating a target with a general evaluation (e.g., “Parisian French” and “good”), associations with personalized evaluations are tested (e.g., “Parisian French” and “I like”). Rosseel et al. (2018) used a P-IAT task to examine attitudes toward three varieties of Dutch in Belgium: Standard Belgian Dutch (SBD), Antwerp Dutch (AD) and West Flanders Dutch (WFD).
The P-IAT results demonstrate a persistent standard language ideology in operation in Belgium, with SBD preferred by all respondents (though the strength of that preference varies depending on whether respondents are themselves from Antwerp or West Flanders). This contrasts with findings from an explicit attitude measurement task, where there is evidence of pride in local dialects and the standard variety does not fare as well. Rosseel and colleagues take this as an indication that the P-IAT provides robust evidence of individuals’ personalized implicit attitudes to language, though they concede that the added benefit of a P-IAT over a traditional IAT has not yet been established within sociolinguistics. By focusing on reaction time differences, IATs (and P-IATs) enable a more indirect method of eliciting language attitudes and evaluations. Nevertheless, they rely on using explicit stimuli (e.g., named language varieties, named social categories) for establishing associations. The use of such labels detracts from the indirectness of the method, since respondents are aware that they are evaluating a particular target (Antwerp Dutch, for example). To avoid the inclusion of labels, and allow for an even more indirect approach, sociolinguists have also turned to various priming paradigms. In priming studies, listeners complete a particular experimental task after having been exposed to a social “trigger” (a prime), designed to make a particular social category or characteristic relevant without explicitly mentioning it. For example, Strand and Johnson (1996) sought to investigate the gendered associations of alveolar fricatives in American English, and specifically whether women are stereotypically perceived as having fronter (i.e., acoustically higher frequency) fricative realizations than men.
To do so, they designed a lexical decision task in which respondents heard an audio stimulus and had to decide whether what they heard was the English word sod (with an initial alveolar fricative) or shod (with an initial postalveolar fricative). The initial segment of the stimuli was acoustically manipulated along a 9-point continuum, ranging from a canonical [s] through various ambiguous realizations to a canonical [ʃ]. While completing the lexical decision task, respondents were simultaneously presented with a photo of either a woman or a man. The photo acted as the prime, activating gender as a relevant social category without explicitly mentioning it in relation to the task. Strand and Johnson found that the visual primes for gender had a significant

effect on which word the respondents reported hearing. When shown a woman’s face, respondents were more likely to classify ambiguous tokens as shod than as sod. When shown a man’s face, in contrast, respondents were more likely to classify ambiguous tokens as sod. Strand and Johnson argue that the reason for this is that there is a stereotypical association between women and fronter realizations of alveolar fricatives in English, such that an ambiguous stimulus is heard as a fronted token of shod when respondents are primed for the category “woman” while it is heard as a token of sod when primed for the category “man” (since we do not assume that fricative fronting is something that men do). Through their priming study, Strand and Johnson were thus able to use a highly indirect method to identify a stereotypical link between fricative fronting and gender in US English (see also Johnson et al., 1999; Munson et al., 2006). Niedzielski (1999) demonstrates how written primes can also affect respondent judgements. In Niedzielski’s study, respondents in Detroit, Michigan (a city in the United States not far from the Canadian border), were played a recording of a woman reading a series of sentences and asked to focus on how the woman produced the MOUTH diphthong (i.e., the vowel in a word like mouth or house). This vowel is interesting because in Canada it is stereotypically raised, yielding pronunciations that sound more like mooth or hoose, though the woman in the recording (who was from Detroit) did not have raised realizations in her speech. After hearing the recording, respondents were then presented with a series of resynthesized tokens of MOUTH vowels ranging from canonical realizations to both raised and lowered tokens, after which they were asked to indicate on an answer sheet which of the resynthesized vowels was most like the vowels used in the recording of the woman they heard previously.
On half of the answer sheets, the word “MICHIGAN” was printed, while on the other half the word “CANADA” was printed. Despite all respondents hearing the same recording, those who had the word “CANADA” on their answer sheets reported that the woman’s vowels sounded more raised than those who had “MICHIGAN” on their answer sheets. In other words, even though there was no actual linguistic difference in the stimuli, priming for location (Canada versus Michigan) was enough to change respondents’ judgements, thus providing strong evidence for a stereotypical link between nationality and speech. Both visual and written primes have since been used to explore a variety of linguistic stereotypes, including those related to nation (Hay et al., 2006a; Hay & Drager, 2010; Walker et al., 2019), social class (Hay et al., 2006b) and personality type (D’Onofrio, 2018), among others. Priming studies provide one way of investigating individuals’ more automatic reactions and evaluations. Recently, sociolinguistic research has also turned to brain imaging studies to address similar issues. Foucart and Hartsuiker (2021), for example, describe a study using event-related potentials (ERPs) to examine listeners’ social evaluations and processing of native- versus foreign-accented speech in Dutch. Based on prior research demonstrating that listeners judge foreign-accented speakers to be less “credible” (e.g., Lev-Ari & Keysar, 2010), Foucart and Hartsuiker sought to determine whether foreign-accented speech triggered neural responses similar to those associated with other forms of semantic-processing difficulty. Adopting a truth evaluation paradigm, Foucart and Hartsuiker presented native Dutch-speaking Belgian listeners with sentences containing true and known information (“The waffle was first invented in Belgium”), true but unknown information (“The saxophone was first invented in Belgium”) and false information (“The waffle was first invented in Mexico”).
The sentences were spoken either by a native speaker of Dutch or a highly proficient non-native speaker. ERP analyses demonstrated that listeners had more difficulty in parsing low-level vocal information when the speaker was non-native. They also demonstrated a “shallower” form of semantic processing of non-native speech, one in which the distinction between known and unknown information played little role (this is in contrast to the native speaker, where large differences between known versus unknown information were observed). Foucart and

Hartsuiker argue that these results support the claim that accent differences correlate with sharp processing discrepancies (Lev-Ari et al., 2018), discrepancies which are themselves grounded in negative evaluations of certain accents. In other words, research such as Foucart and Hartsuiker’s study demonstrates that social ideologies about language and language users are so entrenched (i.e., embedded) that they can affect the most basic neurolinguistic processing.

7.4

Language in context

Adopting new methods to investigate language attitudes and social meaning more indirectly has been the primary development in experimental sociolinguistics over the past 20 years. More recently, work has also begun to focus on ways in which the contextual embeddedness of language can be explored experimentally. Even though we normally encounter language in a rich social context (i.e., at a particular time, interacting with a particular individual, in a particular place), experimental approaches have traditionally abstracted away from context and/or have attempted to provide “neutral” situations in which to evaluate language and language use. Yet, theories in social psychology have increasingly come to show that it is impossible to separate our reactions to language (or any attitude object) from the specific context in which it occurs (e.g., Giles & Marlow, 2011). Approaches have thus been developed that attempt to bring contextual elements into experimental paradigms. One of the first ways in which this was done was by examining reactions to multiple linguistic phenomena simultaneously, as a way of approximating the rich linguistic input that individuals encounter in the world. Pharao and colleagues (2014), for example, used a modified matched-guise paradigm to examine listeners’ reactions to /s/-fronting in two Danish guises. The first is what they call a “modern” guise and refers to a man speaking with a (White) urban Copenhagen accent. The second is what they call the “street” guise, a perceptually salient variety of Danish that is associated with Copenhageners with a migrant background and with a so-called gangster lifestyle. Pharao and colleagues found that /s/-fronting acts as a salient cue of gayness in the modern guise but shows no such association in the street guise. In other words, listeners appear unwilling to label a “street” speaker as gay-sounding even when /s/-fronting is present. 
Pharao and colleagues interpret this finding as indicating that /s/-fronting somehow loses its indexical link to gayness in the context of the “street” accent. The implication of this is that the “street” accent serves to trigger the impression of a particular kind of person in the minds of listeners, a person who in their view is not “gay” (see also Campbell-Kibler, 2011; Levon, 2014 for similar results in the US and UK, respectively). For Pharao and colleagues, the individual identity of the speaker (“modern” or “street”) provides the context within which subjective evaluations are elaborated. Hilton and Jeong (2019), in contrast, investigate how information about a specific interaction affects listener responses. In their work, Hilton and Jeong also use a modified matched-guise paradigm to explore how contextual enrichment influences listeners’ subjective reactions to three well-studied variables of American English: number agreement in existential there constructions (e.g., there is versus there are), final rising intonation on declaratives (i.e., uptalk) and overlapping speech. When presented with stimuli in a “no context” condition, listeners in Hilton and Jeong’s study evaluate variants in line with dominant stereotypes of the features: e.g., there is + plural NPs are judged as sounding less “educated” and rising tunes on declaratives are perceived as more “polite”. These evaluations change, however, as further contextual information is provided. When situated within a more antagonistic speech context, for example, rising tunes on declaratives are judged to sound less “polite” and more “combative”. Likewise, overlapping speech is interpreted as less “interruptive” when it coincides with topic alignment among speakers as opposed to topic shift. Based on their

results, Hilton and Jeong (2019) argue that common and/or stereotypical meanings of sociolinguistic variables can be obscured by contextual enrichment, rendering features indexically inoperative in certain contexts (see also Campbell-Kibler, 2009, on so-called bullet-proofing; and Dragojevic & Giles, 2014, on reference frames). Levon and Ye (2020) come to a similar conclusion in their matched-guise experiment examining how uptalk (final rising tunes in declarative clauses) in Standard Southern British English is perceived in different (simulated) courtroom settings. They show that in the context of a (mock) medical malpractice trial, expert witnesses who use uptalk are judged as sounding less “confident” regardless of their gender – a finding that is broadly in line with popular sociolinguistic stereotypes. In the context of a (mock) rape trial, in contrast, uptalk had a very different effect. When presented with audio testimony from a man (i.e., the defendant in the rape trial), listeners were shown to perceive him as sounding more “trustworthy” and more “likeable” when his speech contained uptalk. When presented with audio testimony from a woman (i.e., the complainant in the trial), uptalk had no effect whatsoever on listeners’ perceptual evaluations. Levon and Ye (2020) argue that this difference in evaluations of uptalk across contexts is caused by the strong gender ideologies that operate in the context of a rape trial, ideologies that interact with and constrain the indexical potential of a feature like uptalk (see also Hildebrand-Edgar & Ehrlich, 2017). Findings such as these are important because they demonstrate that meaningful associations between language and social meanings or evaluations do not exist in a vacuum. Rather, such meanings emerge in situated contexts of use via the selective activation of one of a range of possible meaning potentials with which a linguistic feature is associated (Eckert, 2012, 2016). 
This is why, for example, uptalk in English can mean one thing in a medical malpractice trial (e.g., decreased confidence) and something entirely different in a rape trial (e.g., increased trustworthiness and likeability). Experimental work on the role of context in sociolinguistic perception is thus providing a much-needed corrective to earlier, more context-“neutral” approaches.

7.5

Current contributions and research

As evident from the preceding review, experimental sociolinguistics research has provided a wealth of information about the types of evaluations and reactions that language variation can produce. But what is still largely missing is an understanding of the cognitive mechanisms that underpin such results and the process through which evaluative reactions are formed (though see Niedzielski & Preston, 2003). To address this issue, several recent studies have devised new methods to study sociolinguistic perception in real time. Watson and Clark (2013), for example, examined listeners’ evaluative reactions to speakers from Liverpool and St Helens, two locations in Northwest England whose characteristic varieties share a phonetic merger in the NURSE and SQUARE lexical sets. Watson and Clark used a bespoke real-time reaction tool (in the form of a graphical sliding scale that listeners could move while listening to stimuli) to determine the extent to which judgements of the perceived status of a speaker were affected by individual occurrences of merged tokens of NURSE and SQUARE at the moment of encountering them. They report that listeners’ sensitivity to the NURSE~SQUARE merger was dynamic, constrained not only by general properties of the variants (i.e., whether they coincide with standard pronunciations of NURSE and SQUARE in British English) but also by the unfolding linguistic context (whether the non-standard pronunciation of SQUARE was encountered before or after a standard token of NURSE). Levon et al. (2021) addressed a similar set of questions in their exploration of real-time perceptions of speakers from Newcastle, in the Northeast of England. Using the same real-time reaction

tool as Watson and Clark (2013), they examined how the cumulative distribution of both phonetic and morphosyntactic features characteristic of the Newcastle variety affects the perceived status of a speaker. Levon and colleagues demonstrate that whether listeners react to a dialect variant when encountering it depends on the other linguistic features they have been exposed to earlier in the speech signal. While phonetic and morphosyntactic features behave somewhat differently in this regard (with listeners reacting to morphosyntactic features after encountering only one token, whereas they require multiple tokens of phonetic features before registering an evaluative response; cf. Labov et al., 2011), the results of Levon et al. (2021) provide further support for the idea that sociolinguistic impression formation is dynamic and evolves over the course of an utterance (see also Hesson & Shellgren, 2015). Montgomery and Moore (2018) make a similar claim, though they approach their investigation of the topic differently. Using a new bespoke reaction tool, Montgomery and Moore (2018) asked respondents to listen to multiple examples of a speaker from the Scilly Islands, off the coast of Cornwall in the Southwest of England, and to indicate while listening (by clicking on a button on the screen) any time they encountered a feature that gave them a clue as to where the speaker was from. Montgomery and Moore found that the linguistic signals that listeners relied upon to “place” the regional origin of the speaker were dependent on both the content of the extract and the other phonetic variants that were co-present in the speech signal. Based on these results, Montgomery and Moore (2018) argue that work on sociolinguistic perception needs to go beyond its predominant focus on whether specific features come to evoke categories (or not) in particular social and/or linguistic contexts. 
Instead, they encourage work that ‘model[s] the effect of copresent variants on [the activation of] a given exemplar, not just the social indices of the specific exemplar itself’ (Montgomery & Moore, 2018, p. 655; see also Campbell-Kibler, 2016). In other words, Montgomery and Moore argue that we must broaden our focus from examining specific linguistic features to look at how language cues more holistic impressions of specific person-types (e.g., “farmer” or “city-dweller”) and how those early impressions then affect how specific variable forms are evaluated (see also Podesva et al., 2015). Teasing apart the process of holistic linguistic impression formation is the topic of Levon, Sharma and Ye (2022). In that study, the authors used a real-time graphical slider response tool (similar to Levon, Buchstaller, et al., 2021; Watson & Clark, 2013) to examine the time course of listener judgements of the perceived professional competence of candidates for a position in a British law firm. They found that candidates who spoke with a standard accent of British English (i.e., Received Pronunciation) were immediately judged favourably, with high ratings appearing only moments after the stimulus began and remaining favourable throughout. For candidates who spoke with a non-standard accent (Multicultural London English), in contrast, ratings remained low throughout much of the stimulus and only began to rise towards the end when listeners had heard that the speaker answered the interview question well. Based on detailed statistical analyses of the real-time response trajectories, Levon and colleagues argue that these results support a dynamic competition model of impression formation. In such models (see, for example, Freeman & Ambady, 2011; Kleinschmidt et al., 2018), social cues, such as language, do not activate a single cognitive representation. 
Rather, they activate multiple candidate representations, which then compete with one another over the time course of evaluation until one of the representations eventually wins out. In the context of Levon et al.’s (2022) study, this means that when listeners first hear a candidate for a job at a law firm, they immediately activate two possible evaluations, one favourable and one unfavourable. Then, while listening to the candidate, they rely on linguistic cues to dynamically select one evaluation over the other. In Levon and colleagues’ study, while both candidates ended up being rated favourably, the speaker with the non-standard accent had

to work harder to achieve this result, whereas the speaker with the standard accent was rated favourably very quickly. This finding is important because it reveals a new form of implicit bias, in which speakers of non-standard varieties bear a greater burden of proof than speakers of standard varieties, and because it demonstrates that judgements are not something that only appear after the fact, but instead develop in real time.

7.6

Recommendations for practice

The studies reviewed in this chapter present different ways in which sociolinguists have used experimental methods to examine the embedding of language in social life. The methodological innovations described here are intended to increase the validity of experimental findings. These include the use of progressively more indirect methods and the introduction of contextual elements. Yet it is important to remember that the social meaning of language is an incredibly complex and contingent phenomenon. It sits somewhat uneasily with the requirements for control and direct comparison of experimental approaches. This is not to say that experiments are not useful. Quite the opposite. But it is crucial to bear in mind that experiments only ever provide a partial understanding of a given social phenomenon, one that is by necessity abstracted from the “messiness” of everyday life. For this reason, when deciding to use experiments to address sociolinguistic questions, we must always have a very clear idea of the specific phenomenon we are interested in. Are we aiming to elicit evaluations of a specific linguistic feature, of a language variety more broadly, or of the social group with which a linguistic style is associated? To what extent could social desirability bias be a problem, and are more indirect methods therefore necessary? Is context likely to play a significant role in respondent evaluations, and, if so, do we need to manipulate (or control) it? These are not specific practical recommendations for how to choose, design and execute sociolinguistic experiments (for that, see Drager, 2018). Still, these questions should help you to narrow down the specific focus of a study and so determine the most appropriate experimental paradigm for addressing your research question.

7.7

Future directions

Over the course of the past 50 years, the dividing line between experimental research on specifically sociolinguistic questions and issues that are more broadly relevant to psychology and cognitive science has blurred. While sociolinguists have long drawn on social psychology for inspiration, scholars have increasingly engaged with psychological theories of person perception, impression formation and (social) cognition. Ultimately, the field is moving away from seeing language and linguistic perception as isolated phenomena and instead exploring the ways in which sociolinguistic processing draws on the same fundamental mechanisms as all forms of social cognition (Campbell-Kibler, 2016; Chevrot et al., 2018). In doing so, the field is returning to its origins in Labov’s (and others’) early assertion that language cannot be separated from the social world in which it occurs, and that developing an adequate theory of language requires using the full methodological toolkit at our disposal, including experimental approaches.

Further reading

Chevrot, J.-P., Drager, K., & Foulkes, P. (2018). Editors’ introduction and review: Sociolinguistic variation and cognitive science. Topics in Cognitive Science, 10(4), 679–695.

Erez Levon

Drager, K. (2018). Experimental research methods in sociolinguistics. Bloomsbury.
Kircher, R., & Zipp, L. (Eds.). (2022). Research methods in language attitudes. Cambridge University Press.
Loudermilk, B. (2013). Psycholinguistic approaches. In R. Bayley, R. Cameron & C. Lucas (Eds.), The Oxford handbook of sociolinguistics (pp. 132–152). Oxford University Press.

Related topics

Experimental pragmatics, analysing speech perception, controlling social factors in experimental linguistics, experimental methods to study cultural differences in linguistics

References

Álvarez-Mosquera, P. (2017). The use of the implicit association test (IAT) for sociolinguistic purposes in South Africa. Language Matters, 48(2), 69–90. https://doi.org/10.1080/10228195.2017.1331458
Boersma, P., & Weenink, D. (2022). Praat: Doing phonetics by computer (6.2.17). http://www.praat.org/
Brewer, M. (1988). A dual process model of impression formation. In T. Srull & R. Wyer (Eds.), A dual process model of impression formation (pp. 1–36). Lawrence Erlbaum Associates.
Campbell-Kibler, K. (2007). Accent, (ING), and the social logic of listener perceptions. American Speech, 82(1), 32–64. https://doi.org/10.1215/00031283-2007-002
Campbell-Kibler, K. (2009). The nature of sociolinguistic perception. Language Variation and Change, 21(1), 135–156. https://doi.org/10.1017/S0954394509000052
Campbell-Kibler, K. (2011). Intersecting variables and perceived sexual orientation in men. American Speech, 86(1), 52–68.
Campbell-Kibler, K. (2012). The implicit association test and sociolinguistic meaning. Lingua, 122(7), 753–763. https://doi.org/10.1016/j.lingua.2012.01.002
Campbell-Kibler, K. (2016). Towards a cognitively realistic model of meaningful sociolinguistic variation. In A. Babel (Ed.), Awareness and control in sociolinguistic research (pp. 123–151). Cambridge University Press.
Chevrot, J.-P., Drager, K., & Foulkes, P. (2018). Editors’ introduction and review: Sociolinguistic variation and cognitive science. Topics in Cognitive Science, 10(4), 679–695. https://doi.org/10.1111/tops.12384
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Cooper, R. (1975). Introduction to language attitudes. International Journal of the Sociology of Language, 6, 5–9.
De Houwer, J., Teige-Mocigemba, S., Spruyt, A., & Moors, A. (2009). Implicit measures: A normative analysis and review. Psychological Bulletin, 135(3), 347–368. https://doi.org/10.1037/a0014211
D’Onofrio, A. (2018). Personae and phonetic detail in sociolinguistic signs. Language in Society, 47(4), 513–539. https://doi.org/10.1017/S0047404518000581
Drager, K. (2018). Experimental research methods in sociolinguistics. Bloomsbury.
Dragojevic, M., & Giles, H. (2014). The reference frame effect: An intergroup perspective on language attitudes. Human Communication Research, 40(1), 91–111. https://doi.org/10.1111/hcre.12017
Dragojevic, M., & Goatley-Soan, S. (2022). The verbal-guise technique. In R. Kircher & L. Zipp (Eds.), Research methods in language attitudes (pp. 203–218). Cambridge University Press.
Eckert, P. (2012). Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. Annual Review of Anthropology, 41(1), 87–100. https://doi.org/10.1146/annurev-anthro-092611-145828
Eckert, P. (2016). Variation, meaning and social change. In N. Coupland (Ed.), Sociolinguistics: Theoretical debates (pp. 68–85). Cambridge University Press.
Evans, J. St. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59(1), 255–278. https://doi.org/10.1146/annurev.psych.59.103006.093629
Foucart, A., & Hartsuiker, R. (2021). Are foreign-accented speakers that ‘incredible’? The impact of the speaker’s indexical properties on sentence processing. Neuropsychologia, 158, 107902. https://doi.org/10.1016/j.neuropsychologia.2021.107902
Freeman, J. B., & Ambady, N. (2011). A dynamic interactive theory of person construal. Psychological Review, 118(2), 247–279. https://doi.org/10.1037/a0022327
Freitag, L. (2020). Effects of the linguistics processing: Palatals in Brazilian Portuguese and the sociolinguistic monitor. University of Pennsylvania Working Papers in Linguistics, 25(2), Article 4, 21–30.
Giles, H., & Marlow, M. (2011). Theorizing language attitudes: Existing frameworks, an integrative model, and new directions. Annals of the International Communication Association, 35(1), 161–197. https://doi.org/10.1080/23808985.2011.11679116
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74(6), 1464–1480.
Grondelaers, S., & van Gent, P. (2019). How “deep” is Dynamism? Revisiting the evaluation of Moroccan-flavored Netherlandic Dutch. Linguistics Vanguard, 5(s1). https://doi.org/10.1515/lingvan-2018-0011
Gumperz, J. (1982). Discourse strategies. Cambridge University Press.
Hall-Lew, L., Moore, E., & Podesva, R. (2021). Social meaning and linguistic variation: Theoretical foundations. In L. Hall-Lew, E. Moore & R. Podesva (Eds.), Social meaning and linguistic variation: Theorizing the third wave (pp. 1–24). Cambridge University Press.
Hay, J., & Drager, K. (2010). Stuffed toys and speech perception. Linguistics, 48(4). https://doi.org/10.1515/ling.2010.027
Hay, J., Nolan, A., & Drager, K. (2006a). From fush to feesh: Exemplar priming in speech perception. The Linguistic Review, 23(3), 351–379.
Hay, J., Warren, P., & Drager, K. (2006b). Factors influencing speech perception in the context of a merger in progress. Journal of Phonetics, 34(4), 458–484. https://doi.org/10.1016/j.wocn.2005.10.001
Hesson, A., & Shellgren, M. (2015). Discourse marker like in real time: Characterizing the time-course of sociolinguistic impression formation. American Speech, 90(2), 154. http://dx.doi.org/10.1215/00031283-3130313
Hildebrand-Edgar, N., & Ehrlich, S. (2017). “She was quite capable of asserting herself”: Powerful speech styles and assessments of credibility in a sexual assault trial. Language and Law, 4(2), 89–107.
Hilton, K., & Jeong, S. (2019). The role of context in sociolinguistic perception. Linguistics Vanguard, 5(s1). https://doi.org/10.1515/lingvan-2018-0069
Hymes, D. (1966). Two types of linguistic relativity. In W. Bright (Ed.), Sociolinguistics (pp. 114–158). Mouton.
Hymes, D. (1974). Foundations in sociolinguistics: An ethnographic approach. University of Pennsylvania Press.
Johnson, K., Strand, E., & D’Imperio, M. (1999). Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics, 27(4), 359–384. https://doi.org/10.1006/jpho.1999.0100
Kleinschmidt, D. F., Weatherholtz, K., & Jaeger, T. F. (2018). Sociolinguistic perception as inference under uncertainty. Topics in Cognitive Science, 10(4), 818–834.
Kristiansen, T. (2009). The macro-level social meanings of late-modern Danish accents. Acta Linguistica Hafniensia, 41(1), 167–192. https://doi.org/10.1080/03740460903364219
Labov, W. (1965). On the mechanism of linguistic change. Georgetown University Monographs on Languages and Linguistics, 18, 91–114.
Labov, W. (1966). The social stratification of English in New York City. Center for Applied Linguistics.
Labov, W. (1971). Some principles of linguistic methodology. Language in Society, 1, 97–120.
Labov, W. (1972). Sociolinguistic patterns. University of Pennsylvania Press.
Labov, W., Ash, S., Ravindranath, M., Weldon, T., & Nagy, N. (2011). Properties of the sociolinguistic monitor. Journal of Sociolinguistics, 15(4), 431–463.
Lambert, W. E., Hodgson, R. C., Gardner, R. C., & Fillenbaum, S. (1960). Evaluational reactions to spoken languages. Journal of Abnormal and Social Psychology, 60(1), 44–51.
Lee, R. (2015). Implicit associations with Welsh in two educational contexts. York Papers in Linguistics, 2(14), 81–105.
Lev-Ari, S., & Keysar, B. (2010). Why don’t we believe non-native speakers? The influence of accent on credibility. Journal of Experimental Social Psychology, 46(6), 1093–1096. https://doi.org/10.1016/j.jesp.2010.05.025
Lev-Ari, S., Ho, E., & Keysar, B. (2018). The unforeseen consequences of interacting with non-native speakers. Topics in Cognitive Science, 10(4), 835–849. https://doi.org/10.1111/tops.12325
Levon, E. (2006). Hearing ‘gay’: Prosody, interpretation and the affective judgments of men’s speech. American Speech, 81(1), 56–78. https://doi.org/10.1215/00031283-2006-003
Levon, E. (2007). Sexuality in context: Variation and the sociolinguistic. Language in Society, 36(4), 533–554. https://doi.org/10.1017/S0047404507070431
Levon, E. (2014). Categories, stereotypes and the linguistic perception of sexuality. Language in Society, 43(5), 539–566.
Levon, E., & Buchstaller, I. (2015). Perception, cognition and linguistic structure: The effect of linguistic modularity and cognitive style on sociolinguistic processing. Language Variation and Change, 27(3), 319–348.
Levon, E., Buchstaller, I., & Mearns, A. (2021). Towards an integrated model of perception: Linguistic architecture and the dynamics of sociolinguistic cognition. In K. Beaman, I. Buchstaller, S. Fox & J. Walker (Eds.), Advancing socio-grammatical variation and change: In honour of Jenny Cheshire (pp. 32–54). Routledge.
Levon, E., & Fox, S. (2014). Social salience and the sociolinguistic monitor: A case study of ING and TH-fronting in Britain. Journal of English Linguistics, 42(3), 185–217. https://doi.org/10.1177/0075424214531487
Levon, E., Sharma, D., Watt, D. J. L., Cardoso, A., & Ye, Y. (2021). Accent bias and perceptions of professional competence in England. Journal of English Linguistics, 49(4), 355–388. https://doi.org/10.1177/00754242211046316
Levon, E., Sharma, D., & Ye, Y. (2022). Dynamic sociolinguistic processing: Real-time changes in judgments of speaker competence. Language, 98(4), 749–774.
Levon, E., & Ye, Y. (2020). Language, indexicality and gender ideologies: Contextual effects on the perceived credibility of women. Gender and Language, 14(2), 123–151. https://doi.org/10.1558/genl.39235
McKenzie, R., & Carrie, E. (2018). Implicit-explicit attitudinal discrepancy and the investigation of language attitude change in progress. Journal of Multilingual and Multicultural Development, 39(9), 830–844. https://doi.org/10.1080/01434632.2018.1445744
Montgomery, C., & Moore, E. (2018). Evaluating S(c)illy voices: The effects of salience, stereotypes, and co-present language variables on real-time reactions to regional speech. Language, 94(3), 629–661. https://doi.org/10.1353/lan.2018.0038
Munson, B., Jefferson, S., & McDonald, E. (2006). The influence of perceived sexual orientation on fricative identification. Journal of the Acoustical Society of America, 119(4), 2427–2437. https://doi.org/10.1121/1.2173521
Nederhof, A. (1985). Methods of coping with social desirability bias: A review. European Journal of Social Psychology, 15(3), 263–280. https://doi.org/10.1002/ejsp.2420150303
Niedzielski, N. (1999). The effect of social information on the perception of sociolinguistic variables. Journal of Language and Social Psychology, 18, 62–85.
Niedzielski, N., & Preston, D. (2003). Folk linguistics. Mouton de Gruyter.
Nilsson, J., Leinonen, T., & Wenner, L. (2019). Tracking change in social meaning: The indexicality of ‘damped’ /i/ in rural Sweden. In J.-A. Villena-Ponsoda, F. Díaz Montesinos, A.-M. Ávila-Muñoz & M. Vida-Castro (Eds.), Language Variation – European Perspectives VII: Selected Papers from the Ninth International Conference on Language Variation in Europe (ICLaVE 9), Malaga (pp. 145–158). John Benjamins. https://doi.org/10.1075/silv.22.09nil
Olson, M., & Fazio, R. (2004). Reducing the influence of extrapersonal associations on the Implicit Association Test: Personalizing the IAT. Journal of Personality and Social Psychology, 86(5), 653–667. https://doi.org/10.1037/0022-3514.86.5.653
Pantos, A. (2019). Implicitness, automaticity, and consciousness in language attitudes research: Are they related and how do we characterize them? Linguistics Vanguard, 5(s1). https://doi.org/10.1515/lingvan-2018-0007
Pantos, A., & Perkins, A. (2013). Measuring implicit and explicit attitudes toward foreign accented speech. Journal of Language and Social Psychology, 32(1), 3–20. https://doi.org/10.1177/0261927X12463005
Pharao, N., & Kristiansen, T. (2019). Reflections on the relation between direct/indirect methods and explicit/implicit attitudes. Linguistics Vanguard, 5(s1). https://doi.org/10.1515/lingvan-2018-0010
Pharao, N., & Maegaard, M. (2017). On the influence of coronal sibilants and stops on the perception of social meanings in Copenhagen Danish. Linguistics, 55(5), 1141–1167. https://doi.org/10.1515/ling-2017-0023
Pharao, N., Maegaard, M., Møller, J., & Kristiansen, T. (2014). Indexical meanings of [s+] among Copenhagen youth: Social perception of a phonetic variant in different prosodic contexts. Language in Society, 43(1), 1–31.
Podesva, R., Reynolds, J., Callier, P., & Baptiste, J. (2015). Constraints on the social meaning of released /t/: A production and perception study of U.S. politicians. Language Variation and Change, 27(1), 59–87. https://doi.org/10.1017/S0954394514000192
Rosseel, L., Speelman, D., & Geeraerts, D. (2018). Measuring language attitudes using the personalized implicit association test: A case study on regional varieties of Dutch in Belgium. Journal of Linguistic Geography, 6(1), 20–39. https://doi.org/10.1017/jlg.2018.3
Strand, E., & Johnson, K. (1996). Gradient and visual speaker normalization in the perception of fricatives. In Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference (pp. 14–26). Mouton de Gruyter.
Walker, M., Szakay, A., & Cox, F. (2019). Can kiwis and koalas as cultural primes induce perceptual bias in Australian English speaking listeners? Laboratory Phonology, 10(1), 7. https://doi.org/10.5334/labphon.90
Watson, K., & Clark, L. (2013). How salient is the NURSE~SQUARE merger? English Language and Linguistics, 17(2), 297–323. https://doi.org/10.1017/S136067431300004X
Weinreich, U., Labov, W., & Herzog, M. (1968). Empirical foundations for a theory of language change. In W. Lehmann & Y. Malkiel (Eds.), Directions for historical linguistics (pp. 95–188). University of Texas Press.

8 EXPERIMENTAL STUDIES IN DISCOURSE

Ted J. M. Sanders, Jet Hoek and Merel C. J. Scholman

8.1

Introduction

People communicate through discourse. We talk to each other, we read the newspaper or interact on social media, and in all these contexts we use discourse: we produce and process messages that consist of more than one clause. The importance of the discourse level for the study of language and linguistics can hardly be overestimated: “Discourse is what makes us human” (Graesser et al., 1997, p. 164). In this chapter, we provide an overview of some classical and recent experimental studies of discourse. We also trace their development in terms of research focus and methodological advancements. The focus will be on the fit of methods with research questions.

In the field of discourse studies, the dominant view is that the connectedness of discourse is a characteristic of the cognitive representation of the discourse rather than of the discourse itself (Sanders & Pander Maat, 2006; and many others). The term coherence is used for this type of connectedness. Language users establish coherence by actively relating the different information units in the text. Generally speaking, there are two respects in which discourse can cohere:

• Referential coherence: smaller linguistic units (often nominal groups) may relate to the same mental referent.
• Relational coherence: discourse segments (most often conceived of as clauses) are connected by coherence relations, such as Cause-Consequence.

These two types are illustrated in example (1).

(1) James fired the guy from the accounting department. He was embezzling money.

This example contains an ambiguous pronoun (“he” could refer to James or to the guy from accounting), and there is no explicit connective expressing the coherence relation. However, readers are likely to construct a coherent representation of this simple discourse in two ways. They infer that “he” refers to the guy from the accounting department (referential coherence). They also infer that he was fired because he was embezzling money (relational coherence).

DOI: 10.4324/9781003392972-10

This illustrates that coherence is cognitive in nature, even though the processing and representation of coherence phenomena are often based on linguistic signals in the discourse itself. Both coherence phenomena under consideration, referential and relational coherence, have clear linguistic indicators that can be taken as processing instructions. For referential coherence, these are devices such as pronouns (he) and lexical noun phrases (the guy). For relational coherence, they are connectives (because) and other lexical markers of relations, such as cue phrases and signalling phrases (as a result of; the reason for X is). A major research issue is the relation between the linguistic surface code and aspects of the discourse representation. In this chapter, we discuss research on this issue and on other major questions from the field of discourse coherence. How do language users form a coherent representation of a discourse? How does this representation evolve during online processing? How is the discourse structure signalled in the linguistic surface code?

8.2

Historical perspectives

Since the 1950s, experimental psychologists have developed techniques that were also put to use in psycholinguistics. One of the first experimental methods used to investigate discourse concerns memory: participants were invited to read a text and were then asked what they could remember. Such free recall tasks provide insight into readers’ representation of the textual information. Pioneering studies by Bartlett (1932) showed how people recalled different pieces of information from a text, depending on their cultural background and their previous knowledge of the topic. In the 1970s and early 1980s, free recall was often used to investigate whether theories on the relative importance of textual information were correct. Meyer (1975) and colleagues provided hierarchical representations of discourse. They predicted that information higher in the hierarchy would be recalled better than less important information: the so-called levels effect. Memory measures like free recall have their shortcomings. First, memory performance does not provide insight into the processes of comprehension; it only reveals the representation that results from them. Second, recall is not a very precise method, because recall protocols need to be analysed and evaluated: which information is or is not recalled? Agreement scores between several judges are necessary, and scoring is a very laborious task. Another type of experimental method, also widely used since the 1950s, draws on language users’ discourse knowledge to test or corroborate linguistic analyses. In such tasks, participants are invited to reflect consciously on (pieces of) discourse. People can be asked to continue a story, and researchers will study their referential patterns (Gernsbacher, 1989). 
Intuitive knowledge of coherence can be investigated in card-sorting tasks (going back to Miller, 1969), in which people put intuitively similar coherence relations on the same pile (Sanders et al., 1992). Such techniques have been refined in recent years, making use of digital platforms and crowdsourcing, as we will see in Section 8.4.

From the mid-1970s onwards, more and more online measures were used, which could overcome certain shortcomings of offline memory measures. Online measures focus on the time course of comprehension. A straightforward method is measuring reading times. This can be done, for example, in a self-paced reading paradigm. Participants read a text on a computer screen and push a button when they have read a segment, which can consist of individual words or clusters of words. The computer registers the reading time per segment. Differences in reading times are interpreted as an indicator of the cognitive effort that readers need to process the information. The longer readers take, the more cognitive effort processing the linguistic items is assumed to require.

Larger efforts are often correlated with more complexity in the textual information. One illustrative result concerns information that is important in terms of the hierarchical text structure: it is processed more slowly than the same information when it is less important (Cirilo & Foss, 1980). For relational coherence, a key finding is that causally related sentences are processed faster than sentences that are less causally related or not causally related at all (Myers et al., 1987). Since the 1980s, such online measures have been developed further. A sophisticated technique for charting the reading process is the measurement of people’s eye fixations (Just & Carpenter, 1980). Infrared cameras provide information on the fixations, regressions, and saccades that readers make while reading. This type of eye-tracking research has provided many precise insights into the comprehension process. During fixations, information is processed. During saccades (short, rapid eye movements), vision is suppressed so that information cannot be processed. Regressions are a specific type of saccade: those that jump backwards through the text. When researchers compare text versions in an eye-tracking experiment, they focus on fixations and regressions. The longer readers fixate on a segment, the more effort they need to process that information. This can be because the information is complex or because it is important. The frequency of regressions, as well as their location, provides information on processing. Illuminating patterns in discourse processing found with eye-tracking include regressions that show exactly how readers process referential coherence. More precise investigations of online comprehension processes have been crucial to a deeper understanding of discourse comprehension. Still, gaining this understanding requires the combination of online and offline measures. 
For instance, information is read more slowly and recalled better when it is important in the discourse than when it is less important. This suggests that readers spend more time encoding important information, which results in a better representation of that information, as better recall scores show. So far, we have identified three types of research paradigms, each providing information on a different aspect of discourse: processing, representation, and analysis. In the remainder of this chapter, we will illustrate the use of these paradigms in investigating relational and referential coherence.
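
The Python sketch below illustrates the two eye-tracking measures discussed in this section for one invented trial. It computes total fixation time per text region and counts the regressions launched from each region. The data format (a region index plus a fixation duration in milliseconds), the region labels and the function name are assumptions. Real analyses usually work with finer-grained regions and a wider range of measures.

```python
from collections import defaultdict


def eye_movement_measures(fixations):
    """Summarise one trial of (hypothetical) eye-tracking data.

    fixations: chronological list of (region_index, duration_ms) pairs, with
    regions numbered in reading order (0 = first region of the text).
    Returns two dicts: total fixation time per region, and the number of
    regressions launched from each region.
    """
    total_time = defaultdict(int)
    regressions_out = defaultdict(int)
    for i, (region, duration) in enumerate(fixations):
        total_time[region] += duration
        # A regression is a saccade that lands on an earlier region of the text.
        if i + 1 < len(fixations) and fixations[i + 1][0] < region:
            regressions_out[region] += 1
    return dict(total_time), dict(regressions_out)


# Invented trial: 0 = context clause, 1 = connective,
# 2 = first region after the connective, 3 = end of sentence.
trial = [(0, 220), (1, 180), (2, 250), (0, 300), (2, 200), (3, 240)]
times, regressions = eye_movement_measures(trial)
print(times)        # {0: 520, 1: 180, 2: 450, 3: 240}
print(regressions)  # {2: 1}
```

In this invented trial, the single regression runs from the region right after the connective back to the context clause. This is the kind of look-back that eye-tracking studies of connectives examine. The simplification here is that refixations within a region are not counted as regressions.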

8.3

Critical issues and topics

We now turn to a discussion of critical topics that have been studied in the field, especially since the 1990s, highlighting the methods typically used to investigate them. We discuss research on relational and referential coherence separately. Within each field, we organize the topics according to the three paradigms: processing, representation, and analysis.

8.3.1

Relational coherence

8.3.1.1

Processing

The representation that people make of a discourse is conceptual by nature and can, but need not, be linguistically marked by connectives and cue phrases. These linguistic elements are seen as processing instructions that inform readers how to connect incoming discourse segments. For example, the use of because in a sentence makes clear that we are dealing with a Consequence-Cause relation and not with a contrastive or additive relation. Ever since Haberlandt (1982), there has been massive evidence for the so-called integration effect. The information that directly follows a connective or cue phrase is processed faster than the same information without the connective, indicating that connectives facilitate the integration of information (see, e.g., Millis & Just, 1994; Murray, 1997; Sanders & Noordman, 2000; Traxler et al., 1997). Self-paced reading was used in these experiments. In the 1990s, there was still some debate on when exactly this integration effect would take place: immediately, or towards the end of the sentence? The use of eye-tracking methods (Rayner, 2012) has allowed researchers to be more precise about this type of effect. Virtually all relevant studies using these methods have shown that the integration effect shows up immediately after the connective. Some have also found a slowdown caused by the connective towards the end of the sentence. This later effect has been interpreted as reflecting the effort readers make to verify the established relation in light of their world knowledge.

Recent eye-tracking studies have also provided more detailed information on the exact nature of the integration process. Using educational texts, van Silfhout et al. (2015) compared texts with and without connectives. They found that the connective caused readers to make more regressions (looking back) from the first region after the connective to previous information. When participants went on and reached the information following the connective again, they read it faster than in the implicit version. This type of result illustrates the benefits of eye-tracking studies: they provide rather precise information on online processing. This also holds for relatively subtle, linguistically coded differences. For instance, experiments have shown that lexical markers play a significant role, serving as processing instructions, in interpreting two different types of causality. So-called subjective relations (expressing the speaker’s reasoning) are harder to process than objective causal relations (describing relations in the real world). This has been shown for English (Traxler et al., 1997), Dutch (Canestrelli et al., 2013) and French (Zufferey et al., 2018). Experiments using the Visual World Paradigm (Wei et al., 2019) have shown that specialized subjective connectives, such as English therefore, indeed allow addressees to make predictions about the upcoming discourse.

Neurological measures such as Event-Related Potentials (ERPs) provide a clearer view of the neural processes that underlie reading. Two ERP components are particularly relevant: the N400 and the P600. The N400 reflects the predictability of a word in its sentential (e.g., Kutas & Hillyard, 1984) or discourse (e.g., Federmeier & Kutas, 1999; van Berkum et al., 2005) context. The P600 is associated with semantic/pragmatic violations (e.g., Kuperberg et al., 2003; van Herten et al., 2005). For example, Köhne-Fuetterer et al. (2021) conducted an ERP experiment to obtain further insight into how and when readers use connectives during discourse processing. They found a P600 effect upon encountering a concessive connective (like although). This effect indicates that readers immediately update their mental representation from an expected causal relation to an unexpected concessive relation.

8.3.1.2

Representation

As we have seen so far, there is substantial evidence that (at least certain types of) connectives and cue phrases speed up the processing of subsequent information. Still, one could question the effect of faster processing on the representation of the information. After all, if a given relation is already made clear by overt marking, the language processor has to do less work, which may result in a sloppier representation. Which methods are available to investigate the representation people have formed after they have read, listened to, or participated in a discourse? Pioneering studies on the effect of text structure in the 1970s often used free recall as a dependent variable (Meyer, 1975). Contrary to expectations, connectives and other coherence markers did not affect the amount of information recalled. It appears that this method is not sensitive enough to tap into the representation of a text and pick up effects of coherence marking. In fact, experiments that have applied the free recall method have failed to report effects of relational marking on comprehension (Meyer, 1975; Sanders & Noordman, 2000).

Ted J. M. Sanders et al.

On the other hand, other experimental methods have revealed that overt marking of coherence relations might improve the mental representation of a text. It has been shown to lead to more complete summaries (Hyönä & Lorch, 2004; Lorch, 2001), faster responses on verification tasks (Millis & Just, 1994; Sanders & Noordman, 2000) and an overall higher quality of recalled information (Meyer et al., 1980). Question answering seems to be a more suitable test of text comprehension than memory measures. Inference questions, often focused on the relation under investigation, do show that linguistic marking leads to more accurate answers (Degand & Sanders, 2002; McNamara, 2001; van Silfhout et al., 2014, 2015). Another method of investigating comprehension, which was often criticized in the 1970s and 1980s, has recently been revitalized: the cloze test, in which participants are invited to fill in gaps in a text. This method was used by Kleijn et al. (2019), who conducted a comprehension study comparing implicit and explicit (connective) versions of texts. They found that contrastive and causal connectives facilitated comprehension, while additive connectives even reduced it. The conclusion is that effects of connectives on text comprehension may be consistent across readers, but not across types of coherence relations or types of linguistic cues. Keyword sorting tasks, specifically designed to tap into the situation model representation (Kintsch, 1998), have also often been used in studies of comprehension and learning from text: participants are asked to sort crucial concepts from the text into categories that are either open or have been given a name in advance (McNamara & Kintsch, 1996; McNamara et al., 1996). Results indicate that readers score higher on sorting tasks when they have read a text with coherence marking than when they have read an implicit version (Land, 2009).
Overall, results do not always point in the direction of a positive effect of coherence markers. A crucial question is: can we generalize over a large variety of coherence markers and coherence relations? The answer is no: we need to differentiate between various types of relations, such as additive, causal, and contrastive relations. Crucial in doing so is determining which types of relations language users distinguish. In other words: are the categories that we, as linguists, have come up with actually cognitively plausible?

8.3.1.3 Analysis: Various types of relations and markers

Which types of relations and connectives can be distinguished? This question is seriously debated and investigated in text linguistics, pragmatics, and discourse studies. Based on classical taxonomies of coherence relations (Halliday & Hasan, 1976; Sanders et al., 1992), we could focus on main categories like additive (2), temporal (3), contrastive (4) and causal relations (5); all relations are illustrated here with their prototypical connectives.

(2) Daan worked on his paper and Willem played his guitar.
(3) Jan cooked supper. Afterwards, Jip did the dishes.
(4) Nala likes to play tennis, but Jip prefers soccer.
(5) Jip started crying because Nala stole his car.

Results from theoretical analyses, corpus studies and experiments have shown that adult language users intuitively distinguish between these relational clusters (see Sanders et al., 2021, for an elaborate discussion) and know which connectives to use to express each type of relation. In addition to corpus work, several experimental methods have proven useful in arriving at this conclusion, such as classification tasks (Sanders et al., 1992) and relation labeling tasks (e.g., Scholman et al., 2016). Analyses of the ‘mistakes’ that analysts make are also informative; for example, people mix up theoretically related labels much more often than less related labels (Sanders et al., 1992, 1993).

Experimental studies in discourse

The disadvantage of such methods is that language users focus very consciously on the relations and must be trained to master the right meta-level terminology, that is, the right relation labels. Another way to obtain discourse relation interpretations is to use relational markers instead of relation labels, thereby exploiting the natural relation lexicon that people use in everyday language (Knott & Dale, 1994). A typical task falls within the insertion paradigm: given two discourse segments, participants are asked to fill in the best-fitting connective from a list of alternatives (Sanders et al., 1992). This methodology has recently been applied to obtain discourse-annotated data from untrained, crowdsourced participants (Scholman et al., 2022). Coherence relation categories can then be derived from the connectives that participants have chosen to link the two segments, which gives insight into participants’ mental representation of the relation without requiring meta-level terminology.
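Deriving relation categories from inserted connectives can be sketched as a lookup followed by a tally per item. This is a minimal sketch under stated assumptions: `CONNECTIVE_CLASS` is a hypothetical, deliberately tiny mapping built from the prototypical connectives of the four relation classes plus a few obvious synonyms; it is not the connective inventory used by Scholman et al. (2022), and real studies also have to handle connectives that can signal more than one relation.

```python
from collections import Counter

# Hypothetical connective-to-class mapping (illustration only), based on the
# prototypical connectives of the additive, temporal, contrastive and causal classes.
CONNECTIVE_CLASS = {
    "and": "additive",
    "also": "additive",
    "afterwards": "temporal",
    "then": "temporal",
    "but": "contrastive",
    "however": "contrastive",
    "because": "causal",
    "so": "causal",
    "therefore": "causal",
}


def relation_distribution(responses):
    """Map each connective inserted for one item to a relation class and
    return the proportion of responses per class.

    Connectives missing from the mapping are counted as "other".
    """
    if not responses:
        return {}
    classes = [CONNECTIVE_CLASS.get(r.strip().lower(), "other") for r in responses]
    counts = Counter(classes)
    return {cls: n / len(classes) for cls, n in counts.items()}


# Connectives inserted by five hypothetical participants for a single item
print(relation_distribution(["because", "so", "Because", "and", "but"]))
# {'causal': 0.6, 'additive': 0.2, 'contrastive': 0.2}
```

The resulting distribution can be read as the item's interpretation profile: a spread over several classes suggests that participants inferred different relations for the same pair of segments.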

8.3.2 Referential coherence

Even though there are clear parallels between relational and referential coherence, there is a crucial difference, too: while each individual coherence relation occurs only once, a referent can be referred to multiple times throughout a discourse. This is why, for referential coherence, the three ‘levels’ we distinguish (representation, processing, analysis) are more intertwined.

8.3.2.1 Analysis: Different types of referring expressions

Accessibility appears to be a crucial factor in the use of referring expressions (REs; see Arnold, 2010, for an extended discussion): when a referent is not accessible (salient; activated) in the discourse, a longer linguistic form (e.g., a full NP) tends to be used to refer to it; when a referent is highly accessible, it is generally referred to using a shorter form (for instance, a personal pronoun or a null anaphor). Common experimental methods for investigating the production of REs include sentence completion and story continuation, where participants are presented with a prompt and asked to write down or utter the first continuation that comes to mind. A much more restricted variant of this task is insertion, where only the RE is left out of a sentence or story, and participants are asked to fill in the blank or, in the forced choice version of this task, to select an RE from a list of options. Studies using these paradigms have identified factors that contribute to a referent’s accessibility, such as:

• Topicality: topical referents are more often referred to using a reduced RE than non-topical referents (e.g., Cowles & Ferreira, 2012)
• Grammatical role: subjects are more often referred to using a reduced RE than non-subjects (e.g., Arnold, 2001; Fukumura & van Gompel, 2010; Stevenson et al., 1994)
• Thematic role: e.g., goals are more often referred to using a reduced RE than sources (Arnold, 2001)
• Animacy: animate referents are more often referred to using a reduced RE than inanimate referents (Fukumura & van Gompel, 2011; Vogels et al., 2014)
• The number of other discourse referents: pronominalization rates are higher in contexts with fewer referents (Arnold & Griffin, 2007)

Other factors have been proposed to influence the choice of RE as well. Fukumura et al. (2022), for instance, use a spoken production task in which participants must describe the location of an entity in a visual display to an interlocutor, to investigate the effect of ambiguity. While this study, like many other studies, does find that people prefer to produce unambiguous rather than ambiguous pronouns, the mechanism behind this preference is still a topic of discussion (see Hoek et al., 2021, for a discussion). Similarly, an open question is whether the predictability of referents influences pronominalization rates. Fukumura and van Gompel (2010), for instance, find no effect of predictability on the choice of RE using a version of the sentence completion task in which participants are explicitly asked (in this case, by means of an arrow pointing at one of the referents) to focus their continuation on a specific referent from the preceding discourse. Rosa and Arnold (2017), on the other hand, adapt the standard story continuation method to include a rich narrative and visual context, and do find an effect of predictability. This suggests that the selected methodology may affect the obtained results.

8.3.2.2 Representation: Interpreting referring expressions and determining a referent’s status

When encountering an RE, language users are faced with resolving it to one of the discourse referents. The RE may provide explicit information that helps limit the number of candidate referents, such as person, number, or gender. This can, but does not always, result in an unambiguous RE. In case of ambiguity, resolving the RE to a referent is guided by other factors. While it may seem intuitive that REs would be interpreted in the same way as they are produced, there are indications that this may not be the case. Stevenson et al. (1994) were the first to use the sentence completion paradigm with both free prompts and pronoun prompts. They found that the number of continuations that focused on the subject of the previous clause was significantly higher in the free prompt condition than in the pronoun prompt condition, a result that has been replicated numerous times since. Which factors drive the interpretation of REs is, therefore, a question distinct from the question of which factors drive pronoun production, although insights may very well transfer. Kehler and Rohde (2013), using the same experimental method as Stevenson et al. (1994), argue that meaning-related factors (e.g., thematic role, verb semantics) drive the interpretation of REs more than the production of REs. Besides personal pronouns, as in the studies mentioned above, prompts in sentence completion and story continuation tasks may also contain other forms of REs, such as demonstratives (e.g., Bader & Portele, 2019), to test the effect of form of mention. However, whereas these tasks require the researcher to interpret the target referent of the participant’s RE, other tasks tap into pronoun interpretation more directly. Participants can, for instance, be presented with a short discourse containing an ambiguous pronoun, after which they are explicitly asked who the pronoun refers to.
Many studies using this forced choice paradigm use a nonce word in each item to avoid any information that may bias the interpretation of the pronoun beyond the factors being tested (e.g., Hartshorne & Snedeker, 2013). Research testing the interpretation of REs can also make use of visual displays that may or may not correspond to the situation described in the verbal prompts. Participants can then, for instance, be asked to indicate whether the picture matches the prompt in a visual verification task (e.g., Experiment 1 in Kaiser et al., 2009), or to choose which picture best represents what is described in the discourse (e.g., Experiment 2 in Kaiser et al., 2009). In addition, participants can be instructed to actively engage with the referents, in either a real-life or a digital setting. The study by Brown-Schmidt et al. (2005), for instance, asks participants to “Put the cup on the saucer. Now put it/that over by the lamp.” The researchers then determine whether the pronoun it or that is interpreted as referring to just the cup or to the composite of the cup plus saucer. Because the instructions are spoken, the experiment can, in addition to varying the specific RE, assess the role of prosody in pronoun resolution.

Because referents can be referred to multiple times throughout a discourse, the status of referents in a discourse is dynamic: their activation levels fluctuate because of, for instance, recency of mention, topic status, and form of mention. Interpreting an RE thus affects the status of referents.

8.3.3 Processing

While the methods discussed in 8.3.2.2 measure offline how REs are interpreted, online studies are needed to assess how these processes play out in real time. Studies using methods such as reaction time paradigms, self-paced reading and eye-tracking, for instance, show that it is less costly to resolve unambiguous REs than ambiguous ones (e.g., MacDonald & MacWhinney, 1990; Stewart et al., 2000), less costly to resolve REs with a recent antecedent than REs referring to a more distant referent (e.g., Ehrlich & Rayner, 1983; Patterson, 2013), and less costly to resolve REs referring to topical referents than REs referring to non-topical referents (e.g., Clifton & Ferreira, 1987; Gordon & Scearce, 1995). In addition, online experiments have established that language users do not always passively wait to process whichever RE they encounter but can also actively anticipate upcoming referents. Implicit causality verbs (e.g., Garvey & Caramazza, 1974) have been shown to give rise to expectations about which referent will be the focus of the subsequent discourse (i.e., a bias). In an eye-tracking-while-reading study, Koornneef and van Berkum (2006) show that bias-consistent pronouns are read faster than bias-inconsistent pronouns, which lead to delays in various reading measures, including measures indicative of early processing. In line with these results, the visual world eye-tracking study by Cozijn et al. (2011) shows that upon hearing an implicit causality verb in the auditory input, participants look towards the bias-consistent referent in the visual display before hearing any disambiguating information. The online studies mentioned above all use experimental items containing REs that, expected or not, have a possible antecedent in the discourse. In contrast, other studies investigate REs that, if coreferent with one of the referents from the preceding discourse, constitute a syntactic and/or semantic violation.
By comparing ERP responses, Nieuwland (2014) shows how brain activity differs when readers encounter felicitous versus infelicitous pronouns. Also using sentences containing violations, Hammer et al. (2007) conduct an fMRI study in German. Since German nouns carry syntactic gender that need not match the semantic/biological gender of the lexical item (e.g., das Mädchen ‘the girl’ has neuter syntactic gender), manipulating which kind of violation a subsequent, supposedly coreferent pronoun constitutes makes it possible to disentangle the roles of syntax and semantics in the processing of REs.

8.4 Current contributions and research

Linguistic experiments were traditionally conducted mainly in laboratory settings. More recently, however, technological developments have made crowdsourcing platforms increasingly popular for obtaining data (see Chapter 25). The advent of crowdsourcing has also facilitated research into individual differences in language production and comprehension. In the field of discourse, early work has repeatedly shown that language exposure affects comprehension at the discourse level. In a story continuation study, Scholman et al. (2020) found that comprehenders’ sensitivity to a context-based cue for list relations (e.g., several, a few) depends on their print exposure. In two connective judgment tasks, Zufferey and Gygax (2020) found that people’s ability to understand correct and incorrect uses of connectives depends on their print exposure as well as their grammatical competence. Finally, results from two insertion tasks conducted by Tskhovrebova et al. (2022) indicate that not only participants’ print exposure and grammatical competence but also their academic background can predict connective comprehension. Regarding individual differences in referential processing, Arnold et al. (2018) find that pronoun comprehension can be explained by print exposure, but not by working memory or theory of mind. Langlois and Arnold (2020) find that syntactic biases in pronoun comprehension can also be explained by print exposure, but semantic biases cannot. In a non-crowdsourced experiment investigating the effect of age on pronoun production, Hendriks et al. (2014) show that children (aged 4–7) and elderly adults (aged 69–87) produce more ambiguous pronouns than young adults (aged 18–35). These recent studies demonstrate the importance of considering individual differences in discourse interpretation and processing in future work. Related to differences between comprehenders, we note that studies in the field of discourse processing have also traditionally focused mainly on native speakers. Non-native discourse processing is under-researched with regard to both relational and referential coherence. The limited work that is available suggests that native and non-native speakers differ greatly in their connective knowledge and usage (e.g., Leedham & Cai, 2013; Wetzel et al., 2020), which in turn affects their discourse processing skills (Crible et al., 2021; Zufferey & Gygax, 2017). Additionally, non-native speakers do not show the same proactive coreference expectation mechanisms as native speakers during discourse processing (Grüter & Rohde, 2021). Finally, while language processing has been found to be shaped by language-specific properties, most work investigating discourse processing focuses on a single language rather than on cross-linguistic comparisons. In referential discourse studies, a significant body of work has addressed a variety of languages and cross-linguistic comparisons; see De la Fuente et al. (2016) for an overview.
The field of relational discourse is largely dominated by a focus on a limited set of languages: English, Dutch, French and German. Accordingly, the field is at risk of missing properties of discourse that are less common in, or even absent from, these languages but common in typologically different languages. To broaden the empirical basis, recent years have witnessed studies on Mandarin Chinese (Li et al., 2017; Wei et al., 2019; Xiao et al., 2021) and Spanish (Santana et al., 2018). The body of literature on discourse comprehension and interpretation in languages other than English, as well as cross-linguistic comparisons, is now steadily growing. For example, Yi and Koenig (2021) showed that following implicit causality verbs, Korean speakers are less likely than English speakers to produce explanation continuations in monologues, which can be explained by constraints on clause linkage in Korean. The coreference biases associated with implicit causality verbs, on the other hand, were the same as in English. We see cross-linguistic investigation of discourse production and comprehension as a fruitful topic for future research.

8.5 Main research methods

In this section, we list the key methodologies used in discourse research, in alphabetical order. For each method, we provide a short description and, if applicable, list important variants of the method. In addition, we point out some considerations in choosing this method. We have included references to studies that have used the method; for more elaborate examples of how a specific method can be used to further our understanding of discourse, we refer to Sections 8.2–8.4.

8.5.1 Classification and labelling tasks

In a classification task, participants are asked to categorize stimuli, for instance coherence relations, into groups based on similarity (e.g., Sanders et al., 1992). In a labelling task, participants are presented with a prompt and asked to choose the description that most fittingly describes it. This is very similar to the forced choice interpretation task, but the set of labels is fixed throughout the experiment and is often explained in the instructions (e.g., Scholman et al., 2016). Considerations: These meta-linguistic tasks ask participants to engage very actively with the prompts and, as such, may not yield results that reflect what happens when people encounter language in a natural, no-pressure context.

8.5.2 Comprehension questions

Participants are presented with a prompt and asked to answer comprehension questions targeting their representation or interpretation of the prompt (e.g., McNamara et al., 1996). Comprehension questions can consist of, e.g., elaborative or bridging inference questions (requiring participants to make connections not explicitly stated in the text) or keyword sorting task questions (requiring participants to organize a set of key concepts into categories). Considerations: Comprehension questions tap into how well readers have understood the situation described by the text. Note that the effect of coherence marking (both connectives and REs) on the mental representation after reading a text is not unequivocal; for example, some studies have shown a beneficial effect of coherence markers on text comprehension questions (Degand & Sanders, 2002), while others did not find any effect (Spyridakis & Standal, 1987). These varying results might be due to differences in which level of text comprehension the questions target; manipulating whether a relation is or is not explicitly marked by a connective, for instance, may not affect a reader’s comprehension of the entire discourse (see also Kleijn et al., 2019).

8.5.3 Continuation tasks

Participants are presented with a prompt and asked to provide a natural continuation (e.g., Kehler et al., 2008). This allows researchers to study both the content and the form of the continuation. The prompt can end either with a full stop or with an ambiguous element, such as a pronoun. Considerations: This paradigm requires researchers to manually annotate the continuations for the phenomenon under investigation, which leaves room for subjective interpretation. To ensure that the annotations are reliable, at least a portion of the data should be coded independently by two coders. These double-coded annotations can then be used to calculate inter-annotator agreement (see Spooren & Degand, 2010).
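The chapter does not prescribe a particular agreement statistic; one widely used option for two coders is Cohen's kappa, which corrects the raw proportion of agreement for the agreement expected by chance. The following is a minimal plain-Python sketch; the coding scheme (continuations coded for whether they are about the subject or the object of the prompt) and the six data points are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance, computed from
    each coder's label frequencies.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both coders used one and the same label for every item; kappa is
        # undefined here, so we return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for six continuations: which referent the continuation is about
coder_a = ["subject", "subject", "object", "subject", "object", "subject"]
coder_b = ["subject", "object", "object", "subject", "object", "subject"]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.67
```

Here the coders agree on five of six items (p_o ≈ 0.83), but given their label frequencies half of that agreement would be expected by chance (p_e = 0.5), which is why kappa is lower than raw agreement.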

8.5.4 Directive and act-out tasks

In a directive study, participants are asked to give directives to another entity (e.g., a puppet or another participant). In an act-out task, participants listen to a prompt and are asked to act out the events it describes using props (e.g., puppets). Considerations: Given the playful nature of these paradigms and the fact that they typically do not require advanced verbal or reading skills, these methodologies are suitable for child language acquisition research (e.g., Evers-Vermeul & Sanders, 2011; Papakonstantinou, 2015) but can also be applied in studies with adult samples (e.g., Fukumura et al., 2011; Tourtouri et al., 2019). Responses need to be recorded and manually coded using a predefined coding procedure, which may include determining inter-coder reliability (see also Continuation tasks).

8.5.5 ERP/EEG

Participants are presented with a prompt while their brain waves are being measured through electrodes. Considerations: ERP/EEG studies provide insight into activity in the brain as it unfolds in real time, but they are expensive to run and require advanced equipment, which not every researcher might have access to.

8.5.6 Eye-tracking: Visual world paradigm

Participants listen to a prompt while viewing a screen with multiple images, and their eye movements are recorded through a special camera that tracks the pupils and measures gaze direction (e.g., Kaiser et al., 2009; Köhne-Fuetterer et al., 2021). Considerations: An advantage of this method is that it does not require any specific task from the participants; they simply listen to the input, and their eyes will unconsciously move to the related picture. A disadvantage is that, by introducing the referents in the display, the visual world paradigm primes participants to generate expectations. Other paradigms in the discourse processing field do not have this drawback.

8.5.7 Eye-tracking-while-reading

Participants read prompts while their eye movements are recorded (e.g., Koornneef & van Berkum, 2006; van Silfhout et al., 2015). Considerations: This method provides very fine-grained insight into the reading process (see, for instance, Rayner, 2012, for a discussion, and Chapter 17, this volume).

8.5.8 fMRI

Local changes in participants’ cerebral blood oxygenation are measured while they process the prompts (e.g., Nieuwland & van Berkum, 2006; Xiang & Kuperberg, 2015). Considerations: fMRI studies provide insight into which brain regions are active during processing, although with a much coarser temporal resolution than ERP/EEG; like ERP/EEG studies, they are expensive to run and require advanced equipment, which not every researcher might have access to.

8.5.9 Forced choice interpretation task

Participants are presented with a prompt and asked to choose, from multiple visual or verbal representations (e.g., images or text options), the option that represents the intended meaning of the prompt or the intended referent. In addition to the choice itself, response times may be used as a dependent variable. Considerations: The advantage of a forced choice design is that it forces participants to be explicit about their interpretation of the phenomenon under investigation. However, the choice options should be carefully designed: for instance, one option should not be more salient than the other because of factors unrelated to the research question at hand.

8.5.10 Image description

Participants are presented with one or more images and asked to describe the image in their own words (e.g., Geelhand et al., 2020). Considerations: This methodology allows researchers to study natural language production in a controlled context. However, responses need to be manually coded for the phenomena of interest, which requires a detailed coding procedure and an analysis of inter-annotator agreement (see also Continuation tasks).

8.5.11 Insertion

Participants are presented with a prompt and asked to insert a linguistic element, such as a connective or an RE (this is sometimes also referred to as a cloze task). The choice can be free or forced. Considerations: Insertion methodologies provide insight into how well participants understand the meaning and usage of specific words (e.g., do they understand which connective should be used in a specific context?), or into their representation of the text (e.g., the connective that participants use reflects the coherence relation they inferred; the RE that they choose reflects how they interpreted the actions and referents in the prompt). The forced choice paradigm is beneficial when studying specific or less frequent phenomena. The free insertion paradigm is suitable for giving insight into relatively natural language use.

8.5.12 Judgment tasks

Participants are presented with a prompt and asked to judge the prompt regarding a particular quality, for example, how acceptable, plausible, logical, coherent, or grammatical the prompt is. Participants can be asked to provide judgments on a binary scale (e.g., same meaning versus different meaning, as in Crible & Demberg, 2020), or on a graded scale (e.g., on a scale of 1–5, rate how much the sentence makes sense, as in Cain & Nash, 2011). Considerations: For judgment tasks, it is particularly important that the instructions are sufficiently clear; participants might differ in their considerations of what constitutes a coherent sentence, for example. Clear examples in the instructions can help, as well as fillers that clearly fall into either of the two binary categories or, in case of a graded scale, on the extreme ends of the scale.

8.5.13 Recall questions

Participants are presented with prompts and asked to recall specific elements of those prompts afterwards (e.g., Jou & Harris, 1990; Sanders & Noordman, 2000). Considerations: Recall questions test how well participants remember the actual (surface) content of the original text. For testing a more elaborate understanding of the discourse, comprehension questions might be more suitable.

8.5.14 Self-paced reading

Participants are presented with prompts in masked or hidden “chunks” (words or multi-word regions), which they reveal by pressing a button (e.g., Koornneef & van Berkum, 2006; Lyu et al., 2020; Wetzel et al., 2022). Considerations: This method can be used to study immediate effects of connectives or REs on discourse processing but provides relatively coarse-grained insights. Because it does not require special cameras, self-paced reading is a common way of crowdsourcing online processing data.
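The presentation logic of a self-paced reading trial can be sketched as follows, assuming the common non-cumulative "moving window" variant in which every non-space character of a hidden chunk is shown as a dash (the chapter only speaks of masked or hidden chunks, so this masking scheme is an assumption). Each button press reveals the next chunk and re-masks the previous one; the reading time for a chunk is the interval between the press that reveals it and the next press. Function names and timestamps are hypothetical.

```python
def mask(chunk, mask_char="-"):
    """Replace every non-space character of a chunk by the mask character."""
    return "".join(ch if ch == " " else mask_char for ch in chunk)


def moving_window_frames(chunks):
    """Displays of a non-cumulative moving-window trial: the fully masked
    initial display, then one display per press showing only the current chunk."""
    masked = [mask(c) for c in chunks]
    frames = [" ".join(masked)]
    for i, chunk in enumerate(chunks):
        frames.append(" ".join(masked[:i] + [chunk] + masked[i + 1:]))
    return frames


def reading_times(press_times_ms):
    """Reading time per chunk: the interval between the press that reveals a
    chunk and the next press. n chunks need n + 1 presses (the last press
    ends the trial)."""
    return [later - earlier for earlier, later in zip(press_times_ms, press_times_ms[1:])]


chunks = ["Jip started crying", "because", "Nala stole his car."]
for frame in moving_window_frames(chunks):
    print(frame)
print(reading_times([0, 412, 980, 1630]))  # [412, 568, 650]
```

Reading times for the connective region ("because") and the following region are what would typically be compared across implicit and explicit versions of an item.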

8.5.15 Summarization and free recall

Participants read a prompt and are afterwards asked to provide a summary, or as many details as they can remember about the text. Considerations: All responses must be analysed and scored, which can be a very time-intensive process. In addition, all or part of the data may have to be double coded to ensure reliability (see also Continuation tasks). Summarization is more informative when it comes to participants’ mental representation of the discourse; free recall speaks more to participants’ recollection of surface features of the text.

8.6 Recommendations for practice

Designing a study does not stop at choosing the appropriate method to answer the research question and creating experimental items. There are various other design issues that deserve careful consideration. One facet of experimental design that can greatly affect the results is participants’ (subconscious) awareness of the phenomenon under investigation. It is crucial to many experimental studies that participants remain unaware of the purpose of the experiment. This holds particularly for processing experiments, as prior studies have shown that participants rapidly adapt to the structure of the linguistic stimuli (e.g., Fine et al., 2013). To minimize the effects of repeated exposure to experimental items, researchers need to ensure that each participant sees every item only once, in one condition (i.e., participants cannot see every item in every condition), and the ratio of experimental items to filler items needs to be chosen carefully. In addition, the data can be checked for learning effects during statistical analysis. Since crowdsourcing allows for the recruitment of a larger number of participants, each participant can be presented with a limited number of experimental items to avoid frequent exposure. Note, however, that it is important to have enough items in the full set of materials, because repeated measures increase the reliability and generalizability of a study’s results. Another critical facet of experimental research is choosing what type of participants, and how many, will be tested. The group of participants included in experimental studies should ideally be a representative sample of the population. A related question is how many participants should be sampled, or how much data is enough (see Chapter 24).
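One common way to ensure that each participant sees every item only once, in one condition, is a Latin square design: items are distributed over as many presentation lists as there are conditions, the condition assigned to each item is rotated from list to list, and participants are assigned to the lists in turn. The sketch below assumes a simple rotation and hypothetical function names; fillers, randomization of presentation order, and unequal list sizes are left out.

```python
def latin_square_lists(n_items, conditions):
    """One presentation list per condition. Every list contains each item once;
    across lists, each item occurs in every condition exactly once."""
    k = len(conditions)
    return [
        [(item, conditions[(item + shift) % k]) for item in range(n_items)]
        for shift in range(k)
    ]


def list_for_participant(participant_index, lists):
    """Rotate participants over the lists so that each list is used equally often."""
    return lists[participant_index % len(lists)]


lists = latin_square_lists(4, ["implicit", "explicit"])
for number, lst in enumerate(lists):
    print(number, lst)
print(list_for_participant(3, lists)[0])  # (0, 'explicit')
```

When the number of items is a multiple of the number of conditions, every list contains each condition equally often, so no participant is exposed to one condition disproportionately.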


Finally, researchers are encouraged to preregister their study online. Preregistration is an open science practice that requires the specification of the study’s hypotheses and planned analyses before the data are inspected. It thereby prevents “p-hacking” or “selective reporting”, practices whereby researchers try out several statistical analyses and/or data eligibility specifications and then selectively report those procedures that produce significant results. It also reduces publication bias (i.e., the tendency for statistically significant findings to be published more often than non-significant findings) and thereby reduces the importance of significance tests in publication decisions. We refer interested readers to Roettger (2021), who provides more details on preregistration in experimental linguistics.
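A small calculation illustrates why trying out several analyses and reporting only the significant one is problematic. Under the simplifying assumption that a researcher runs k independent analyses on data without any true effect, each tested at α = .05, the probability that at least one of them comes out significant is 1 − (1 − α)^k. Analyses of the same data set are usually correlated, so the actual inflation is typically smaller than in this idealized case, but it still grows with every additional analysis. The function name below is hypothetical.

```python
def false_positive_risk(alpha, k):
    """Probability that at least one of k independent tests is significant at
    level alpha when the null hypothesis is true for all of them."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 5, 10):
    print(k, round(false_positive_risk(0.05, k), 3))
# 1 0.05
# 3 0.143
# 5 0.226
# 10 0.401
```

With ten independent analyses, the chance of at least one spurious "significant" result is roughly 40% rather than the nominal 5%; fixing the analysis in advance keeps the error rate at the level that was stated.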

8.7

Future directions

Throughout the previous sections, we have touched on fruitful directions for future research. We reiterate these topics here:

• Individual differences in discourse comprehension and production, also in relation to language learning
• Multi-modal investigations of referential and relational coherence
• Replicating effects found in one paradigm using converging methodologies, both experimental and corpus-based
• Extending discourse research to understudied languages and comparing phenomena across languages

In addition, the following topics deserve more consideration in future research:

• Discourse in the spoken domain, in terms of both production and comprehension
• The ecological validity of results obtained with carefully constructed linguistic items
• Readers’ motivations and their effects on experimental results concerning discourse processing and comprehension

Acknowledgements

TS is supported by Utrecht University, JH by Nijmegen University, and MS by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme, Grant 948878 (“Individualized Interaction in Discourse”).

Further reading

Scholman, M. C. J., Demberg, V., & Sanders, T. J. M. (2022). Descriptively adequate and cognitively plausible? Validating distinctions between types of coherence relations. Discours, 30, 1–32.
Zwaan, R. A., & Rapp, D. N. (2006). Discourse comprehension. In Handbook of Psycholinguistics (pp. 725–764). Academic Press.

Related topics

Analysing reading with eye-tracking; analysing spoken language interactions with eye-tracking; analysing language using brain imaging; testing in the lab and through the web; eliciting spontaneous linguistic productions; contrasting online and offline measures in experimental linguistics; new directions in statistical analysis for experimental linguistics


Ted J. M. Sanders et al.

References

Arnold, J. E. (2001). The effect of thematic roles on pronoun use and frequency of reference continuation. Discourse Processes, 31(2), 137–162. https://doi.org/10.1207/S15326950DP3102_02
Arnold, J. E. (2010). How speakers refer: The role of accessibility. Language and Linguistics Compass, 4(4), 187–203. https://doi.org/10.1111/j.1749-818X.2010.00193.x
Arnold, J. E., Strangmann, I. M., Hwang, H., Zerkle, S., & Nappa, R. (2018). Linguistic experience affects pronoun interpretation. Journal of Memory and Language, 102, 41–54. https://doi.org/10.1016/j.jml.2018.05.002
Arnold, J. E., & Griffin, Z. M. (2007). The effect of additional characters on choice of referring expression: Everyone counts. Journal of Memory and Language, 56(4), 521–536. https://doi.org/10.1016/j.jml.2006.09.007
Bader, M., & Portele, Y. (2019). The interpretation of German personal pronouns and d-pronouns. Zeitschrift für Sprachwissenschaft, 38(2), 155–190. https://doi.org/10.1515/zfs-2019-2002
Bartlett, S. F. C. (1932). Remembering: A Study in Experimental and Social Psychology. Cambridge University Press.
Van Berkum, J. J., Brown, C. M., Zwitserlood, P., Kooijman, V., & Hagoort, P. (2005). Anticipating upcoming words in discourse: Evidence from ERPs and reading times. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(3), 443–467. https://doi.org/10.1037/0278-7393.31.3.443
Brown-Schmidt, S., Byron, D. K., & Tanenhaus, M. K. (2005). Beyond salience: Interpretation of personal and demonstrative pronouns. Journal of Memory and Language, 53(2), 292–313. https://doi.org/10.1016/j.jml.2005.03.003
Cain, K., & Nash, H. M. (2011). The influence of connectives on young readers’ processing and comprehension of text. Journal of Educational Psychology, 103(2), 429–441. https://doi.org/10.1037/a0022824
Canestrelli, A. R., Mak, W. M., & Sanders, T. J. M. (2013). Causal connectives in discourse processing: How differences in subjectivity are reflected in eye movements. Language and Cognitive Processes, 28(9), 1394–1413. https://doi.org/10.1080/01690965.2012.685885
Cirilo, R. K., & Foss, D. J. (1980). Text structure and reading time for sentences. Journal of Verbal Learning and Verbal Behavior, 19(1), 96–109. https://doi.org/10.1016/S0022-5371(80)90560-5
Clifton, C., & Ferreira, F. (1987). Discourse structures and anaphora: Some experimental results. In M. Coltheart (Ed.), Attention and performance XII (pp. 635–653). Lawrence Erlbaum.
Cowles, H. W., & Ferreira, V. S. (2012). The influence of topic status on written and spoken sentence production. Discourse Processes, 49(1), 1–28. https://doi.org/10.1080/0163853X.2011.635989
Cozijn, R., Commandeur, E., Vonk, W., & Noordman, L. G. (2011). The time course of the use of implicit causality information in the processing of pronouns: A visual world paradigm study. Journal of Memory and Language, 64(4), 381–403. https://doi.org/10.1016/j.jml.2011.01.001
Crible, L., & Demberg, V. (2020). When do we leave discourse relations underspecified? The effect of formality and relation type. Discours, 26. https://doi.org/10.4000/discours.10848
Crible, L., Wetzel, M., & Zufferey, S. (2021). Lexical and structural cues to discourse processing in first and second language. Frontiers in Psychology, 12, 685491. https://doi.org/10.3389/fpsyg.2021.685491
De la Fuente, I., Hemforth, B., Colonna, S., & Schimke, S. (2016). The role of syntax, semantics, and pragmatics in pronoun resolution: A cross-linguistic overview. In A. Holler & K. Suckow (Eds.), Empirical Perspectives on Anaphora Resolution (pp. 11–32). De Gruyter (Linguistische Arbeiten).
Degand, L., & Sanders, T. J. M. (2002). The impact of relational markers on expository text comprehension in L1 and L2. Reading and Writing, 15(7/8), 739–757. https://doi.org/10.1023/A:1020932715838
Ehrlich, K., & Rayner, K. (1983). Pronoun assignment and semantic integration during reading: Eye movements and immediacy of processing. Journal of Verbal Learning and Verbal Behavior, 22(1), 75–87. https://doi.org/10.1016/S0022-5371(83)80007-3
Evers-Vermeul, J., & Sanders, T. (2011). Discovering domains – On the acquisition of causal connectives. Journal of Pragmatics, 43(6), 1645–1662. https://doi.org/10.1016/j.pragma.2010.11.015
Federmeier, K. D., & Kutas, M. (1999). A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language, 41(4), 469–495. https://doi.org/10.1006/jmla.1999.2660
Fine, A. B., Jaeger, T. F., Farmer, T. A., & Qian, T. (2013). Rapid expectation adaptation during syntactic comprehension. PLoS ONE, 8(10), e77661. https://doi.org/10.1371/journal.pone.0077661
Fukumura, K., Van Gompel, R. P., Harley, T., & Pickering, M. J. (2011). How does similarity-based interference affect the choice of referring expression? Journal of Memory and Language, 65(3), 331–344. https://doi.org/10.1016/j.jml.2011.06.001


Fukumura, K., & van Gompel, R. P. G. (2010). Choosing anaphoric expressions: Do people take into account likelihood of reference? Journal of Memory and Language, 62(1), 52–66. https://doi.org/10.1016/j.jml.2009.09.001
Fukumura, K., & van Gompel, R. P. G. (2011). The effect of animacy on the choice of referring expression. Language and Cognitive Processes, 26(10), 1472–1504. https://doi.org/10.1080/01690965.2010.506444
Fukumura, K., Pozniak, C., & Alario, F.-X. (2022). Avoiding gender ambiguous pronouns in French. Cognition, 218, 104909. https://doi.org/10.1016/j.cognition.2021.104909
Garvey, C., & Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5(3), 459–464.
Geelhand, P., Papastamou, F., Deliens, G., & Kissine, M. (2020). Narrative production in autistic adults: A systematic analysis of the microstructure, macrostructure and internal state language. Journal of Pragmatics, 164, 57–81. https://doi.org/10.1016/j.pragma.2020.04.014
Gernsbacher, M. A. (1989). Mechanisms that improve referential access. Cognition, 32(2), 99–156. https://doi.org/10.1016/0010-0277(89)90001-2
Gordon, P. C., & Scearce, K. A. (1995). Pronominalization and discourse coherence, discourse structure and pronoun interpretation. Memory & Cognition, 23(3), 313–323. https://doi.org/10.3758/BF03197233
Graesser, A. C., Millis, K. K., & Zwaan, R. A. (1997). Discourse comprehension. Annual Review of Psychology, 48(1), 163–189. https://doi.org/10.1146/annurev.psych.48.1.163
Grüter, T., & Rohde, H. (2021). Limits on expectation-based processing: Use of grammatical aspect for coreference in L2. Applied Psycholinguistics, 42(1), 51–75. https://doi.org/10.1017/S0142716420000582
Haberlandt, K. (1982). Reader expectations in text comprehension. In J.-F. Le Ny & W. Kintsch (Eds.), Advances in Psychology (pp. 239–249). North-Holland. https://doi.org/10.1016/S0166-4115(09)60055-8
Halliday, M., & Hasan, R. (1976). Cohesion in English. Longman.
Hammer, A., Goebel, R., Schwarzbach, J., Münte, T. F., & Jansma, B. M. (2007). When sex meets syntactic gender on a neural basis during pronoun processing. Brain Research, 1146, 185–198. https://doi.org/10.1016/j.brainres.2006.06.110
Hartshorne, J. K., & Snedeker, J. (2013). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes, 28(10), 1474–1508. https://doi.org/10.1080/01690965.2012.689305
Hendriks, P., Koster, C., & Hoeks, J. C. J. (2014). Referential choice across the lifespan: Why children and elderly adults produce ambiguous pronouns. Language, Cognition and Neuroscience, 29(4), 391–407. https://doi.org/10.1080/01690965.2013.766356
van Herten, M., Kolk, H. H. J., & Chwilla, D. J. (2005). An ERP study of P600 effects elicited by semantic anomalies. Cognitive Brain Research, 22(2), 241–255. https://doi.org/10.1016/j.cogbrainres.2004.09.002
Hoek, J., Kehler, A., & Rohde, H. (2021). Pronominalization and expectations for re-mention: Modeling coreference in contexts with three referents. Frontiers in Communication, 6, 674126. https://doi.org/10.3389/fcomm.2021.674126
Hyönä, J., & Lorch, R. F. (2004). Effects of topic headings on text processing: Evidence from adult readers’ eye fixation patterns. Learning and Instruction, 14(2), 131–152. https://doi.org/10.1016/j.learninstruc.2004.01.001
Jou, J., & Harris, R. J. (1990). Event order versus syntactic structure in recall of adverbial complex sentences. Journal of Psycholinguistic Research, 19(1), 21–42. https://doi.org/10.1007/BF01068183
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354. https://doi.org/10.1037/0033-295X.87.4.329
Kaiser, E., Runner, J. T., Sussman, R. S., & Tanenhaus, M. K. (2009). Structural and semantic constraints on the resolution of pronouns and reflexives. Cognition, 112(1), 55–80. https://doi.org/10.1016/j.cognition.2009.03.010
Kehler, A., Kertz, L., Rohde, H., & Elman, J. L. (2008). Coherence and coreference revisited. Journal of Semantics, 25(1), 1–44. https://doi.org/10.1093/jos/ffm018
Kehler, A., & Rohde, H. (2013). A probabilistic reconciliation of coherence-driven and centering-driven theories of pronoun interpretation. Theoretical Linguistics, 39(1–2), 1–37. https://doi.org/10.1515/tl-2013-0001
Kintsch, W. (1998). Comprehension: A Paradigm for Cognition. Cambridge University Press.
Kleijn, S., Pander Maat, H., & Sanders, T. (2019). Cloze testing for comprehension assessment: The HyTeC-cloze. Language Testing, 36(4), 553–572. https://doi.org/10.1177/0265532219840382
Knott, A., & Dale, R. (1994). Using linguistic phenomena to motivate a set of coherence relations. Discourse Processes, 18(1), 35–62. https://doi.org/10.1080/01638539409544883
Köhne-Fuetterer, J., Drenhaus, H., Delogu, F., & Demberg, V. (2021). The online processing of causal and concessive discourse connectives. Linguistics, 59(2), 417–448. https://doi.org/10.1515/ling-2021-0011


Koornneef, A., & van Berkum, J. J. A. (2006). On the use of verb-based implicit causality in sentence comprehension: Evidence from self-paced reading and eye tracking. Journal of Memory and Language, 54(4), 445–465. https://doi.org/10.1016/j.jml.2005.12.003
Kuperberg, G. R., Sitnikova, T., Caplan, D., & Holcomb, P. J. (2003). Electrophysiological distinctions in processing conceptual relationships within simple sentences. Cognitive Brain Research, 17(1), 117–129. https://doi.org/10.1016/S0926-6410(03)00086-7
Kutas, M., & Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307(5947), 161–163. https://doi.org/10.1038/307161a0
Land, J. F. H. (2009). Zwakke lezers, sterke teksten? Effecten van tekst- en lezerskenmerken op het tekstbegrip en de tekstwaardering van vmbo-leerlingen [Less-skilled readers, well-built texts? Effects of text and reader characteristics on text comprehension and text appreciation]. Letteren Proefschriften. Eburon.
Langlois, V. J., & Arnold, J. E. (2020). Print exposure explains individual differences in using syntactic but not semantic cues for pronoun comprehension. Cognition, 197, 104155. https://doi.org/10.1016/j.cognition.2019.104155
Leedham, M., & Cai, G. (2013). Besides … on the other hand: Using a corpus approach to explore the influence of teaching materials on Chinese students’ use of linking adverbials. Journal of Second Language Writing, 22(4), 374–389. https://doi.org/10.1016/j.jslw.2013.07.002
Li, F., Mak, W. M., Evers-Vermeul, J., & Sanders, T. J. (2017). On the online effects of subjectivity encoded in causal connectives. Review of Cognitive Linguistics. Published under the auspices of the Spanish Cognitive Linguistics association, 15(1), 34–57. https://doi.org/10.1075/rcl.15.1.02li
Lorch, R. F. (2001). Psychology of macrostructure in discourse comprehension. In International Encyclopedia of the Social & Behavioral Sciences (pp. 9122–9125). Elsevier. https://doi.org/10.1016/B0-08-043076-7/01540-0
Lyu, S., Tu, J.-Y., & Lin, C.-J. C. (2020). Processing plausibility in concessive and causal relations: Evidence from self-paced reading and eye-tracking. Discourse Processes, 57(4), 320–342. https://doi.org/10.1080/0163853X.2019.1680089
MacDonald, M. C., & MacWhinney, B. (1990). Measuring inhibition and facilitation from pronouns. Journal of Memory and Language, 29(4), 469–492. https://doi.org/10.1016/0749-596X(90)90067-A
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always better? Interactions of text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14(1), 1–43. https://doi.org/10.1207/s1532690xci1401_1
McNamara, D. S. (2001). Reading both high-coherence and low-coherence texts: Effects of text sequence and prior knowledge. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 55(1), 51–62. https://doi.org/10.1037/h0087352
McNamara, D. S., & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes, 22(3), 247–288. https://doi.org/10.1080/01638539609544975
Meyer, B. J. F. (1975). Identification of the structure of prose and its implications for the study of reading and memory. Journal of Reading Behavior, 7(1), 7–47. https://doi.org/10.1080/10862967509547120
Meyer, B. J. F., Brandt, D. M., & Bluth, G. J. (1980). Use of top-level structure in text: Key for reading comprehension of ninth-grade students. Reading Research Quarterly, 16(1), 72–103. https://doi.org/10.2307/747349
Miller, G. A. (1969). A psychological method to investigate verbal concepts. Journal of Mathematical Psychology, 6(2), 169–191. https://doi.org/10.1016/0022-2496(69)90001-7
Millis, K. K., & Just, M. A. (1994). The influence of connectives on sentence comprehension. Journal of Memory and Language, 33(1), 128–147. https://doi.org/10.1006/jmla.1994.1007
Murray, J. D. (1997). Connectives and narrative text: The role of continuity. Memory & Cognition, 25(2), 227–236. https://doi.org/10.3758/BF03201114
Myers, J. L., Shinjo, M., & Duffy, S. A. (1987). Degree of causal relatedness and memory. Journal of Memory and Language, 26(4), 453–465. https://doi.org/10.1016/0749-596X(87)90101-X
Nieuwland, M. S. (2014). “Who’s he?” Event-related brain potentials and unbound pronouns. Journal of Memory and Language, 76, 1–28. https://doi.org/10.1016/j.jml.2014.06.002
Nieuwland, M. S., & van Berkum, J. J. A. (2006). When peanuts fall in love: N400 evidence for the power of discourse. Journal of Cognitive Neuroscience, 18(7), 1098–1111. https://doi.org/10.1162/jocn.2006.18.7.1098
Papakonstantinou, M. (2015). Temporal connectives in child language: A study of Greek. PhD dissertation, University of Thessaloniki.
Patterson, C. (2013). Discourse coherence in pronoun resolution. Discours, 12. https://doi.org/10.4000/discours.8820


Rayner, K. (2012). Eye Movements in Reading: Perceptual and Language Processes. Elsevier Science.
Roettger, T. B. (2021). Preregistration in experimental linguistics: Applications, challenges, and limitations. Linguistics, 59(5), 1227–1249. https://doi.org/10.1515/ling-2019-0048
Rosa, E. C., & Arnold, J. E. (2017). Predictability affects production: Thematic roles can affect reference form selection. Journal of Memory and Language, 94, 43–60. https://doi.org/10.1016/j.jml.2016.07.007
Sanders, T., & Pander Maat, H. (2006). Cohesion and coherence: Linguistic approaches. In Encyclopedia of Language & Linguistics (pp. 591–595). Elsevier. https://doi.org/10.1016/B0-08-044854-2/00497-1
Sanders, T. J. M., Demberg, V., Hoek, J., Scholman, M. C., Asr, F. T., Zufferey, S., & Evers-Vermeul, J. (2021). Unifying dimensions in coherence relations: How various annotation frameworks are related. Corpus Linguistics and Linguistic Theory, 17(1), 1–71. https://doi.org/10.1515/cllt-2016-0078
Sanders, T. J. M., & Noordman, L. G. M. (2000). The role of coherence relations and their linguistic markers in text processing. Discourse Processes, 29(1), 37–60. https://doi.org/10.1207/S15326950dp2901_3
Sanders, T. J. M., Spooren, W. P. M., & Noordman, L. G. M. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15(1), 1–35. https://doi.org/10.1080/01638539209544800
Sanders, T. J. M., Spooren, W. P. M., & Noordman, L. G. M. (1993). Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics, 4(2), 93–134. https://doi.org/10.1515/cogl.1993.4.2.93
Santana, A., Spooren, W., Nieuwenhuijsen, D., & Sanders, T. (2018). Subjectivity in Spanish discourse: Explicit and implicit causal relations in different text types. Dialogue & Discourse, 9(1), 163–191. https://doi.org/10.5087/dad.2018.106
Scholman, M. C., Blything, L., Cain, K., Hoek, J., & Evers-Vermeul, J. (2022). Discourse rules: The effects of clause order principles on the reading process. Language, Cognition and Neuroscience, 1–15. https://doi.org/10.1080/23273798.2022.2077971
Scholman, M. C. J., Demberg, V., & Sanders, T. J. M. (2020). Individual differences in expecting coherence relations: Exploring the variability in sensitivity to contextual signals in discourse. Discourse Processes, 57(10), 844–861. https://doi.org/10.1080/0163853X.2020.1813492
Scholman, M. C. J., Evers-Vermeul, J., & Sanders, T. J. M. (2016). Categories of coherence relations in discourse annotation. Dialogue & Discourse, 7(2), 1–28. https://doi.org/10.5087/dad.2016.201
van Silfhout, G., Evers-Vermeul, J., Mak, W. M., & Sanders, T. J. M. (2014). Connectives and layout as processing signals: How textual features affect students’ processing and text representation. Journal of Educational Psychology, 106(4), 1036–1048. https://doi.org/10.1037/a0036293
van Silfhout, G., Evers-Vermeul, J., & Sanders, T. (2015). Connectives as processing signals: How students benefit in processing narrative and expository texts. Discourse Processes, 52(1), 47–76. https://doi.org/10.1080/0163853X.2014.905237
Spooren, W., & Degand, L. (2010). Coding coherence relations: Reliability and validity. Corpus Linguistics & Linguistic Theory, 6(2), 241–266. https://doi.org/10.1515/cllt.2010.009
Spyridakis, J. H., & Standal, T. C. (1987). Signals in expository prose: Effects on reading comprehension. Reading Research Quarterly, 22(3), 285–298. https://doi.org/10.2307/747969
Stevenson, R. J., Crawley, R. A., & Kleinman, D. (1994). Thematic roles, focus and the representation of events. Language and Cognitive Processes, 9(4), 519–548. https://doi.org/10.1080/01690969408402130
Stewart, A. J., Pickering, M. J., & Sanford, A. J. (2000). The time course of the influence of implicit causality information: Focusing versus integration accounts. Journal of Memory and Language, 42(3), 423–443. https://doi.org/10.1006/jmla.1999.2691
Tourtouri, E. N., Delogu, F., Sikos, L., & Crocker, M. W. (2019). Rational over-specification in visually-situated comprehension and production. Journal of Cultural Cognitive Science, 3(2), 175–202. https://doi.org/10.1007/s41809-019-00032-6
Traxler, M. J., Bybee, M. D., & Pickering, M. J. (1997). Influence of connectives on language comprehension: Eye tracking evidence for incremental interpretation. The Quarterly Journal of Experimental Psychology Section A, 50(3), 481–497. https://doi.org/10.1080/027249897391982
Tskhovrebova, E., Zufferey, S., & Gygax, P. (2022). Individual variations in the mastery of discourse connectives from teenage years to adulthood. Language Learning, 72(2), 412–455. https://doi.org/10.1111/lang.12481
Vogels, J., Maes, A., & Krahmer, E. (2014). Choosing referring expressions in Belgian and Netherlandic Dutch: Effects of animacy. Lingua, 145, 104–121. https://doi.org/10.1016/j.lingua.2014.03.007
Wei, Y., Mak, W. M., Evers-Vermeul, J., & Sanders, T. J. (2019). Causal connectives as indicators of source information: Evidence from the visual world paradigm. Acta Psychologica, 198, 102866. https://doi.org/10.1016/j.actpsy.2019.102866


Wetzel, M., Zufferey, S., & Gygax, P. (2020). Second language acquisition and the mastery of discourse connectives: Assessing the factors that hinder L2-learners from mastering French connectives. Languages, 5(3), 35. https://doi.org/10.3390/languages5030035
Wetzel, M., Zufferey, S., & Gygax, P. (2022). How robust is discourse processing for native readers? The role of connectives and the coherence relations they convey. Frontiers in Psychology, 13. https://doi.org/10.3389/fpsyg.2022.822151
Xiang, M., & Kuperberg, G. (2015). Reversing expectations during discourse comprehension. Language, Cognition and Neuroscience, 30(6), 648–672. https://doi.org/10.1080/23273798.2014.995679
Xiao, H., van Hout, R. W., Sanders, T. J., & Spooren, W. P. (2021). A cognitive account of subjectivity put to the test: Using an insertion task to investigate Mandarin result connectives. Cognitive Linguistics, 32(4), 671–702. https://doi.org/10.1515/cog-2020-0075
Yi, E., & Koenig, J.-P. (2021). Grammar modulates discourse expectations: Evidence from causal relations in English and Korean. Language and Cognition, 13(1), 99–127. https://doi.org/10.1017/langcog.2020.29
Zufferey, S., Mak, W., Verbrugge, S., & Sanders, T. (2018). Usage and processing of the French causal connectives “car” and “parce que”. Journal of French Language Studies, 28(1), 85–112. https://doi.org/10.1017/S0959269517000084
Zufferey, S., & Gygax, P. (2020). “Roger broke his tooth. However, he went to the dentist”: Why some readers struggle to evaluate wrong (and right) uses of connectives. Discourse Processes, 57(2), 184–200. https://doi.org/10.1080/0163853X.2019.1607446
Zufferey, S., & Gygax, P. M. (2017). Processing connectives with a complex form-function mapping in L2: The case of French “en effet”. Frontiers in Psychology, 8, 1198. https://doi.org/10.3389/fpsyg.2017.01198


9 EXPERIMENTAL STUDIES OF ARGUMENTATION

Peter J. Collins and Ulrike Hahn

9.1

Introduction

We argue in our personal and professional lives, about consequential and inconsequential matters, alone with imagined audiences and together with real ones. Argument is a ubiquitous phenomenon linked to invaluable social and cognitive skills: good argumentation arguably allows individuals to participate more actively in modern democratic societies, to master academic disciplines and contribute new knowledge, and to learn more deeply (Asterhan & Schwarz, 2016). By studying argument, we gain an understanding of how these skills are realised and how they might be improved. Argument is studied with diverse methods in diverse disciplines. Among these methods, this chapter surveys experimental studies of argument, while also situating them in the context of other, wider empirical approaches. Among these disciplines, the chapter takes a psychological focus, since its authors are psychologists, but it also draws on neighbouring disciplines.

Arguments occur in various types of dialogue, including personal conflicts, negotiations, inquiries, and deliberations (for discussion, see Walton, 2008). An important type of dialogue is the critical discussion, which aims at resolving a difference of opinion (van Eemeren & Grootendorst, 2004). Much argumentation research focuses on critical discussions. In critical discussions, argumentation can be understood as “a verbal and social activity of reason aimed at increasing (or decreasing) the acceptability of a controversial standpoint for a listener or reader, by putting forward a constellation of propositions intended to justify (or refute) the standpoint before a ‘rational judge’” (van Eemeren et al., 1996, p. 5). This is argumentation as reasoned debate (as opposed to, say, a quarrel), and it is the definition we follow throughout this chapter.
We gain further insight into argument from another, more ancient classification – logic, dialectic, and rhetoric – which can be viewed as three perspectives on argumentation (Tindale, 2004; though see Blair, 2012). From a logical perspective, arguments are intellectual or inferential objects: an argument is a set of statements which are inferentially related; there is at least one premise and one conclusion (Tindale, 2004). This inferential relationship can vary in strength. It might be maximally strong: say, a valid deductive inference as in classical logic, where, roughly speaking, true premises guarantee a true conclusion. Or it might be relatively weaker: say, an inductive or defeasible inference, where true premises may indicate but do not guarantee a true conclusion. When we view argument from the logical perspective, we scrutinise the structure of the statements and the strength of the inferential relationship(s) (Tindale, 2004).

DOI: 10.4324/9781003392972-11

A contrast is offered by the dialectical perspective, in which arguments are dialogues where participants make moves to achieve some persuasive goal (Tindale, 2004). For example, one participant may advance a standpoint; another participant may attempt to rebut it; each participant may attack and defend. When we view argument from the dialectical perspective, we identify which procedures the participants are seeking to implement with their moves, and we scrutinise the reasonableness of these procedures (Tindale, 2004). We ask, for instance, whether the procedures foster agreement or enable everyone to speak.

Lastly, from a rhetorical perspective, arguments are social processes that connect the particular setting, arguer, and audience (Tindale, 2004). Relevant here are contextual factors such as why the participants (arguer, audience) are in disagreement or dispute; what they have previously committed to; and who they are, what they know about each other, and what (potentially competing) interests they have (Tindale, 2004). When we view argument from the rhetorical perspective, we consider how, and how effectively, the arguer adapts their argument to the particular setting (van Eemeren & Houtlosser, 1999, 2000; Tindale, 2004).

What does it mean to study argument empirically?
This chapter takes ‘empirical’ to mean grounded in systematic observation of the world, a broad definition that includes research methods as diverse as laboratory experiments, argument mining, and discourse analysis. The chapter is focussed on experiments, but we also seek to demonstrate how and why other empirical methods are required. At base, experimental (and other quantitative empirical) methods require the researcher to decompose the phenomenon of interest into variables, define those variables concretely for the purposes of the study, and collect data systematically. Data might be collected in correlational studies, which aim to describe existing associations between variables; in controlled laboratory studies, which manipulate a variable and assess the effect of that manipulation; or in intervention studies, which aim to improve argumentation skills in the real world. Other, intermediate designs fall between these categories. Although these quantitative cognitive-scientific methods are our focus, the chapter also points to other methods that enrich the study of argument.

9.2

Critical issues and topics

This chapter cannot do justice to the innumerable issues and topics that drive the study of argument. In particular, it will not seek to catalogue the extensive literature (within both social psychology and communication studies) that concerns itself with ‘mere’ persuasive success (for a review, see O’Keefe, 2015). In keeping with the emphasis of the philosophical literature on informal argument (e.g., Walton, 2008) and its definition of “critical debate”, it will instead focus on issues critical to a central debate in the cognitive sciences: the debate on human rationality. In this debate, key questions are what it means to be rational and how our behaviour compares to rational standards (Stanovich, 2012). For instance, do we fix and update our beliefs rationally, and do we make rational decisions that are well tailored to achieving our valued goals (Stanovich, 2012)? A further key question is whether our thinking can be made more rational (Stanovich, 2012). Analogues of these questions are woven throughout the experimental literature on argument and guide this chapter: what defines good argument; how good we are at producing and responding to arguments; and whether (and how) we can improve our argumentative skills.

9.2.1 What is good argument?

There are different sets of norms, or rules, for argument: sometimes these norms compete; sometimes they complement each other. While norms might seem better suited to philosophical study (Elqayam & Evans, 2011), they play a prominent role in empirical research on argument and, indeed, on rationality in general. Norms stimulate research: using norms, researchers can generate and test predictions and, where relevant, develop new lines of research to explain why the predictions are not met (Douven, 2011). Norms are also essential wherever researchers seek to improve people’s behaviour, since norms set the desired endpoint (Elqayam & Evans, 2011). Finally, norms provide organising theories that help structure what would otherwise be a limitless set of unrelated factors that could be studied.

Traditionally, norms for argument focussed on argument as an inferential object, which we informally termed the logical perspective above. These norms were also logical in the more formal sense of classical logic, which provides both a formal language into which arguments can be translated and a set of rules defining permissible ‘valid’ inferences (Shapiro & Kouri Kissel, 2021). If an argument is valid in classical logic, then there is no case in which its premises are all true but its conclusion is false (Shapiro & Kouri Kissel, 2021). For a simple example of (in)validity, consider the following arguments:

(1) If Mila is human, she has a heart. Mila is human. Therefore, she has a heart.
(2) If Mila is human, she has a heart. Mila is not human. Therefore, she does not have a heart.

Example (1) is a valid argument form, modus ponens; it admits no exceptions. Example (2) is an invalid argument form, the formal ‘logical fallacy’ known as denial of the antecedent; it has clear exceptions. Mila may, for example, be a cat, which is not human but has a heart.
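Because classical validity is defined as the absence of any case with all premises true and the conclusion false, it can be checked mechanically for simple propositional arguments by enumerating every truth-value assignment. The following is a minimal sketch, assuming “if … then” is read as the material conditional of classical logic; the function and variable names are hypothetical, chosen for illustration. Enumeration grows as $2^n$ in the number of variables, so this approach only suits small examples such as (1) and (2).

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def check_validity(premises, conclusion, n_vars: int):
    """Brute-force truth-table check of classical validity.

    Returns (True, None) if no assignment makes every premise true and the
    conclusion false; otherwise returns (False, counterexample_assignment).
    """
    for values in product([True, False], repeat=n_vars):
        if all(premise(*values) for premise in premises) and not conclusion(*values):
            return False, values
    return True, None


# Propositional variables: h = "Mila is human", t = "Mila has a heart".
modus_ponens = ([lambda h, t: implies(h, t), lambda h, t: h], lambda h, t: t)
denying_antecedent = ([lambda h, t: implies(h, t), lambda h, t: not h], lambda h, t: not t)

print(check_validity(*modus_ponens, n_vars=2))        # (True, None)
print(check_validity(*denying_antecedent, n_vars=2))  # (False, (False, True))
```

The counterexample returned for (2), h = False and t = True, is exactly the cat case: Mila is not human, yet she has a heart.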
Where classical logic applies, it has the virtue of enforcing consistency: if the argument is valid and the premises are true, we cannot consistently deny the conclusion. However, classical logic has limited application to real-world arguments, as it requires certainty about the premises and the inference, and certainty is typically lacking in real-world argumentation (Hahn & Oaksford, 2012). While non-classical logics have been offered for argument (e.g., Prakken & Vreeswijk, 2002), we set these aside in this chapter as they have so far inspired rather limited empirical-scientific work (though see Stenning & van Lambalgen, 2012). Differing markedly from classical logic is a family of theories which take a more dialectical perspective on argument and attempt to define appropriate procedures for argumentative dialogues. A prominent example is the normative system developed by Toulmin (1958), often referred to as the ‘Toulmin model’. The Toulmin model and classical logic differ profoundly in their notions of validity. For classical logic, arguments are valid or invalid absolutely: the argument form is valid, or invalid, across all contexts. For Toulmin, arguments are valid or invalid only relatively: the argument form is valid, or acceptable, for a specific audience in a specific context. Different audiences and different contexts may demand rather different standards, so that, for example, two friends disagreeing on a matter of fact may accept as valid an argument that scientists writing a journal article would wholly reject. Toulmin (1958) moved beyond classical logic by replacing the classical-logical model of premises, conclusion, and sets of permissible inferences with the following components. Toulmin’s

Peter J. Collins and Ulrike Hahn

arguers make a claim, which they back up with some piece of evidence, the grounds. Arguers link claim and grounds with some general principle, the warrant, which justifies why the grounds count as evidence for such a claim, though arguers often leave the warrant implicit. Arguers can, further, explain why the warrant holds, providing the backing. Arguers can calibrate the argument to their degree of certainty using modal qualifications such as ‘possibly’ and ‘certainly’. Lastly, arguers can anticipate and hedge against possible counterarguments, providing a rebuttal. Each of these components, and the argument as a whole, is accepted or rejected in context by the audience. While this model is better suited to the uncertainties of real-world argument, it provides standards only for what components should be present in an argument, not for judging the quality of those components or, as a consequence, for whether the conclusion is ultimately justified (Hahn et al., 2017). A more detailed dialectical account is provided by pragma-dialectics (e.g., van Eemeren & Grootendorst, 1984; 1992; van Eemeren & Garssen, 2013). Pragma-dialectics analyses the structure of an argumentative discourse and proposes a set of rules for reasonable behaviour within it. Pragma-dialectics distinguishes four stages of an argumentative discourse. Discussants encounter a difference of opinion in the confrontation stage, fix the ground rules for the discussion and state their premises in the opening stage, defend (if the protagonist) or attack (if the antagonist) the standpoints in the argumentation stage, and establish an outcome, whether agreement or nonagreement, in the concluding stage (van Eemeren et al., 2009). Complementing these stages is a set of procedural rules, which draw heavily on the principles that guide everyday conversation (van Eemeren & Garssen, 2013; van Eemeren & Grootendorst, 1984). The first two rules follow, as formulated in van Eemeren et al. (2009, p. 21):

• The Freedom Rule: Discussants may not prevent each other from advancing standpoints or from calling standpoints into question.
• The Obligation-to-Defend Rule: Discussants who advance a standpoint may not refuse to defend this standpoint when requested to do so.

Such rules are said to gain their normative force by promoting, and removing obstacles to, the resolution of differences of opinion: they have, in pragma-dialectic terms, problem validity (van Eemeren & Grootendorst, 2004; van Eemeren et al., 2009). For the Freedom Rule, pragma-dialectics holds that, when discussants do not allow standpoints to be raised or questioned, they conceal the difference of opinion and prevent it from being resolved. For the Obligation-to-Defend Rule, pragma-dialectics holds that, when discussants refuse to defend a standpoint, they prevent the discussion from proceeding to the argumentation or concluding stages (van Eemeren et al., 2009). While we might consider it an empirical question whether each procedural rule is problem-valid, pragma-dialectics takes it to be an ‘analytic-theoretical question’ (van Eemeren et al., 2009, p. 27), not resolvable by collecting data. Pragma-dialectics incorporates a more rhetorical perspective on argument through its conception of strategic manoeuvring (van Eemeren & Houtlosser, 1999, 2007). In studies of strategic manoeuvring, pragma-dialecticians explore how arguers can strengthen their argument by appealing to the particular audience: for example, arguers might select a starting point as close as possible to their audience’s standpoint to minimise the ‘disagreement space’ and the amount of argumentative work to be done, or they might select the most appealing lines of argument and defence (van Eemeren & Houtlosser, 1999). Such strategic manoeuvring is said to be most effective when arguers use a combined strategy of selecting topics, creating ‘empathy or “communion” ’ with their audience (van Eemeren & Houtlosser, 2000, p. 298), and drawing on presentational devices, such

Experimental studies of argumentation

as figures of speech, to reinforce their case (van Eemeren & Houtlosser, 2000; Tindale, 2004). But, according to pragma-dialectics, arguers should balance effectiveness against reasonableness: while they manoeuvre strategically, they should not violate any procedural rules (van Eemeren & Houtlosser, 2000; Tindale, 2004). Pragma-dialectics has proved an influential theory of argument in large part through its identification of procedural rules (Tindale, 2004), but any comprehensive theory must consider not just procedure but also content. This point is explicit in pragma-dialectical rules, since two of the rules require that ‘reasoning … presented as formally conclusive may not be invalid in a logical sense’ (the Validity Rule) and that defences of standpoints must take place by means of ‘appropriate argument schemes [roughly: forms of everyday argument] that are applied correctly’ (the Argument Scheme Rule) (e.g., van Eemeren et al., 2009, p. 23). Needed, then, are accounts of valid reasoning and appropriate argument schemes. A large literature has been concerned with identifying argument schemes: forms of everyday, defeasible argument. Argument schemes have both descriptive and normative uses. Descriptively, they provide a means of classifying existing forms of real-world arguments, both for the intrinsic value of the project and for applied use in disciplines such as artificial intelligence (Paglieri, 2021). Normatively, they provide guidelines for when these real-world argument forms can be considered strong (Paglieri, 2021). Argument schemes comprise a set of argumentative moves, which can be reconstructed as premises and conclusions akin to those in formal logic, and a set of critical questions which evaluate the strength of the argument. 
Argument schemes feature in different inventories which vary considerably in their membership (van Eemeren, 2018; van Eemeren & Grootendorst, 2004; Garssen, 1997; Hastings, 1962; Kienpointner, 1992; Perelman & Olbrechts-Tyteca, 1969), with Walton et al.’s (2008) 60 schemes being amongst the most extensive. For a simple instance, take the scheme and accompanying critical questions for the Argument from Sign as presented by Walton et al. (2008, p. 329):

A (a finding) is true in this situation.
B is generally indicated as being true in this situation when its sign, A, is true.
B is true in this situation.

Critical Question 1: What is the strength of the correlation of the sign with the event signified?
Critical Question 2: Are there any other events that would more reliably account for the sign?

To adapt an example from Hahn and Hornikx (2016), we might, in a particular situation, observe a large number of people in the street with cameras, and hold that, generally, where there is a large number of people in the street with cameras, that location is a tourist site. From this we might conclude that the particular situation is a tourist site. In evaluating this argument, an audience might, however, query the strength of the correlation or wonder whether another event could account for the sign: for instance, perhaps an important dignitary is present, making the situation worth photographing. Argument schemes can be left somewhat informal, as above, or formalised with defeasible logic (e.g., Walton, 2006) or graphical representations (Gordon et al., 2007). Argument schemes have proved influential in a range of disciplines and play an important role in supplementing more procedural accounts, but they face challenges. Key questions are: how many schemes should there be, and are proposed schemes meaningfully distinct (Hahn & Hornikx, 2016)?
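To make the structure of an argument scheme concrete, the Python sketch below encodes the Argument from Sign as premise and conclusion templates plus a list of critical questions, and instantiates it with the camera example. The `ArgumentScheme` class is a hypothetical, informal representation chosen for illustration; it is not Walton et al.’s formalisation, nor a defeasible-logic or graphical encoding. Note that nothing in the structure ties a critical question to a particular premise or says how an answer should weaken the conclusion, which mirrors the loose connection between schemes and critical questions noted by Hahn and Hornikx (2016).

```python
from dataclasses import dataclass, field


@dataclass
class ArgumentScheme:
    """Hypothetical representation: premise/conclusion templates plus critical questions."""
    name: str
    premises: list[str]
    conclusion: str
    critical_questions: list[str] = field(default_factory=list)

    def instantiate(self, **slots: str) -> dict:
        """Fill the {A}, {B}, ... placeholders with case-specific content.

        The critical questions are returned unanswered: the scheme itself does not
        specify how any answer should change the strength of the argument.
        """
        return {
            "scheme": self.name,
            "premises": [p.format(**slots) for p in self.premises],
            "conclusion": self.conclusion.format(**slots),
            "critical_questions": list(self.critical_questions),
        }


# Argument from Sign, using the wording quoted from Walton et al. (2008, p. 329).
ARGUMENT_FROM_SIGN = ArgumentScheme(
    name="Argument from Sign",
    premises=[
        "{A} is true in this situation.",
        "{B} is generally indicated as being true in this situation when its sign, {A}, is true.",
    ],
    conclusion="{B} is true in this situation.",
    critical_questions=[
        "What is the strength of the correlation of the sign with the event signified?",
        "Are there any other events that would more reliably account for the sign?",
    ],
)

# The tourist-site example adapted from Hahn and Hornikx (2016).
case = ARGUMENT_FROM_SIGN.instantiate(
    A="'Many people in the street have cameras'",
    B="'This location is a tourist site'",
)
for line in case["premises"] + [case["conclusion"]] + case["critical_questions"]:
    print(line)
```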
If the schemes are to be effective norms, then their critical questions must also be well justified. But while proposed questions are often intuitively appealing, there is nothing to say that they exhaust the relevant issues (Walton et al., 2008), and they remain somewhat loosely connected with the schemes (Hahn & Hornikx, 2016). Overall, it is unclear what normative force argument schemes have. We have so far reviewed normative systems which respond to the limitations of classical logic by focusing on dialectical aspects of argument. We turn to a final normative system which takes a more thoroughly logical perspective, in the sense of treating argument as an inferential object. Earlier we identified a key obstacle to applying classical-logical norms to real-world argument: uncertainty. Arguers may be less than fully confident in the truth of their premises, and the inferential relationship between premises and conclusion may involve less than deductive certainty, i.e., ampliative or inductive inference. Such uncertainty is problematic for classical logic but is naturally modelled with probability. There are probabilistic, and in particular Bayesian, accounts of argument which represent arguers’ degrees of belief as subjective probabilities and use a fundamental rule of probability theory, Bayes’ rule, to prescribe how arguers should reason on the basis of their beliefs (Eva & Hartmann, 2018; Hahn & Oaksford, 2007; Korb, 2004; Zenker, 2012). Bayes’ rule provides a general prescription for how an agent should update their beliefs in light of evidence. To illustrate how Bayes’ rule can be applied to argument, we introduce a formalism using ‘C’ to stand for a conclusion and ‘E’ to stand for evidence. Let us assume that we are modelling a simple argument comprising one conclusion supported by one premise, or piece of evidence. A Bayesian agent estimates their degree of belief in the conclusion in light of the evidence: the posterior probability, P(C | E). That belief is proportional to the product of the agent’s prior belief in the conclusion, P(C), and the likelihood, P(E | C), the probability that the evidence would have occurred if the claim were true.
The likelihood can be understood more intuitively as the agent’s judgement of how consistent the evidence is with the conclusion. Expressed mathematically:

$$P(C \mid E) \propto P(C) \times P(E \mid C)$$

To turn this proportionality into a probability, we must ‘normalise’ it by dividing by the probability of the evidence, P(E). Hence:

$$P(C \mid E) = \frac{P(C) \times P(E \mid C)}{P(E)}$$

It is helpful when analysing argument – and, indeed, can be helpful more generally – to unpack the denominator, P(E), further and consider how the evidence can be realised. The evidence might be realised in two ways: when the conclusion is true, and when the conclusion is false. The probabilities of these two routes are shown within the square brackets; note that ‘~C’ means that the claim is not true, so that P(~C) = 1 − P(C).

$$P(C \mid E) = \frac{P(C) \times P(E \mid C)}{[P(C) \times P(E \mid C)] + [P(\sim C) \times P(E \mid \sim C)]}$$
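A minimal Python sketch of Bayes’ rule with the expanded denominator follows. It assumes only what the formula states, plus P(~C) = 1 − P(C). The function name is our own, and the numbers for the tourist-site example (C: ‘this location is a tourist site’; E: ‘many people in the street have cameras’) are hypothetical, chosen purely for illustration.

```python
def posterior(prior_c: float, p_e_given_c: float, p_e_given_not_c: float) -> float:
    """P(C | E) by Bayes' rule, expanding P(E) over the two cases C and ~C."""
    joint_c = prior_c * p_e_given_c                  # P(C) x P(E | C)
    joint_not_c = (1.0 - prior_c) * p_e_given_not_c  # P(~C) x P(E | ~C)
    p_e = joint_c + joint_not_c                      # P(E), the normalising term
    if p_e == 0:
        raise ValueError("The evidence has zero probability under both C and ~C.")
    return joint_c / p_e


# Hypothetical numbers: prior belief 0.3 that this is a tourist site; cameras are
# judged four times as likely to be seen at a tourist site as elsewhere.
print(round(posterior(prior_c=0.3, p_e_given_c=0.8, p_e_given_not_c=0.2), 3))  # 0.632
print(round(posterior(prior_c=0.3, p_e_given_c=0.5, p_e_given_not_c=0.5), 3))  # 0.3
```

The second call illustrates a general point: when P(E | C) equals P(E | ~C), the posterior equals the prior, because evidence that is equally expected whether or not the conclusion is true gives no reason to shift belief.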

Two terms from this theorem can be used to measure the strength, or diagnosticity, of the evidence: how much our beliefs should shift in light of the evidence. These terms are P(E | C), which is the probability that the evidence would occur if the claim were true, and P(E | ~C), which is the probability that the evidence would occur if the claim were false. The ratio of these terms is known as the likelihood ratio:

$$\text{Likelihood Ratio} = \frac{P(E \mid C)}{P(E \mid \sim C)}$$
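The likelihood ratio also gives Bayes’ rule a compact ‘odds’ form: posterior odds equal prior odds multiplied by the likelihood ratio. The Python sketch below, with hypothetical function names and numbers, shows that this reproduces the result of Bayes’ rule for P(C) = 0.3, P(E | C) = 0.8 and P(E | ~C) = 0.2, and how the ratios for several pieces of evidence can be multiplied together. Multiplying ratios rests on an added assumption that the pieces of evidence are conditionally independent given C and given ~C; real arguments often violate it, in which case a richer model is needed.

```python
def likelihood_ratio(p_e_given_c: float, p_e_given_not_c: float) -> float:
    """P(E | C) / P(E | ~C): above 1 supports C, below 1 counts against it, 1 is uninformative."""
    return p_e_given_c / p_e_given_not_c


def update_with_ratios(prior_c: float, *ratios: float) -> float:
    """Posterior probability of C from the prior and one likelihood ratio per piece of evidence.

    Odds form of Bayes' rule: posterior odds = prior odds x likelihood ratio.
    Multiplying several ratios assumes (hypothetically) that the pieces of evidence
    are conditionally independent given C and given ~C. Requires 0 < prior_c < 1.
    """
    odds = prior_c / (1.0 - prior_c)
    for lr in ratios:
        odds *= lr
    return odds / (1.0 + odds)


lr = likelihood_ratio(0.8, 0.2)                   # 4.0
print(round(update_with_ratios(0.3, lr), 3))      # 0.632, as from Bayes' rule directly
print(round(update_with_ratios(0.3, lr, lr), 3))  # 0.873 with two independent signs
print(round(update_with_ratios(0.3, 1.0), 3))     # 0.3: uninformative evidence
```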

This ratio provides a natural measure of the strength of an argument: a ratio above one supports the conclusion, a ratio below one counts against it, and a ratio of exactly one leaves belief unchanged. This is a rather simple model of an argument, but the model can be extended to complex arguments. For instance, we can add parameters to accommodate multiple pieces of evidence. Indeed, a complex Bayesian model can capture all the evidence presented in a court case (Kadane & Schum, 1996). Through additional model parameters, Bayesian models can apply to arguments about valued outcomes, such as arguments from consequences: arguments of the form ‘We should do X because of happy consequence Y’ or ‘We should not do X because of unhappy consequence Z’. The additional parameters in this case represent the subjective value, or utility, of particular outcomes, connecting arguments with the norms of subjective expected utility theory (for an illustration, see Corner et al., 2011). A Bayesian account of argument resembles logic, and differs from dialectical accounts, in providing mathematically justified norms for belief. Bayesian norms subsume classical-logical norms pertaining to propositional logic: these fall out as the special case in which probabilities are set to one (for discussion, see Hahn & Hornikx, 2016). Moreover, Bayesian reasoning can be shown to be optimal under given conditions (Hahn, 2014; Rosenkrantz, 1992). Probabilistic norms provide a rather deeper grounding than is available to pragma-dialectics and argumentation schemes, but they have been resisted by theorists from the latter schools (e.g., Walton et al., 2008). In the study of argument, the diverse norms just reviewed not only embody different perspectives on argument but also continue to motivate experimental work. This raises questions, yet to be resolved, about their ultimate relationship. Since argument is a complex social and intellectual phenomenon, it may well require different norms to address different aspects of argumentative behaviour.
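As a rough illustration of how utilities enter arguments from consequences, the Python sketch below scores the consequence cited in ‘We should do X because of consequence Y’ by how much it changes expected utility. This is a deliberately simplified, hypothetical formulation (a single consequence, an arbitrary utility scale, invented numbers), not the model of Corner et al. (2011). It is meant only to show that, on such a norm, the argument should grow stronger with both the desirability and the probability of the consequence.

```python
def consequence_value(p_y_if_x: float, p_y_if_not_x: float, utility_y: float) -> float:
    """Expected-utility difference that consequence Y makes to the choice of doing X.

    Simplifying assumption (ours, for illustration): Y is the only outcome that
    differs between doing X and not doing X, and utilities are on an arbitrary
    interval scale. Positive values favour doing X; negative values favour not doing X.
    """
    return (p_y_if_x - p_y_if_not_x) * utility_y


# Hypothetical numbers.
likely_good = consequence_value(0.7, 0.2, utility_y=10.0)    # happy Y, made much more likely by X
unlikely_good = consequence_value(0.3, 0.2, utility_y=10.0)  # equally happy Y, barely more likely
likely_bad = consequence_value(0.7, 0.2, utility_y=-8.0)     # unhappy Z: argues against doing X
print(round(likely_good, 2), round(unlikely_good, 2), round(likely_bad, 2))  # 5.0 1.0 -4.0
```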
What seem, at first blush, to be separate norms may also be given a combined treatment, as, for example, in Bayesian interpretations of argumentation schemes (Hahn & Hornikx, 2016).

9.2.2 How good are we at argument?

Much research has explored the argumentation skills of laypeople – that is, people not specifically trained in argumentation – and sought to establish whether laypeople follow norms such as those outlined above. This question has been pursued in at least two general ways: the first is to ask which norms laypeople invoke in their conscious beliefs about argument; the second is to ask which norms laypeople use to generate or evaluate arguments. It has proved particularly important to pragma-dialectics to explore which norms feature in laypeople’s conscious beliefs about argument. Pragma-dialecticians identify two types of validity for rules of argument: problem validity, in that rules should help to resolve disputes; and conventional validity, in that rules should ‘be acceptable to the discussants’ (e.g., van Eemeren et al., 2009, p. 27). Researchers beyond pragma-dialectics have sometimes used the term ‘conventional validity’ to indicate that research participants actually use these rules in producing or evaluating arguments (Schellens et al., 2017). However, pragma-dialecticians distinguish conventional validity from other factors, including the relevance or persuasiveness of the arguments (van Eemeren et al., 2009), factors which would indicate that the norms are applied. Pragma-dialecticians focus, then, on what people believe about argument and how those beliefs fit with proposed pragma-dialectic norms. A rather larger literature has investigated the norms that people actually seem to use in argument (consciously or not). This research question can be decomposed into numerous sub-questions, an initial distinction being between producing and evaluating arguments. Argument production features in a wide range of disciplines including cognitive science, argumentation theory, and discursive psychology. Cognitive scientists have considered both logical and dialectical perspectives on argument: for instance, in a classic study, Kuhn (1991) sought to establish how well people produce and evaluate arguments, with research participants ranging from teenagers to adults in their sixties. Kuhn developed a set of criteria for both the kinds of moves her participants made (the dialectical perspective) and the quality of the content (the logical perspective). Cognitive scientists have also produced relevant data in the control groups of intervention studies (e.g., Crowell & Kuhn, 2014; Kuhn & Crowell, 2011; Zavala & Kuhn, 2017). Argumentation theorists have applied both pragma-dialectic and Bayesian analyses to extended real-world arguments (van Eemeren & Houtlosser, 1999; Kadane & Schum, 1996) and used real-world arguments to identify argument schemes (Paglieri, 2021). Discursive psychologists have applied discourse-analytic methods to naturally occurring data to study dialectical and rhetorical strategies, for instance, in the language of cold calls (Humă et al., 2020), though focusing on effectiveness rather than on logical or dialectical norms. Another sizeable literature treats argument evaluation. Much of this research expressly asks only which factors are persuasive, but its findings can provide relevant evidence about whether these are factors that should be persuasive. A common research method is to present persuasive messages in favour of some behaviour and vary the messages so that they appeal to different types of consequences. A message might advocate an after-school programme, for example, by arguing that it will promote success or that it will avoid failure (Cesario et al., 2004). Researchers frequently explore the fit between such message variations and research participants’ dispositions: for example, whether the research participants tend to value achieving success or avoiding failure.
However, findings from such studies have offered valuable data for argumentation research where they include parameters from norms, such as the subjective value of the outcome and its probability (O’Keefe, 2013). A valuable line of research has addressed norms more directly through studying fallacies: arguments that may seem persuasive but are invalid (Hamblin, 1970). Fallacies have long been a focus of argumentation theory, resulting in lengthy catalogues of questionable argument forms. For example:

• Slippery slope arguments: Congress should not mandate background checks on weapons. If it does, it will soon have to ban all weapons outright.
• Arguments from ignorance: Doctors should prescribe Ivermectin because no one has proved it isn’t an effective treatment for Covid.
• Ad hominem arguments: We should disregard the leader of the opposition’s criticism of the prime minister attending parties during lockdown because the leader of the opposition is alleged to have attended a party too.

Arguments of these forms are not always invalid (Hamblin, 1970). Indeed, circular arguments are logically valid (Hahn, 2011). More importantly, whether or not they are logically valid, logical fallacies may, on occasion, be entirely reasonable. Take slippery slope arguments, which gain their force from the claim that an action with an attractive outcome will lead to further actions with unattractive outcomes. It may often, but need not, be the case that the link between the initial and subsequent actions is tenuous (Corner et al., 2011). We have introduced normative theories of argument which can separate genuine fallacies from acceptable arguments, and which can predict individuals’ responses to fallacies. Fallacies therefore offer opportunities to test individuals’ sensitivity to different norms. Take as an example ad hominem arguments, which argue against a conclusion by attacking its proponent. Pragma-dialecticians define fallacies in terms of procedure. They view an ad hominem argument as fallacious when it occurs in the confrontation stage of a discussion and violates the Freedom Rule: that is, the argument prevents a standpoint from being advanced or called into question (van Eemeren et al., 2009). Pragma-dialecticians predict that research participants will find ad hominem arguments unreasonable in the confrontation stage (van Eemeren et al., 2009). While pragma-dialecticians distinguish participants’ sense of reasonableness from participants’ acceptance of an argument, they nevertheless expect some connection between reasonableness and acceptance, so, all else being equal, we might expect individuals to find ad hominem arguments less persuasive in the confrontation stage. Pragma-dialecticians consider ad hominem arguments legitimate when they occur later in the discussion, in the argumentation stage. Other theorists treat ad hominem arguments as part of a more general account of source-based argumentation. Much work adopts a Bayesian framework (e.g., Bhatia & Oaksford, 2015; Hahn et al., 2009; Harris et al., 2012; Harris et al., 2016), but there are substantive links with other, putatively reasonable, argumentation schemes such as arguments from authority or expertise (Hahn & Hornikx, 2016; Harris et al., 2016). Such work emphasises the general relevance of an argument’s source. In real-world arguments, arguers make claims about facts and inferential relations. For an argument to be convincing, the arguer must seem sufficiently reliable as a source, unless the audience knows enough about the topic to evaluate claimed facts and inferential relations or can independently verify them as part of the discussion (Collins & Hahn, 2016; Hahn et al., 2012).
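Bayesian treatments of source reliability can be illustrated with a deliberately simple, hypothetical model in which a source asserts a claim H. The Python sketch assumes two parameters: expertise (the probability that the source’s own belief about H is correct) and trustworthiness (the probability that the source reports that belief honestly rather than asserting at random). This parameterisation is our own illustration, not the specific model of any study cited here. On this reading, an ad hominem attack on trustworthiness lowers `trust`, which reduces belief in H by a graded amount rather than eliminating it.

```python
def p_report_given(h_true: bool, expertise: float, trust: float) -> float:
    """Probability that the source asserts H, given whether H is actually true.

    Hypothetical parameterisation:
      expertise = probability that the source's own belief about H is correct;
      trust     = probability that the source reports its belief honestly;
      otherwise the source asserts H or ~H at random (probability 0.5 each).
    """
    p_believes_h = expertise if h_true else 1.0 - expertise
    return trust * p_believes_h + (1.0 - trust) * 0.5


def belief_after_testimony(prior_h: float, expertise: float, trust: float) -> float:
    """P(H | the source asserts H), by Bayes' rule."""
    joint_h = prior_h * p_report_given(True, expertise, trust)
    joint_not_h = (1.0 - prior_h) * p_report_given(False, expertise, trust)
    return joint_h / (joint_h + joint_not_h)


credible = belief_after_testimony(0.5, expertise=0.9, trust=0.9)   # expert, honest source
attacked = belief_after_testimony(0.5, expertise=0.9, trust=0.3)   # honesty called into doubt
no_expert = belief_after_testimony(0.5, expertise=0.5, trust=0.9)  # belief no better than chance
print(round(credible, 2), round(attacked, 2), round(no_expert, 2))  # 0.86 0.62 0.5
```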
Source reliability features in both argument schemes and Bayesian models, with common elements being the credibility of the source as an expert in a relevant field and their trustworthiness, in the sense of their tendency to report their true belief (Harris et al., 2016; Schellens et al., 2017; Walton et al., 2008). Both expertise and trustworthiness are factors typically targeted by ad hominem arguments. Such accounts predict that individuals will refer to specific types of content and that individuals’ evaluations will be graded, following probabilistic predictions. So far, we have implicitly presented argument production and evaluation at a group level, considering individuals’ skills on average. But how do individuals differ in their abilities: what factors predict whether an individual complies with norms when producing or evaluating arguments? A wide range of factors is relevant here; the following are prominent examples. One key factor is how individuals perform with arguments across the life course, with researchers tracking levels of skill from childhood through to older adulthood (e.g., Crowell & Kuhn, 2014; Kuhn, 1991; Kuhn & Modrek, 2018). Other factors include individuals’ educational level (Kuhn, 1991; Kuhn & Modrek, 2018), their epistemological beliefs (Kuhn et al., 2000, 2010), and their achievement goals, such as whether they are motivated by learning and progress (mastery goals), demonstrating ability (performance-approach goals), or avoiding demonstrating inability (performance-avoidance goals) (Asterhan & Schwarz, 2016). When exploring such individual differences, researchers have tended to focus on dialectical, or procedural, aspects of argument.

9.2.3 Can we improve our argumentation skills?

Good argumentation has been recognised as essential to functioning democracies, as fundamental to mastery of academic disciplines, and as invaluable for deeper learning (Asterhan & Schwarz, 2016). Improving argument, then, has been a key goal of argumentation research. Argumentation researchers have sought to close the gap between how individuals should produce and evaluate arguments and how they in fact do so. Such intervention research has tended to draw on procedural rules, seeking to increase the frequency of argumentative moves, such as making and justifying claims, responding to others’ arguments, and adjudicating between competing arguments (Crowell & Kuhn, 2014; Kuhn & Crowell, 2011). Methods have included typically lengthy classroom interventions with children (e.g., Crowell & Kuhn, 2014; Hemberger et al., 2017; Kuhn & Crowell, 2011; Shi et al., 2019) and typically much briefer studies with adults, for instance, single-session studies (e.g., Kuhn & Modrek, 2021; Zavala & Kuhn, 2017).

9.3 Current contributions and research

9.3.1 Conventional validity

What norms do laypeople endorse as acceptable? This question, as we have seen, has proved important to pragma-dialecticians, who have produced suggestive evidence for their proposed norms in extensive experiments (van Eemeren et al., 2009). These experiments presented brief arguments comprising argumentative moves which either respected or violated pragma-dialectic rules. Experimental materials informed participants that ‘people may hold different ideas concerning the question of what is or is not permissible in a discussion, what is or is not reasonable’ and instructed the participants to rate the ‘reasonableness’ of particular moves (van Eemeren et al., 2009, p. 66). Generally, participants rated arguments which respected pragma-dialectic rules as more reasonable (van Eemeren et al., 2009; see van Eemeren et al., 2012, for evidence on strategic manoeuvring). While pragma-dialecticians take such findings as strong evidence of conventional validity (van Eemeren et al., 2009), the evidence is not clear-cut. Its interpretation rests on the assumption that participants understood the reasonableness scale as the researchers intended, yet participants did not receive training in interpreting the scale and do not appear to have been asked how they interpreted it. Judgements of reasonableness may well have collapsed into judgements of persuasiveness or agreement.

9.3.2 Argument production

Current research presents a mixed picture of individuals’ skills in producing arguments. Weaknesses are apparent in research on dialectical norms – that is, whether individuals follow reasonable procedures – and in some research on content. Individuals fall below expectations in enacting critical procedures such as making and supporting claims, anticipating and responding to others’ arguments, and integrating multiple, competing arguments into a cohesive whole (Asterhan & Schwarz, 2016; Kuhn, 1991; Kuhn & Modrek, 2021; 2022). Some research investigates content more deeply and finds that individuals are apt to confuse explanation and evidence: that is, they tend to explain how some cause might plausibly produce an effect but neglect to provide evidence that it actually does (Brem & Rips, 2000; Kuhn, 1991; Kuhn & Modrek, 2018). Individuals also tend to rely on simplistic causal models which feature a single cause for an event, whereas real-world argument often concerns complex, multi-causal phenomena (Kuhn & Modrek, 2018). Strengths are more apparent in research on argument schemes, in which individuals drew on key parameters from argument schemes when generating and defending arguments from authority, arguments from example, arguments from analogy, arguments from cause to effect, and arguments from consequences (Schellens et al., 2017).

9.3.3 Argument evaluation

Research on argument evaluation paints a more positive picture of typical argumentation skills. Such research has drawn on a broader set of norms, considering pragma-dialectics, argument
schemes, and Bayesian norms. Above we saw that, when individuals assess the reasonableness of arguments, they are sensitive to pragma-dialectic norms. These findings suggest that individuals refer to pragma-dialectic norms when judging whether to accept an argument. Clearer evidence is needed, however, to tease apart reasonableness and actual persuasion. A broad range of studies suggests that individuals are sensitive to key parameters from both argument schemes and Bayesian models. For arguments from analogy, participants have shown sensitivity to normative criteria including whether similarities between cases were relevant or irrelevant and whether there were relevant dissimilarities (Hoeken, Timmers and Schellens, 2012; Schellens et al., 2017). For arguments from consequences, participants have shown sensitivity to the normative criterion of the desirability of consequences (Corner, Hahn and Oaksford, 2011; Hoeken, Timmers and Schellens, 2012; Schellens et al., 2017; see also O’Keefe, 2013 for a review), though participants have been less consistently sensitive to the normative criterion of the probability of the consequences, with some favourable evidence (Corner, Hahn and Oaksford, 2011) and some unfavourable (Hoeken, Timmers and Schellens, 2012). For arguments from authority/expert opinion, participants have shown sensitivity to normative criteria including the authority’s expertise, trustworthiness, consistency with other authorities, the match between the authority’s opinion and the arguer’s claim, and the recency of the authority’s opinion (Harris et al., 2016; Hoeken, Timmers and Schellens, 2012; Schellens et al., 2017). As we saw above, research on informal fallacies allows researchers to test and compare participants’ sensitivity to norms. Such research supports complementary sets of norms. On the one hand, participants are sensitive to pragma-dialectic norms which regulate procedures; on the other hand, participants are sensitive to norms of content. 
Take, for example, ad hominem arguments. As pragma-dialectics predicts, participants find ad hominem arguments less reasonable than controls in the confrontation stage of an argument, where they violate the Freedom Rule (van Eemeren et al., 2009). However, pragma-dialectics does not account well for findings that participants also distinguish between ad hominem arguments and controls when the stage of the argument is not fixed, and that participants show graded responses to ad hominem arguments wherever they occur (Bhatia and Oaksford, 2015). Such findings point to the need for more content-based norms and, indeed, participants’ performance is well explained by a Bayesian model of ad hominem arguments (Bhatia and Oaksford, 2015). Consider, as a second example, arguments from ignorance. As pragma-dialectics predicts, participants rate as less reasonable than controls attempts to close an argument prematurely with either of the following argumentative moves: since the arguer has not proven that a claim is true, the claim is false; or, since the arguer has not proven that a claim is false, the claim is true (van Eemeren et al., 2009, p. 193). However, participants are also sensitive to Bayesian norms, including, for example, prior belief in the conclusion and the amount of evidence considered (Hahn & Oaksford, 2007).
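The Bayesian analysis of arguments from ignorance can be sketched in the same spirit. Suppose C is a claim such as ‘this drug has side effects’ and a series of tests has found no evidence for C. The Python sketch below computes the resulting belief that C is false. The sensitivity and false-positive parameters, their default values, and the assumption that tests are conditionally independent are hypothetical simplifications for illustration, not Hahn and Oaksford’s (2007) exact model. The sketch reproduces the two qualitative patterns just described: the argument becomes stronger as more tests are run, and weaker as prior belief in C rises.

```python
def belief_not_c_after_null_results(
    prior_c: float,
    n_tests: int,
    sensitivity: float = 0.5,
    false_positive_rate: float = 0.05,
) -> float:
    """P(~C | n tests all failed to find evidence for C), by Bayes' rule.

    Hypothetical assumptions: each test finds evidence for C with probability
    `sensitivity` if C is true and with probability `false_positive_rate` if C is
    false, and the tests are conditionally independent given C and given ~C.
    """
    p_null_if_c = (1.0 - sensitivity) ** n_tests
    p_null_if_not_c = (1.0 - false_positive_rate) ** n_tests
    joint_not_c = (1.0 - prior_c) * p_null_if_not_c
    joint_c = prior_c * p_null_if_c
    return joint_not_c / (joint_not_c + joint_c)


for prior_c in (0.5, 0.8):
    for n in (1, 5):
        belief = belief_not_c_after_null_results(prior_c, n)
        print(f"prior P(C)={prior_c}, tests={n}: P(~C | no evidence found)={belief:.3f}")
```

With these illustrative defaults, belief that C is false rises from roughly 0.66 after one null test to roughly 0.96 after five when P(C) starts at 0.5, but reaches only about 0.32 after one test when P(C) starts at 0.8.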

9.3.4 Individual differences

Individuals vary considerably in their argumentative skills, and much effort has been spent identifying characteristics of individuals that can account for variation in performance. One obvious candidate is age, since individuals acquire a wide range of cognitive abilities as they progress through childhood into adulthood. Adolescents, for example, tend to focus on conveying and supporting their own position and are less likely than adults to argue against their opponents’ position (Felton & Kuhn, 2001). However, age accounts for less variation in skill than we might expect. Adolescents and adults make some errors at a similar rate, including failing to distinguish explanation and evidence and relying on single-cause thinking (Kuhn, 1991; Kuhn & Modrek, 2018), a finding which has been taken to suggest that some argumentative skills develop before adolescence (Kuhn, 1991). What other individual-difference variables might, then, account for variation in skill? An important variable is level of education (Kuhn, 1991; Kuhn & Modrek, 2018): for example, when participants were asked to select evidence to dispute a causal claim, 94% of high-school graduates failed to select relevant evidence, falling to 83% of college graduates, and falling further to 46% of those with postgraduate degrees. Another important variable is individuals’ epistemological beliefs. Individuals hold markedly different views on the nature of belief and knowledge. For example, multiplists treat assertions as opinions which are not amenable to critique, and engage little in argument; evaluativists, in contrast, treat assertions as claims to be judged against evidence, and engage rather more in argument (Kuhn et al., 2000; Kuhn et al., 2010). Important, too, is motivation, which predicts both the style and the extent of argumentation (Asterhan & Schwarz, 2016): mastery goals predict engagement in deliberative, collaborative argument; performance-approach goals predict more combative, disputative argument, characterised by a focus on winning; performance-avoidance goals predict consensus seeking, characterised by avoidance of conflict and deference to others’ views (Asterhan et al., 2009).

9.3.5

Improving argument

Intervention studies have successfully improved individuals’ argumentative skills. Such studies have predominantly taken a dialectical perspective and targeted individuals’ adherence to reasonable procedures. For example, an intensive long-term intervention with US high-school students improved skills through supported peer discourse: phases prepared students for appropriate argumentative moves and staged both small-group and whole-class debates, and outcomes were measured through essay writing and more standardised assessments (Crowell & Kuhn, 2014; Kuhn & Crowell, 2011). Students in the intervention groups surpassed controls in a range of skills, such as acknowledging multiple perspectives, integrating these perspectives into a cohesive argument, providing evidence, and anticipating and responding to counterexamples. Interventions have also succeeded where participants engaged in purely electronic peer discourse (Iordanou & Kuhn, 2020), where participants created an imaginary peer discourse (Zavala & Kuhn, 2017), and where participants received arguments presented as a dialogue rather than as a text (Kuhn & Modrek, 2021). They have been successful with both children (Crowell & Kuhn, 2014; Iordanou & Kuhn, 2020; Kuhn & Crowell, 2011) and adults (Kuhn & Modrek, 2021; Zavala & Kuhn, 2017).

9.4

Main research methods

Researchers studying argumentation have used a wide variety of methods. This section focuses on those central to the psychology of argument – experimentation, intervention, and correlational studies – before briefly surveying common methods in neighbouring disciplines. Much research on argumentation pursues causal questions: for example, what factors determine the persuasiveness of an argument, and what teaching methods improve people’s argumentation skills? For such questions, the researcher canonically manipulates one or more independent variables and measures an outcome of interest, the dependent variable, in search of an effect. This is the logic of both intervention studies and experiments, a logic which rests on concrete and well-justified definitions of all variables. In intervention studies, the independent variable will typically comprise at least a treatment group and a control group, to which participants are randomly assigned. The treatment group is exposed to training in argumentation; the control group may receive nothing or some informational content which lacks the ‘active ingredient’ of the training. While a wide range of dependent variables is possible, studies have tended to judge participants’ skills by having them produce arguments and counting the number of appropriate argumentative moves they make. A dependent variable of this kind requires researchers to interpret participants’ utterances and categorise, or code, them against a pre-existing scheme. In experiments, the independent variables commonly manipulate key argument parameters derived from theories of argument. For instance, experiments have tested whether participants follow Bayesian norms for arguments from ignorance by manipulating prior belief in the conclusion and the amount of evidence (Hahn & Oaksford, 2007). Participants are either randomly allocated to a single condition (a between-participants design) or exposed to all conditions (a within-participants design), ideally in a random order. Cognitive-scientific experiments have used a wide range of dependent variables, including rating scales, eye movements, and measures of neural activity such as EEG or fMRI. However, argumentation research has tended to use ad hoc Likert-style scales, such as ratings of reasonableness or convincingness on 7-point or 11-point scales. In both interventions and experiments, it is central to eliminate plausible alternative explanations, either by holding them constant during the study or by measuring them and taking them into account in the statistical analysis. Argumentation research has also pursued more descriptive questions; such studies do not manipulate variables but rather measure naturally occurring relationships among variables. Examples include studies of individual differences in argument. 
While researchers could in principle intervene on some individual-difference variables, for example by training people to adopt different motivations or epistemological beliefs, research on individual differences has tended not to intervene (Asterhan & Schwarz, 2016). Where descriptive studies are quantitative, they require concrete and well-justified definitions of all variables just as much as intervention and experimental studies do. However, descriptive studies may also be more qualitative, using interviews, for example, to explore participants’ responses to particular arguments and the factors participants explicitly consider in evaluating them (Schellens et al., 2017). The methods above involve presenting participants with arguments or challenging them to produce arguments on specific topics. An alternative method, popular in neighbouring disciplines, is to sample and analyse naturally occurring arguments. Arguments might be sampled somewhat informally, as was traditionally the case in argumentation research, with prominent or theoretically important arguments selected for analysis. Alternatively, arguments might be sampled more systematically to create large corpora of argumentative discourse. Existing corpora draw on a variety of materials, ranging from tweets (Bosc et al., 2016) to longer arguments such as discussions on internet forums (Walker et al., 2012) and newspaper articles (Kiesel et al., 2015). Corpora permit both labour-intensive manual analyses and automated argument mining, and can thereby reveal patterns in argumentation (Lawrence & Reed, 2020). Such analyses offer a range of practical and commercial advantages, extending more established methods such as sentiment and opinion analysis by offering insight not only into what is felt or believed but also into why, or at least into what reasons are offered (Lawrence & Reed, 2020). 
Corpora are also a rich resource for discourse analysts, whose work overlaps substantially with argumentation theory (McEnery & Baker, 2015).
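The between- and within-participants designs described in this section differ mainly in how conditions are allocated. Here is a minimal sketch of both allocation schemes, assuming a hypothetical 2 × 2 design crossing prior belief in the conclusion with amount of evidence; the condition labels, participant IDs, and function names are illustrative and not taken from any cited study.

```python
import random
from itertools import cycle

# Hypothetical condition labels for a 2 x 2 design (illustrative only).
CONDITIONS = ["low_prior/little_evidence", "low_prior/much_evidence",
              "high_prior/little_evidence", "high_prior/much_evidence"]


def assign_between(participants, conditions, seed=None):
    """Between-participants design: each participant sees exactly one condition.

    Participants are shuffled, then conditions are dealt out in rotation, so
    allocation is random while group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return dict(zip(shuffled, cycle(conditions)))


def orders_within(participants, conditions, seed=None):
    """Within-participants design: everyone sees every condition,
    each participant in an independently randomised order."""
    rng = random.Random(seed)
    return {p: rng.sample(conditions, k=len(conditions)) for p in participants}


if __name__ == "__main__":
    ids = [f"P{i:02d}" for i in range(1, 11)]
    between = assign_between(ids, CONDITIONS, seed=1)
    within = orders_within(ids, CONDITIONS, seed=1)
    for pid in ids[:3]:
        print(pid, between[pid], within[pid])
```

Dealing conditions in rotation after shuffling is one common way to keep groups balanced; drawing a condition independently for each participant would also be random but could produce unequal group sizes. Fixing the seed makes an allocation reproducible.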

9.5

Recommendations for practice and future directions

It may seem trivial to note that experiments should ideally be shaped by theory, but the importance of the theoretical frameworks outlined in this chapter for any empirical study of argument cannot be overstated. Without such a framework, ‘an argument’ is simply a unique utterance with a particular content, context, speaker(s), and listener(s), at a specific point in time. To take an analogy from other parts of linguistics, seeking to study argument without these frameworks is not just like trying to do psycholinguistics without one of numerous rival theories of processing, but rather like trying to do psycholinguistics without basic syntactic categories such as noun or verb. The theoretical frameworks described determine the relevant types over which empirical generalisations and regularities might be found. However, the relationships between these frameworks and the aspects of argument they draw attention to are not yet sufficiently well understood. Theoretical work further elucidating these different aspects and how they relate is, thus, required. There is arguably also a need for further empirical work to help ascertain whether current frameworks are, collectively, explanatorily adequate, let alone whether the aspects presently identified are, in fact, complete. Again unlike syntax, there is no (near) universal core competence for “argument” that we all intrinsically possess. Rather, as the work on individual differences discussed above shows, people vary widely in the arguments they produce and perceive, and in how they evaluate those arguments. This means researchers have an even weaker intuitive grasp of the range of what is produced than they have for other aspects of language. Large-scale observational data, in particular corpus studies, consequently seem even more pressing than in other areas, to determine whether important aspects of argument have been underrepresented or even missed. At the same time, future work should seek to advance experimental methods, by analogy with other areas. 
Experiments on argument have tended to rely on rating scales that make use of intuitive, everyday concepts such as reasonableness or convincingness but that are ad hoc, that is, developed for the purposes of the study without prior validation. Such scales offer rather coarse-grained evidence on argument evaluation; finer-grained evidence might be obtained in two ways. First, researchers may supplement ratings data with evidence on how participants interpret and use the scales, for instance, through qualitative questioning (Uher, 2018). Second, researchers may use a broader range of dependent variables, drawing, perhaps, on related research on reasoning, where measures have included reading times (Haigh et al., 2012), eye movements (Haigh et al., 2014), EEG components (Bonnefond et al., 2012), and fMRI measures (Prado et al., 2020). This second course would enable a more fundamental move towards studying process. Strengthening argumentation research in these two separate directions of breadth and depth should help foster a mature science of argumentation.

Further reading

van Eemeren, F. H. (2018). Argumentation Theory: A Pragma-Dialectical Perspective. Springer.
Hahn, U., & Oaksford, M. (2007). The rationality of informal argumentation: A Bayesian approach to reasoning fallacies. Psychological Review, 114(3), 704–732. doi:10.1037/0033-295X.114.3.704.
Walton, D. N., Reed, C., & Macagno, F. (2008). Argumentation Schemes. Cambridge University Press.

Related topics

Experimental semantics; experimental pragmatics; new directions in statistical analysis for experimental linguistics.

References

Asterhan, C. S. C., & Schwarz, B. B. (2016). Argumentation for learning: Well-trodden paths and unexplored territories. Educational Psychologist, 51(2), 164–187. doi:10.1080/00461520.2016.1155458.
Asterhan, C. S. C., Schwarz, B. B., & Butler, R. (2009). Inhibitors and facilitators of peer interaction that supports conceptual learning: The role of achievement goal orientations. In N. A. Taatgen & H. van Rijn (Eds.), Proceedings of 31st Annual Conference of the Cognitive Science Society (pp. 1633–1638). Erlbaum.
Bhatia, J.-S., & Oaksford, M. (2015). Discounting testimony with the argument ad hominem and a Bayesian congruent prior model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(5), 1548–1559. doi:10.1037/xlm0000151.
Blair, J. A. (2012). Rhetoric, dialectic, and logic as related to argument. Philosophy & Rhetoric, 45(2), 148–164. doi:10.5325/philrhet.45.2.0148.
Bonnefond, M., Van der Henst, J. B., Gougain, M., Robic, S., Olsen, M. D., Weiss, O., & Noveck, I. (2012). How pragmatic interpretations arise from conditionals: Profiling the affirmation of the consequent argument with reaction time and EEG measures. Journal of Memory and Language, 67(4), 468–485. doi:10.1016/j.jml.2012.07.007.
Bosc, T., Cabrio, E., & Villata, S. (2016). DART: A dataset of arguments and their relations on Twitter. In Proceedings of the 10th Edition of the Language Resources and Evaluation Conference, p. 1258. Available at: https://hal.inria.fr/hal-01332336 (Accessed: 28 June 2022).
Brem, S. K., & Rips, L. J. (2000). Explanation and evidence in informal argument. Cognitive Science, 24(4), 573–604. doi:10.1207/s15516709cog2404_2.
Cesario, J., Grant, H., & Higgins, E. T. (2004). Regulatory fit and persuasion: Transfer from “feeling right.” Journal of Personality and Social Psychology, 86(3), 388–404. doi:10.1037/0022-3514.86.3.388.
Collins, P. J., & Hahn, U. (2016). Arguments and their sources. In F. Paglieri, L. Bonelli, & S. Felletti (Eds.), The Psychology of Argument: Cognitive Approaches to Argumentation and Persuasion (pp. 129–129). College Publications.
Corner, A., Hahn, U., & Oaksford, M. (2011). The psychological mechanism of the slippery slope argument. Journal of Memory and Language, 64(2), 133–152. doi:10.1016/j.jml.2010.10.002.
Crowell, A., & Kuhn, D. (2014). Developing dialogic argumentation skills: A 3-year intervention study. Journal of Cognition and Development, 15(2), 363–381. doi:10.1080/15248372.2012.725187.
Douven, I. (2011). A role for normativism. Behavioral and Brain Sciences, 34(5), 252–253. doi:10.1017/S0140525X11000471.
van Eemeren, F. H. (2018). Argumentation Theory: A Pragma-Dialectical Perspective. Springer.
van Eemeren, F. H., & Garssen, B. (2013). Viewing the study of argumentation as normative pragmatics. In A. Capone, F. L. Piparo, & M. Carapezza (Eds.), Perspectives on Pragmatics and Philosophy (pp. 515–536). Springer.
van Eemeren, F. H., Garssen, B., & Meuffels, B. (2009). Fallacies and Judgments of Reasonableness: Empirical Research Concerning the Pragma-Dialectical Discussion Rules. Springer.
van Eemeren, F. H., Garssen, B., & Meuffels, B. (2012). The disguised abusive ad hominem empirically investigated: Strategic manoeuvring with direct personal attacks. Thinking & Reasoning, 18(3), 344–364. doi:10.1080/13546783.2012.678666.
van Eemeren, F. H., & Grootendorst, R. (1984). Speech acts in argumentative discussions: A theoretical model for the analysis of discussions directed toward solving conflicts of opinion. Floris Press.
van Eemeren, F. H., & Grootendorst, R. (1992). Argumentation, Communication, and Fallacies: A Pragma-Dialectical Perspective. Lawrence Erlbaum Associates.
van Eemeren, F. H., & Grootendorst, R. (2004). A Systematic Theory of Argumentation: The Pragma-Dialectical Approach. Cambridge University Press.
van Eemeren, F. H., Grootendorst, R., & Snoeck Henkemans, F. (1996). Fundamentals of Argumentation Theory. Erlbaum.
van Eemeren, F. H., & Houtlosser, P. (1999). Strategic manoeuvring in argumentative discourse. Discourse Studies, 1(4), 479–497. doi:10.1177/1461445699001004005.
van Eemeren, F. H., & Houtlosser, P. (2000). Rhetorical analysis within a pragma-dialectical framework. Argumentation, 14(3), 293–305.
van Eemeren, F. H., & Houtlosser, P. (2007). Strategic maneuvering: A synthetic recapitulation. Argumentation, 20(4), 381–392. doi:10.1007/s10503-007-9037-z.
Elqayam, S., & Evans, J. St. B. T. (2011). Subtracting “ought” from “is”: Descriptivism versus normativism in the study of human thinking. Behavioral and Brain Sciences, 34, 233–248. doi:10.1080/13546783.2013.834268.
Eva, B., & Hartmann, S. (2018). Bayesian argumentation and the value of logical validity. Psychological Review, 125(5), 806–821. doi:10.1037/rev0000114.
Felton, M., & Kuhn, D. (2001). The development of argumentive discourse skill. Discourse Processes, 32(2–3), 135–153. doi:10.1080/0163853X.2001.9651595.
Garssen, B. J. (1997). Argumentatieschema’s in pragma-dialectisch perspectief: Een theoretisch en empirisch onderzoek. IFOTT.
Gordon, T. F., Prakken, H., & Walton, D. (2007). The Carneades model of argument and burden of proof. Artificial Intelligence, 171(10), 875–896. doi:10.1016/j.artint.2007.04.010.
Hahn, U. (2011). The problem of circularity in evidence, argument, and explanation. Perspectives on Psychological Science, 6(2), 172–182. doi:10.1177/1745691611400240.
Hahn, U. (2014). The Bayesian boom: Good thing or bad? Cognitive Science, 5, 765. doi:10.3389/fpsyg.2014.00765.
Hahn, U., Bluhm, R., & Zenker, F. (2017). Causal argument. In M. Waldmann (Ed.), The Oxford Handbook of Causal Reasoning (pp. 475–494). Oxford University Press.
Hahn, U., Harris, A. J. L., & Corner, A. (2009). Argument content and argument source: An exploration. Informal Logic, 29(4), 337–367.
Hahn, U., & Hornikx, J. (2016). A normative framework for argument quality: Argumentation schemes with a Bayesian foundation. Synthese, 193(6), 1833–1873. doi:10.1007/s11229-015-0815-0.
Hahn, U., & Oaksford, M. (2007). The rationality of informal argumentation: A Bayesian approach to reasoning fallacies. Psychological Review, 114(3), 704–732. doi:10.1037/0033-295X.114.3.704.
Hahn, U., & Oaksford, M. (2012). Rational argument. In K. J. Holyoak & R. G. Morrison (Eds.), The Oxford Handbook of Thinking and Reasoning (pp. 277–298). Oxford University Press.
Hahn, U., Oaksford, M., & Harris, A. J. L. (2012). Testimony and argument: A Bayesian perspective. In F. Zenker (Ed.), Bayesian Argumentation (pp. 15–38). Springer.
Haigh, M., Ferguson, H. J., & Stewart, A. J. (2014). An eye-tracking investigation into readers’ sensitivity to actual versus expected utility in the comprehension of conditionals. The Quarterly Journal of Experimental Psychology, 67(1), 166–185. doi:10.1080/17470218.2013.797475.
Haigh, M., Stewart, A. J., & Connell, L. (2012). Reasoning as we read: Establishing the probability of causal conditionals. Memory & Cognition, 41(1), 152–158. doi:10.3758/s13421-012-0250-0.
Hamblin, C. L. (1970). Fallacies. Methuen.
Harris, A. J. L. et al. (2016). The appeal to expert opinion: Quantitative support for a Bayesian network approach. Cognitive Science, 1496–1533. doi:10.1111/cogs.12276.
Harris, A. J. L., Hsu, A. S., & Madsen, J. K. (2012). Because Hitler did it! Quantitative tests of Bayesian argumentation using ad hominem. Thinking & Reasoning, 18(3), 311–343. doi:10.1080/13546783.2012.670753.
Hastings, A. C. (1962). A Reformulation of the Modes of Reasoning in Argumentation. Northwestern University.
Hemberger, L., Kuhn, D., Matos, F., & Shi, Y. (2017). A dialogic path to evidence-based argumentive writing. Journal of the Learning Sciences, 26(4), 575–607. doi:10.1080/10508406.2017.1336714.
Hoeken, H., Timmers, R., & Schellens, P. J. (2012). Arguing about desirable consequences: What constitutes a convincing argument? Thinking & Reasoning, 18(3), 394–416. doi:10.1080/13546783.2012.669986.
Humă, B., Stokoe, E., & Sikveland, R. O. (2020). Putting persuasion (back) in its interactional context. Qualitative Research in Psychology, 17(3), 357–371. doi:10.1080/14780887.2020.1725947.
Iordanou, K., & Kuhn, D. (2020). Contemplating the opposition: Does a personal touch matter? Discourse Processes, 57(4), 343–359. doi:10.1080/0163853X.2019.1701918.
Kadane, J. B., & Schum, D. A. (1996). A Probabilistic Analysis of the Sacco and Vanzetti Evidence. John Wiley & Sons.
Kienpointner, M. (1992). Alltagslogik: Struktur und Funktion von Argumentationsmustern. Friedrich Frommann.
Kiesel, J., Al Khatib, K., Hagen, M., & Stein, B. (2015). A shared task on argumentation mining in newspaper editorials. In Proceedings of the 2nd Workshop on Argumentation Mining (pp. 35–38). Denver, CO.
Korb, K. (2004). Bayesian informal logic and fallacy. Informal Logic, 24(1). doi:10.22329/il.v24i1.2132.
Kuhn, D. (1991). The Skills of Argument. Cambridge University Press.
Kuhn, D., Cheney, R., & Weinstock, M. (2000). The development of epistemological understanding. Cognitive Development, 15(3), 309–328. doi:10.1016/S0885-2014(00)00030-7.
Kuhn, D., & Crowell, A. (2011). Dialogic argumentation as a vehicle for developing young adolescents’ thinking. Psychological Science, 22(4), 545–552. doi:10.1177/0956797611402512.
Kuhn, D., & Modrek, A. (2018). Do reasoning limitations undermine discourse? Thinking & Reasoning, 24(1), 97–116. doi:10.1080/13546783.2017.1388846.
Kuhn, D., & Modrek, A. (2021). Mere exposure to dialogic framing enriches argumentive thinking. Applied Cognitive Psychology, 35(5), 1349–1355. doi:10.1002/acp.3862.
Kuhn, D., & Modrek, A. (2022). Choose your evidence. Science & Education, 31(1), 21–31. doi:10.1007/s11191-021-00209-y.
Kuhn, D., Wang, Y., & Li, H. (2010). Why argue? Developing understanding of the purposes and values of argumentive discourse. Discourse Processes, 48(1), 26–49. doi:10.1080/01638531003653344.
Lawrence, J., & Reed, C. (2020). Argument mining: A survey. Computational Linguistics, 45(4), 765–818. doi:10.1162/coli_a_00364.
McEnery, A., & Baker, P. (2015). Corpora and Discourse Studies: Integrating Discourse and Corpora. Springer.
O’Keefe, D. J. (2013). The relative persuasiveness of different forms of arguments-from-consequences: A review and integration. In E. L. Cohen (Ed.), Communication Yearbook 36 (pp. 109–135). Routledge.
O’Keefe, D. J. (2015). Persuasion: Theory and Research. SAGE Publications.
Paglieri, F. (2021). Less scheming, more typing: Musings on the Waltonian legacy in argument technologies. FLAP, 8(1), 219–244.
Perelman, C., & Olbrechts-Tyteca, L. (1969). The New Rhetoric: A Treatise on Argumentation (J. Wilkinson & P. Weaver, Trans.). University of Notre Dame Press.
Prado, J. et al. (2020). The neural bases of argumentative reasoning. Brain and Language, 208, 104827. doi:10.1016/j.bandl.2020.104827.
Prakken, H., & Vreeswijk, G. A. W. (2002). Logics for defeasible argumentation. In D. M. Gabbay & F. Guenthner (Eds.), Handbook of Philosophical Logic (2nd ed.) (pp. 219–318). Kluwer Academic Publishers.
Rosenkrantz, R. D. (1992). The justification of induction. Philosophy of Science, 15, 527–539. doi:10.1086/289693.
Schellens, P. J. et al. (2017). Laypeople’s evaluation of arguments: Are criteria for argument quality scheme-specific? Argumentation, 31(4), 681–703. doi:10.1007/s10503-016-9418-2.
Shapiro, S., & Kouri Kissel, T. (2021). Classical logic. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2021 Edition). Metaphysics Research Lab, Stanford University.
Shi, Y., Matos, F., & Kuhn, D. (2019). Dialog as a bridge to argumentative writing. Journal of Writing Research, 11(1), 107–129. doi:10.17239/jowr-2019.11.01.04.
Stanovich, K. E. (2012). On the distinction between rationality and intelligence: Implications for understanding individual differences in reasoning. In K. J. Holyoak & R. G. Morrison (Eds.), The Oxford Handbook of Thinking and Reasoning (pp. 343–365). Oxford University Press.
Stenning, K., & van Lambalgen, M. (2012). Human Reasoning and Cognitive Science. MIT Press.
Tindale, C. W. (2004). Rhetorical Argumentation: Principles of Theory and Practice. SAGE.
Toulmin, S. (1958). The Uses of Argument. Cambridge University Press.
Uher, J. (2018). Quantitative data from rating scales: An epistemological and methodological enquiry. Frontiers in Psychology, 9. doi:10.3389/fpsyg.2018.02599.
Walker, M. et al. (2012). A corpus for research on deliberation and debate. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12) (pp. 812–817). Istanbul, Turkey: European Language Resources Association (ELRA).
Walton, D. N. (2006). Fundamentals of Critical Argumentation. Cambridge University Press.
Walton, D. N. (2008). Informal Logic: A Pragmatic Approach (2nd ed.). Cambridge University Press.
Walton, D. N., Reed, C., & Macagno, F. (2008). Argumentation Schemes. Cambridge University Press.
Zavala, J., & Kuhn, D. (2017). Solitary discourse is a productive activity. Psychological Science, 28(5), 578–586. doi:10.1177/0956797616689248.
Zenker, F. (2012). Bayesian Argumentation: The Practical Side of Probability. Springer.

10 EXPERIMENTAL RESEARCH IN CROSS-LINGUISTIC PSYCHOLINGUISTICS

Sebastian Sauppe, Caroline Andrews and Elisabeth Norcliffe

DOI: 10.4324/9781003392972-12

10.1

Introduction

This chapter provides an overview of cross-linguistic psycholinguistic research, including work conducted outside traditional lab contexts and/or in the field. We aim to provide, where possible, practical examples and advice for researchers who are interested in adapting current experimental psycholinguistic methods to a cross-linguistic context. While most of the examples and specifics focus on sentence processing research, much of the discussion and many of the recommendations apply to other areas of experimental language research, such as experimental phonology.

10.2

Historical perspectives

In 1995, Inoue and Fodor wrote that it “is easier to argue that parsing Japanese is impossible than to explain how it is done.” They pointed to three properties of Japanese that should, theoretically, pose a problem for the human parser: (i) verb-finality, (ii) scrambling, and (iii) null pronouns/argument drop. On the face of it, this should have been an odd thing to have to say. None of the three properties can pose an insurmountable problem for sentence comprehension, given that Japanese speakers do, of course, successfully understand their interlocutors. More to the point for psycholinguistic theory, none of these properties is especially rare cross-linguistically. Indeed, Subject-Object-Verb (SOV) is the most common basic word order among the world’s languages (Dryer, 2013b). Languages with the exact configuration of null subject pronouns seen in Japanese are perhaps somewhat unusual in typological samples (∼9% in Dryer, 2013a), but adding in languages which allow null noun phrases (NPs) in canonical position combined with marking on the verb covers 73% of languages in Dryer’s sample, mostly because of the sheer pervasiveness of verbal agreement. Estimates of the typological prevalence of scrambling and of null object pronouns are somewhat harder to come by (Saleem, 2010; Sekerina, 1997), but neither is exceptionally rare, and both may indeed be more common than not. Of course, only part of Inoue and Fodor’s observation was about the objective difficulty of parsing Japanese; much of it was about the difficulty of accounting for Japanese parsing given the psycholinguistic theories of the time. At the time, sentence processing research had focused almost exclusively on English, which critically has none of the three properties that so worried Inoue and Fodor. More than 25 years on, both Japanese and the specific properties that Inoue and Fodor highlighted have become important lines of research in psycholinguistics, leading to advances not just in our understanding of parsing in a single language but in psycholinguistic theory more generally. Verb-finality has provided a window into the incrementality of both production (Hwang & Kaiser, 2014; Momma & Ferreira, 2019; Momma et al., 2016; Sauppe, 2017b; Schriefers et al., 1998) and comprehension (Konieczny et al., 1997), as well as into many other issues, including filler-gap dependencies (Aoshima et al., 2004), prediction and locality (Kamide et al., 2003; Levy & Keller, 2013; Mitsugi, 2017), and turn-taking (Barthel & Sauppe, 2019). Scrambling has been studied for its own sake (Nakano et al., 2002; Sekerina, 2003) and has also been used as a tool to provide insight into many other phenomena, especially in the delicate work of teasing apart how the parser relies on the often co-occurring cues of case, word order, and grammatical function (Tamaoka et al., 2005; Yamashita, 1997), as well as information structure (Ferreira & Yoshita, 2003). Likewise, null pronouns have been explored in Romance and East Asian languages and have contributed greatly to studies of information structure (Runner & Ibarra, 2016), notions of accessibility in processing (Carminati, 2002), optionality (Saah, 1995), and beyond. Japanese is but one of more than 7,000 languages spoken in the world today, each one offering “a natural laboratory of variation” (Evans & Levinson, 2009) against which psycholinguistic theories can be tested and developed. 
The current empirical base of language processing research represents a small fraction of this linguistic diversity, still limited largely to the Germanic and Romance families and a handful of large East Asian languages (Blasi et al., 2022; Jaeger & Norcliffe, 2009; Norcliffe et al., 2015a). There have been many calls to broaden the typological diversity of psycholinguistics (Anand et al., 2010; Bates et al., 2001; Chung et al., 2012; Clahsen, 2016; Cutler, 1985; Evans & Levinson, 2009; Hellwig, 2019; Jaeger & Norcliffe, 2009; MacWhinney & Bates, 1989; Nielsen et al., 2017; Norcliffe et al., 2015a, inter alia). Not surprisingly, the problem also exists in child language acquisition research (Kidd & Garcia, 2022; Stoll, 2009). In recent years, the cognitive sciences more generally have recognised the need for more diversity (Arnett, 2008), most prominently in the well-known call by Henrich et al. (2010) to move beyond WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations and English-speaking participants (see also Blasi et al., 2022). Within this context, psycholinguistics is particularly well positioned to seize the opportunities offered by looking beyond Western(ised) labs, given the long-standing and productive research traditions in the adjacent fields of linguistic typology, comparative linguistics, and language documentation, which engage directly with linguistic variation and bring it into central focus.

10.3

Critical issues and topics

To the extent that psycholinguistic theory makes any claims about human cognition or about how human language is processed (whether those claims concern the potential range of variation or characteristics that may be universal), it is necessary to expand cross-linguistic coverage. Doing this serves two functions. First, replications of previously established effects in new languages provide critical empirical support for the (assumed) universality of processing mechanisms. Second, researching new languages makes available new linguistic phenomena that fall outside the scope of current psycholinguistic theories. Thus, a productive approach is to select a “target” language with grammatical properties that provide a new window on current psycholinguistic theories, for example, because existing theories make no or conflicting predictions about how certain features of the grammar should influence processing, or because the language allows testing a claim that cannot be examined with other languages. Additionally, cross-linguistic research presents an opportunity to expand the field methodologically. Research in non-lab contexts, especially with technical equipment such as eye trackers or EEG, comes with challenges that researchers typically avoid in the lab. Until recently, many of these challenges were largely insurmountable. Technological advances have made methods much more portable, but new environments present a much greater range of lighting, electrical, social, and political issues, all of which can affect data interpretation. Moreover, new populations themselves come with challenges that demand adaptation. For instance, many community languages are written less frequently, and speakers may therefore not be literate in their native language, even when they are literate in the local lingua franca. Additionally, Western researchers are likely to underestimate the impact that constant experience with technology has had on their typical undergraduate participants. This can mean that critical timing parameters that were painstakingly established in the lab, for example, the appropriate stimulus onset asynchrony in a picture-word interference study (e.g., Meyer, 1996), may need to be rediscovered when working outside the laboratory and with new populations.

10.4 Current contribution and research

Since Inoue and Fodor’s article, the landscape of cross-linguistic psycholinguistic research has shifted considerably. A growing group of researchers is now producing more cross-linguistic work than ever before.¹ In this section, we present two case studies from the recent sentence production literature that we think successfully meet the criteria for using diverse languages to broaden the empirical domain. The first concerns ergative case, a domain in which the basic phenomenon has been studied before but a cross-linguistic perspective provides a new point of view. The second explores a linguistic construction, switch reference, that had previously received no attention in psycholinguistics.

10.4.1 Case study 1: Planning of sentences with ergative case marking

Case marking is a cross-linguistic strategy for signalling how the nominal elements of a sentence relate to the verb and to each other. In the sample of the World Atlas of Language Structures, 52% of languages have some type of case system (Iggesen, 2013), and case systems are also a source of enormous cross-linguistic variation. Ergative case alignment, for instance, is a common pattern of marking on argument nouns in which the agent of a transitive verb receives a case marker distinct from the case on objects or intransitive subjects (Bickel & Nichols, 2009). This differs from the case pattern found in all Indo-European languages in Europe, including English, which is known as nominative-accusative alignment. In nominative-accusative alignment, the subjects of transitive and intransitive verbs have the same case marking, and objects are the odd ones out (being marked with accusative case). For nominative-accusative languages, it is well established that when speakers describe a scene depicting a transitive event (i.e., one with an agent, a patient, and an action, such as a dog chasing a ball), they look at each element of the event in the order in which they will mention it in the sentence (Griffin & Bock, 2000). This demonstrates incrementality in planning: speakers do not wait until they have fully planned a sentence before beginning their utterance, but rather plan the early parts of a sentence and begin speaking those before they have finished planning the later parts (see, e.g., Bock & Ferreira, 2014). Evidence from picture-word interference studies aligns with this

Experimental research in cross-linguistic psycholinguistics

finding, in showing that agents are planned independently at the beginning of the sentence, before verbs or patients are planned (Momma & Ferreira, 2019; Momma et al., 2016). Preparing a nominative sentence-initial noun phrase gives the speaker some flexibility in planning (cf., e.g., Ferreira & Swets, 2002; Myachykov et al., 2013) because it does not require any commitment about the existence of a patient or about whether the verb is transitive or intransitive; the patient and verb can be planned while the agent is being pronounced. Ergative case alignment, however, presents speakers with a different set of constraints and therefore requires different strategies. When planning a sentence in an ergative language, the choice of case marker on the initial noun (usually the agent) depends on the transitivity of the verb (ergative if transitive, absolutive if intransitive). Given this dependency, speakers of ergative languages need to plan at least some information about the verb to select the appropriate case marker on the initial noun. In other words, they should engage in more up-front relational planning (cf. Bock & Ferreira, 2014; Konopka, 2019) when formulating simple sentences than speakers of nominative-accusative languages do. Egurtzegi et al. (2022) provide evidence for such diverging planning strategies by comparing sentence planning in Basque (ergative agents) and Swiss German (nominative agents) in a picture description study. Using eye tracking, Egurtzegi et al. found that the proportion of looks to the agent peaked later in Basque than in German, indicating that German speakers focused on the agent early and then moved on to planning the verb and patient. Basque speakers, by contrast, divided their attention more between different parts of the pictures during initial planning, which indicates increased relational encoding. Using EEG, Egurtzegi et al. found that differences in neural activity (in the theta, alpha, and beta frequency bands) also indicate that ergative case marking requires an earlier commitment to a sentence structure than nominative-accusative case marking. This between-language difference in sentence planning has also been shown within a single language, Hindi, which has a split case marking system: Hindi requires ergative-marked agents in the perfective aspect and nominative-marked agents in the imperfective aspect. Sauppe et al. (2021) found that Hindi speakers shifted their gaze more between the agent and other parts of the pictures when planning ergative sentences than when planning nominative sentences. When planning sentences with nominative agents, Hindi speakers concentrated on the agent during the early phases of planning. Neural activity measured with EEG again showed that speakers committed to the structure under preparation earlier when planning sentences with ergative case marking, because they need to commit to the transitivity of the verb early. This parallels the differences that Egurtzegi et al. found between Basque and German sentence planning: increased relational encoding presumably helps to establish which kind of verb should be used, which in turn determines whether the initial noun should carry nominative or ergative case. In sum, these studies show that what appears to be a relatively small morphosyntactic difference has substantial effects on processing.
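The eye-tracking comparison above rests on a time course: for each moment after picture onset, the proportion of trials in which speakers are looking at the agent. The Python sketch below shows one minimal way to compute such a curve and locate its peak. It assumes a hypothetical data format (each trial is a list of fixations given as start time, end time, and area-of-interest label), 100 ms bins, and sampling at each bin's midpoint; these choices and all function names are illustrative and are not the procedure used by Egurtzegi et al. (2022) or Sauppe et al. (2021).

```python
from typing import List, Optional, Tuple

# Hypothetical fixation format: (start_ms, end_ms, aoi_label), relative to picture onset.
Fixation = Tuple[int, int, str]


def aoi_at(trial: List[Fixation], t: float) -> Optional[str]:
    """Return the area of interest fixated at time t, or None if none is fixated."""
    for start, end, label in trial:
        if start <= t < end:
            return label
    return None


def proportion_looks(trials: List[List[Fixation]], aoi: str,
                     window_ms: int = 1500, bin_ms: int = 100) -> List[float]:
    """For each time bin, the proportion of trials fixating `aoi` at the bin midpoint."""
    proportions = []
    for bin_start in range(0, window_ms, bin_ms):
        midpoint = bin_start + bin_ms / 2
        hits = sum(1 for trial in trials if aoi_at(trial, midpoint) == aoi)
        proportions.append(hits / len(trials))
    return proportions


def peak_onset(proportions: List[float], bin_ms: int = 100) -> int:
    """Onset (in ms) of the first bin with the highest proportion of looks."""
    best = max(range(len(proportions)), key=lambda i: proportions[i])
    return best * bin_ms


# Three made-up trials from one hypothetical speaker.
trials = [
    [(0, 400, "agent"), (400, 1000, "patient")],
    [(100, 600, "agent"), (600, 1500, "action")],
    [(200, 500, "agent"), (500, 900, "patient")],
]
agent_props = proportion_looks(trials, "agent", window_ms=1000)
print(peak_onset(agent_props))  # 200
```

Computing such curves separately for two groups of speakers (or two sentence types) gives the kind of peak comparison described in the text. A single peak estimate is sensitive to noise, so a full analysis would model the whole time course statistically rather than compare peaks alone.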

10.4.2 Case study 2: Planning of sentences with switch reference marking

Our second case study focuses on switch reference, a linguistic phenomenon that had not been studied psycholinguistically until recently but that provides a useful testing ground for refining theories of planning scope in sentence production. Switch reference is a system for chaining clauses together into a single sentence in which each (nonfinal) clause in the chain includes a morpheme (typically on the verb) that indicates whether the next clause has the same subject as the current clause or a different one (see van Gijn, 2016). The sentence in (1) provides an example of switch reference in Nungon, a Finisterre-Huon language spoken in Papua New Guinea.

(1) [Nan-na om-un-a], [ongo-ng-a], [Imom ir-a], [moro to-ng-a],
    [father-1sg.poss die-ds.3sg-mv] [go-dep-mv] [Imom be-mv] [big do-dep-mv]
    [e-ng-a], [ngo-ndo ir-a], [amna to-ng hi-go-t].
    ‘My father having died, (I) went on, stayed in Imom, became big, came (here, in Yawan), staying here, I took a man (as husband).’
    (Adapted from Sarvasy, 2014, p. 343)

The sentence in (1) has eight verbs, one of which has a third-person subject (om ‘die’), while the rest all have the first-person speaker as the subject of their clause. Rather than indicating the identity of the subject in each individual clause, which would be repetitive for the seven verbs that share a subject, Nungon’s particular instantiation of switch reference allows speakers to economise by simply dropping subject-verb agreement when the subsequent verb will have the same subject (subject argument drop is widely used as well, as in this example). This means that only two verbs need overt agreement: the last one, which marks first-person agreement for all seven first-person verbs, and the verb of the very first clause, which has nan ‘father’ as its subject. The agreement for this latter subject cannot be recovered anywhere else and therefore cannot be omitted. Consequently, the third-person agreement morpheme -un in the first clause doubles as an indicator that the next clause will have a new subject, while the absence of subject agreement on the second through seventh verbs signals, in each case, that the next verb has the same subject. Switch reference presents a challenge for current models of sentence production.
Based on the languages studied so far, current theories take the clause as a major unit of production (cf., e.g., Smith & Wheeldon, 1999), where the notion of a unit implies that, at some level, the verb and its arguments are planned together. It is not immediately clear whether or how this unit size would apply to, say, the second verb in (1), ongo ‘go’, which is four clauses away from any realisation of its argument. In a visual world eye-tracking experiment, Sarvasy et al. (2022) found that Nungon speakers shifted their attention depending on whether the current verb was marked for the same subject as, or a different subject from, the subsequent clause. Moreover, speakers did so well before the onset of the morpheme itself, and therefore before the start of the subsequent clause (which could have been an alternative inflection point for gaze allocation). This finding preserves the tight link between gaze allocation and sentence planning assumed in current psycholinguistic models (Norcliffe & Konopka, 2015) but substantially expands the size of the planning window commonly assumed for sentence production. The evidence from Nungon thus provides a perspective on sentence planning that could not have been obtained from the languages usually represented in the psycholinguistics literature. Taken together, the studies reviewed in this section show that cross-linguistic research “can guide the development of theories by revealing the full extent of the human ability” to produce (and comprehend) language (Jaeger & Norcliffe, 2009, p. 866).
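The look-ahead dependency in the Nungon example in (1) can be stated as a small rule: the form of each non-final verb depends on the subject of the following clause. The Python sketch below is a deliberately simplified toy model, not a grammar of Nungon. The labels "DS", "SS", and "final" are our own shorthand, and the model abstracts away from the actual suffixes, medial-verb morphology, and subject drop. Its only purpose is to make explicit why a speaker cannot choose the marking on a verb without already knowing something about the next clause, which is what makes switch reference relevant to planning scope.

```python
from typing import List


def switch_reference_marks(subjects: List[str]) -> List[str]:
    """Toy marking of a clause chain, loosely following the description of (1).

    A non-final verb whose next clause has a different subject carries overt
    agreement ("DS:<subject>"); one whose next clause has the same subject carries
    no agreement ("SS"). The final verb carries agreement for its own subject.
    """
    marks = []
    for i, subject in enumerate(subjects):
        if i == len(subjects) - 1:
            marks.append(f"final:{subject}")
        elif subjects[i + 1] != subject:  # look ahead to the next clause's subject
            marks.append(f"DS:{subject}")
        else:
            marks.append("SS")
    return marks


# The eight verbs in (1): 'die' has a third-person subject, the other seven a first-person one.
chain = ["3sg"] + ["1sg"] * 7
print(switch_reference_marks(chain))
# ['DS:3sg', 'SS', 'SS', 'SS', 'SS', 'SS', 'SS', 'final:1sg']
```

The subject of the final clause is only expressed on the last verb, yet every earlier "SS" decision already presupposes it, which illustrates how far ahead the relevant information must be available.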

10.5 Research methods and recommendations for practice

Cross-linguistic studies will, in many cases, need to be carried out outside a university laboratory, in the places where the languages of interest are spoken or signed. This means that some experimental methods may not be usable or that the available methods need to be adapted to the circumstances. Wagers and Chung (2019) elaborate on many aspects of designing and carrying out experiments “in the field”. Speed et al. (2017) provide an overview of the best practices and pitfalls
around cross-cultural experimental linguistic research, and Whalen and McDonough (2015) also discuss aspects of field-based experiments, with a focus on phonetic data. Here, we discuss some additional requirements of cross-linguistic experimental research. What is likely holding back many attempts at experimental psycholinguistic research outside the laboratory is not a perceived lack of scientific interest but rather practical concerns about collecting data at sites far from the researchers’ universities. Each language, and the context and location of its speaker community, poses its own challenges, including logistics, the sociocultural aspects of conducting experimental research, and the endangerment and documentation status of the language (Seifart et al., 2018). Therefore, in-depth knowledge of the target language and of the culture of its speakers is essential for conducting experimental research successfully. This means that intensive fieldwork and long-term engagement with the community are usually required, especially for lesser-described languages with small speaker communities. Collaboration with descriptive or field linguists working on the target language is often essential because the available linguistic descriptions may not cover the phenomena of interest in sufficient detail, or because interactions between grammatical domains might otherwise be easy to overlook. At the same time, field linguists can profit from such a collaboration because it may give them the opportunity to examine more closely aspects of the target language that they have not yet studied (e.g., because a certain phenomenon had not yet occurred in their materials). Næss and Sauppe (under review) provide a case report on a collaboration between descriptive linguists and psycholinguists in carrying out a sentence comprehension experiment on Äiwoo, a language spoken in Solomon Islands.
One question that arose during the preparation of this study was whether there are word order constraints on sentence-initial prepositional phrases, something that had not been documented in detail before. Collaborations between field linguists and psycholinguists must be “placed on an equal footing, ensuring that the standards of both disciplines are maintained” (Hellwig, 2019, p. 9), so that ideally both the experimental work and the description and documentation of the language are fostered. A number of comprehensive introductory and overview texts, such as Chelliah and de Reuse (2011), Thieberger (2012), and Bowern (2008), provide further information on the practical, conceptual, and ethical issues surrounding linguistic fieldwork. Since only a small fraction of the world’s languages have been studied experimentally, there are also many well-described (“large”) languages that have not yet been comprehensively investigated in psycholinguistics. Surveying the grammatical and usage characteristics of these languages is facilitated by the availability of grammatical descriptions, linguistic research publications, and corpora of written and spoken language use and learning (Gries, 2012; Stoll & Schikowski, 2020). Among these larger languages are, for example, Indonesian and Swahili, but also Hindi (e.g., Choudhary et al., 2009; Husain et al., 2015; Rubio-Fernandez & Jara-Ettinger, 2020), Tagalog (e.g., Pizarro-Guevara & Wagers, 2020; Sauppe, 2017a), and Arabic (e.g., Flecken et al., 2014; Matar et al., 2019).

10.5.1 Stimulus materials

The creation of stimulus materials is also tied to knowledge of a language’s grammar and use, as well as of the cultural context of the speech community. Studies on the processing of single words, for example, require in-depth knowledge of the structure of the lexicon. Studies on sentence comprehension require stimulus sentences that contain the intended target structures and that are also grammatical and felicitous in all other pragmatic and cultural respects. For example, sentences like the classic English “lawyer” sentences used in the study of relative clause processing (such as “The banker that irritated the lawyer / the lawyer irritated played tennis every Saturday”; Traxler et al.,
2002), while grammatical, felicitous, and acceptable, can describe scenarios that are quite abstract or somewhat unusual. When preparing stimulus sentences for cross-linguistic studies, one needs to keep in mind that scenarios that are too abstract or unusual may make it difficult for participants to understand the research, especially when it is hard for them to make sense of the task. Language processing studies that use pictures to track eye movements or to elicit utterances or words (e.g., Mulak et al., 2021; Norcliffe et al., 2015b; Nordlinger, Garrido Rodriguez et al., 2022; Sauppe et al., 2021) need pictures that are culturally appropriate and show concepts that are familiar to the participants. For example, someone who has never seen an elephant or a jaguar may not recognise these animals and may have difficulty naming them. Participants may also be unfamiliar with anthropomorphic animals of the kind commonly portrayed in cartoons if, unlike the Western university students who are commonly studied, they had no access to television or other media as children. Garrido Rodriguez et al. (2023) took photographs of objects that are in daily use or well known in a speaker community and used them in a comprehension study to ensure that the visual stimuli would be easily recognised by participants. Such object photographs are also potentially useful for word production studies. To elicit sentences, depictions of (dynamic) actions are often used. Drawings are especially suitable for this purpose because they can show a large variety of scenarios and objects that may be hard to photograph, for example, scenes involving wildlife (such as a lion hunting a zebra).
However, many actions involving two humans (e.g., pushing, kicking, feeding, or kissing) or a human and an inanimate object (e.g., peeling fruit, cutting paper, or watering a plant) can also be staged and photographed to create realistic and easily recognisable scenarios for eliciting descriptions in sentence form (for examples, see Isasi-Isasmendi et al., 2023; Sakarias & Flecken, 2019). Experimental data that are contaminated by participants struggling to understand and contextualise culturally or linguistically incongruous materials may be uninterpretable. This does not mean, however, that stimulus sentences cannot describe novel or unusual scenarios or that pictures cannot show new or potentially unknown objects. Consulting field linguists who have in-depth knowledge of the socio-cultural aspects of language use, as well as speakers of the target language, is essential to find out what might or might not be experimentally felicitous. Testing the stimuli (pictures, sentences, individual words, etc., depending on the experiment’s aims) before commencing data collection is equally important to get a practical sense of what works well and what does not (especially since introspective predictions about how certain stimuli will be perceived can turn out to be too pessimistic or too optimistic). Because many factors influence how language is processed, experimental studies often seek to control for at least some of them. At the level of individual words, for example, lexical frequency and phonological neighbourhood density can affect recognition and production (Cutler, 2012; Vitevitch & Luce, 2016). When selecting the words to be used in the stimuli, researchers should take these factors into account.
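Phonological neighbourhood density is commonly operationalised as the number of words in a lexicon that differ from a target word by the substitution, addition, or deletion of a single phoneme. The Python sketch below computes this count under two simplifying assumptions: words are given as sequences of phoneme symbols (for readability, plain strings in which each character stands for one phoneme, although tuples of multi-character symbols also work), and the lexicon is a hypothetical toy list rather than a real word list for any language. For a lesser-described language, such a lexicon would have to be assembled from a dictionary or a corpus of texts and transcribed phonemically.

```python
from typing import Iterable, Sequence


def one_phoneme_apart(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if b differs from a by exactly one substitution, addition, or deletion."""
    if len(a) == len(b):
        # Same length: neighbours differ in exactly one position (identical words do not count).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Lengths differ by one: deleting one segment from the longer form must yield the shorter.
    return any(
        list(longer[:i]) + list(longer[i + 1:]) == list(shorter)
        for i in range(len(longer))
    )


def neighbourhood_density(word: Sequence[str], lexicon: Iterable[Sequence[str]]) -> int:
    """Number of lexicon entries that are one phoneme away from `word`."""
    return sum(1 for entry in lexicon if one_phoneme_apart(word, entry))


# Hypothetical toy lexicon; each character stands for one phoneme.
lexicon = {"kat", "bat", "kit", "kast", "at", "ka", "dog"}
print(neighbourhood_density("kat", lexicon))  # 5
```

Counts like this can then be entered as a covariate or used to match or stratify items, alongside frequency.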
The availability of large online corpora already allows the compilation of lexical as well as constructional frequencies for many languages, including some lesser-described ones: the Universal Dependencies database contains syntactic treebanks for more than 100 languages (de Marneffe et al., 2021), and the Leipzig Corpora Collection hosts corpus-based monolingual dictionaries and downloadable corpora crawled from the internet for nearly 300 languages (https://corpora.uni-leipzig.de/en). For many languages with fewer resources, however, no internet-based corpora have been built. In some cases, it is possible to generate the necessary information oneself, for example, about the frequency of constructions. To validate the observation that the most frequent word order in Äiwoo places the patient (or object) sentence-initially (Næss, 2015, 2021), Sauppe et al. (under revision) manually
annotated the available corpus of glossed texts in the language (Næss, 2017) with syntactic information. These annotations made it possible to estimate how likely a patient-initial position was for each noun and utterance in the corpus. Although such ad hoc resources do not offer the broad coverage of corpora and databases based on hundreds of thousands or even millions of sentences, they may nevertheless provide the best available information on a target language and thus make a crucial contribution to more comprehensive stimulus creation (and statistical analyses). Sometimes it is not possible or feasible to compile such information (e.g., when there are no text collections or no larger dictionaries). In such cases, frequency estimates, especially for individual words, can instead be obtained through subjective frequency ratings (Brysbaert & Cortese, 2011; Thompson & Desrochers, 2009). Especially for studies on small and understudied languages with rare linguistic features, the knowledge gained from an experimental study will in most cases outweigh the risk of not being able to control some aspects of the stimuli. When designing an experiment and preparing stimuli, researchers should consider possible confounding factors and take measures to mitigate them as far as possible. For example, stimuli could be stratified based on available linguistic information (e.g., by including both words that are ostensibly highly frequent and words that are ostensibly less frequent) or based on extra-linguistic factors (e.g., by including both picture stimuli that show easily recognisable actions and pictures that show more complex actions or require more interpretation; cf. van de Velde et al., 2014).
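The frequency-based stratification mentioned above can be sketched in a few lines of Python. In this minimal example, raw counts from a (possibly small, hand-annotated) corpus are normalised to occurrences per million tokens, and candidate items are then split at the median into a higher-frequency and a lower-frequency band, from which stimuli can be drawn in equal numbers. The item names, counts, and corpus size are hypothetical, and a median split is only one simple option; the same splitting function can be applied to mean subjective frequency ratings when no corpus is available. With very small corpora, normalised frequencies for rare items are noisy, so bands are likely to be more reliable than exact values.

```python
from statistics import median
from typing import Dict, List, Tuple


def per_million(count: int, corpus_tokens: int) -> float:
    """Normalise a raw corpus count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_tokens


def stratify_by_frequency(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split candidate items at the median score into a higher and a lower band.

    Scores can be normalised corpus frequencies or mean subjective ratings;
    items tied with the median are assigned to the lower band.
    """
    cutoff = median(scores.values())
    higher = sorted(item for item, s in scores.items() if s > cutoff)
    lower = sorted(item for item, s in scores.items() if s <= cutoff)
    return higher, lower


# Hypothetical raw counts from a small annotated corpus of 120,000 tokens.
counts = {"item_a": 12, "item_b": 240, "item_c": 3, "item_d": 60}
scores = {item: per_million(c, 120_000) for item, c in counts.items()}
print(stratify_by_frequency(scores))  # (['item_b', 'item_d'], ['item_a', 'item_c'])
```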

10.5.2 Participants

When studying a “large” language with many speakers, recruiting and testing participants may not differ much from how it is done at Western universities. Especially if experiments can be carried out at a university, students can be invited to participate through posters or presentations in classes. However, when target languages are spoken by small or rural communities, the number of available participants may be small, and recruitment strategies must be tailored to the local situation (Speed et al., 2017), with the help of collaborators from the speaker community or a team member with in-depth knowledge of the local and socio-cultural circumstances. One way of engaging the community and inviting participation is to communicate the objectives and procedure of the experiment in (informal) information meetings (see Næss & Sauppe, under review). Another is to involve a native speaker collaborator as a “multiplier” who disseminates information about the research in the community and encourages other speakers to participate (Næss & Sauppe, under review). This approach presupposes that experimental linguists and psycholinguists who come to the community from outside are already able to communicate the purpose and procedure in a way that inspires potential “multipliers”. In general, however, it is important that participant recruitment be organised in a non-coercive way. Ethical approval should be obtained from the ethics committee or institutional review board of the researcher’s university and also, if possible, from a local authority, such as a university in the area where the target language is spoken or the (regional) government (Rice, 2006; Whalen & McDonough, 2015). This also applies when targeting speakers of a small language who may be among the students of a university, e.g., in the capital of a region or country.
In addition, plans for recruiting participants for experimental studies in the field need to take into account the right time for data collection (e.g., seasonality: avoiding the harvest season in communities that live primarily from agriculture, or taking the monsoon and other weather phenomena into account; Speed et al., 2017). It is also advisable to schedule enough time for data collection and to prepare alternative recruitment approaches ahead of time to be able to change
gears in reaction to the circumstances. For example, Næss and Sauppe (under review) intended to conduct their EEG study in the group of islands where Äiwoo is spoken but had to adapt and recruit participants in the capital of Solomon Islands because air travel unexpectedly became unavailable and the islands could not be reached. In recent years, web-based experiments have experienced an upswing, driven by improvements in technical accessibility and precision (Bridges et al., 2020; Crump et al., 2013; Sauter et al., 2020; Stewart et al., 2017) and probably further accelerated by the COVID-19 pandemic, which prevented many researchers from collecting data in the laboratory or in the field. Garcia et al. (2022) describe how online experiments can be used for psycholinguistic studies of understudied languages, using a priming study on Tagalog (Garcia & Kidd, 2020) as an example. Online studies thus make it possible to study languages with larger speaker populations that have sufficient access to the internet and to smartphones or computers. This development will make cross-linguistic psycholinguistic studies increasingly viable.

10.5.3 Experimental tasks and measurements

Although much of the research in psycholinguistics and experimental linguistics is based on written materials, for example, acceptability judgements of written sentences or paradigms that measure reading behaviour to study word and sentence processing (e.g., eye tracking while reading, self-paced reading, or rapid serial visual presentation in EEG studies), many languages are only spoken or signed. Access to economic and educational opportunities also varies widely between speaker communities, so that literacy (in general and in the target language) may be low. For studies outside a laboratory, auditory stimuli and paradigms that require spoken responses are, therefore, often more suitable. For participants who are used to smartphones, button-press responses on a response pad or game controller may be unproblematic. For participants who are not used to button presses, the experimenter could consider using only vocal responses (e.g., in response to comprehension questions) or training participants to tap on a picture displayed on a tablet. In any case, experiments need to be tailored for use in the field. People in non-WEIRD communities are usually not used to being tested and are, therefore, often less socialised as “compliant responders” (Speed et al., 2017) than the students at Western universities who typically participate in psycholinguistic studies (Arnett, 2008). Thus, experimental tasks need to be adapted or contextualised so that they are easy to understand. If tasks are too opaque, it can be difficult to convey the purpose of the experiment, given the likely differences in common ground regarding the (implicit) expectations for experiments and for linguistic research in general (Speed et al., 2017). The speaker communities of target languages have often already encountered descriptive linguistic research (as this is the main way the grammatical features of a language become known to the research community).
Consequently, ideas about what language research “is about” may be centred on language documentation or on the compilation of dictionaries and other text collections. Explaining the purpose and procedures of psycholinguistic and experimental linguistic research is, therefore, important to establish common ground. In principle, all the measurement techniques commonly used in psycholinguistics can also be used in experimental studies in the field. Paradigms that measure reaction times or accuracy, for example, when participants answer comprehension questions about stimulus sentences, require only a laptop (and possibly a microphone). The self-paced listening paradigm (Waters & Caplan, 2004; Waters et al., 2003), for example, can measure the online time course of sentence comprehension without additional equipment. Self-paced listening has been used with different age groups (Fallon et al., 2006; Suzuki, 2013), suggesting that it can also be a versatile tool
for field-based studies. Another behavioural method that can give insight into the online comprehension of sentences is “touch (or finger) tracking” (a variant of mouse tracking; cf., e.g., Kieslich et al., 2019; Spivey & Dale, 2006), in which participants move objects on the screen of a tablet. In a study by Wagers et al. (2018), Chamorro speakers listened to ambiguous relative clauses and, while listening, moved a small “puck” to whichever of two pictures matched their interpretation of the relative clause. Wagers et al. analysed the latency of the first movement initiation as an indicator of the online parsing process. Eye trackers and EEG devices have become increasingly mobile, so that they can usually be brought to the field site. Sauppe et al. (under revision) transported the equipment for an EEG experiment in a large backpack: two laptops, a mobile EEG device (Neuroelectrics Enobio 32) with the necessary accessories, loudspeakers, keyboards, a button response box, power banks, and solar panels. Yasunaga et al. (2015) also conducted a field-based EEG study, investigating sentence comprehension in Kaqchikel, a Mayan language of Guatemala. Other neuroimaging techniques, such as near-infrared spectroscopy (NIRS, which uses light sensors to measure local changes in haemoglobin oxygenation in the cortex), are also becoming increasingly mobile (cf., e.g., Lloyd-Fox et al., 2014; Pinti et al., 2020) and have the potential to become a valuable neurophysiological method for field-based studies on language processing (cf. Minagawa & Cristia, 2019, for an overview of the technique applied to language processing research). For example, Koizumi et al. (2020) used functional NIRS to study differences in planning subject-initial and verb-initial sentences in Kaqchikel.
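A latency measure of the kind used in touch tracking needs an operational definition of when a movement starts. One simple option, sketched below in Python, takes the first touch sample whose distance from the starting position reaches a small threshold. The sample format (time in ms, x and y in pixels), the 5-pixel threshold, and the function name are assumptions for illustration, not the criteria used by Wagers et al. (2018). In an actual study, times would be expressed relative to a meaningful point in the auditory stimulus, for example the onset of a disambiguating word.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical touch samples: (time_ms, x_px, y_px), time relative to a reference event.
Sample = Tuple[float, float, float]


def movement_onset(samples: List[Sample], threshold_px: float = 5.0) -> Optional[float]:
    """Time of the first sample whose distance from the starting position reaches
    threshold_px, or None if the finger never moves that far."""
    if not samples:
        return None
    _, x0, y0 = samples[0]
    for t, x, y in samples:
        if math.hypot(x - x0, y - y0) >= threshold_px:
            return t
    return None


trajectory = [(0, 100, 100), (20, 100, 101), (40, 103, 104), (60, 120, 130)]
print(movement_onset(trajectory))  # 40
```

A small threshold keeps accidental jitter of a resting finger from counting as a movement; its value would need to be piloted for the tablet and screen resolution actually used.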
Improvements in techniques such as webcam-based eye tracking (Vos et al., 2022; Yang & Krajbich, 2021) make it possible to measure visual attention to picture or video stimuli remotely in online studies (cf. Garcia et al., 2022). With this technique, the “visual world paradigm” (Huettig et al., 2011) can be used to study word recognition or sentence comprehension in understudied languages (for examples of visual world eye-tracking studies in situ, see, e.g., Garrido Rodriguez et al., 2023; Mishra et al., 2012; Sauppe, 2016). Although its lower resolution and precision mean that areas of interest must be placed relatively far apart, webcam-based eye tracking may also serve as a low-cost alternative to conventional research-grade eye trackers (which still cost at least several thousand dollars). Such a setup requires only a laptop with a webcam and either an internet connection or an additional computer that functions as a local server, which could make eye tracking available to research teams who could not otherwise afford an eye tracker. Before a study in the field begins, the equipment should be extensively piloted. Piloting should ensure that the experimenter is highly confident in operating the devices and is able to adapt settings or the study procedure if necessary. For example, researchers based in colder climates who plan to collect data on a language spoken closer to the equator could test their EEG setup in the tropical greenhouse of a botanical garden (particularly if it is already known that no air conditioning will be available). Hellwig (2019) argues for the use of stimulus materials such as picture stories or objects to elicit “semi-structured” responses because these are arguably more easily integrated with descriptive and documentary linguistic work.
Cross-linguistic research on sentence production has followed a variant of this approach by eliciting picture descriptions while measuring visual attention with eye trackers (Koizumi et al., 2020; Norcliffe et al., 2015b; Nordlinger et al., 2022; Sarvasy et al., 2022; Sauppe et al., 2013; Sauppe et al., 2021). Participants’ descriptions are usually elicited freely, without restrictions on what they can say, but are still guided by the semantic content of the picture stimuli. Sentence-picture matching tasks, used to study comprehension, also rely on visual stimuli. In these tasks, participants hear a sentence while seeing two different pictures (e.g., of a dog biting a man

Sebastian Sauppe et al.

and of a man biting a dog), needing to decide which one matches the sentence (Wagers et al., 2015; Wagers et al., 2018). Finally, the design of an experimental task also influences the amount of data that can be collected. Speakers of understudied languages may not be used to the “test-taking” and the potentially repetitive character of psycholinguistic experiments. In addition to finding a task that works well for most participants, the researcher must take into account how much data can be collected from each participant, balancing the availability of participants, the number of stimuli, and the length of the experimental session. It may be difficult to achieve the statistical power that should be aimed at in psycholinguistics, in general (Brysbaert, 2019; Vasishth et al., 2018). However, some strategies can be considered to mitigate the power problem, such as dividing data collection into multiple sessions to collect more trials for each participant without making individual sessions too exhausting (cf. e.g., Smith & Little, 2018, for small N designs). Bayesian statistics allows one to explicitly take into account prior expectations and knowledge for making inferences about a study’s results (Dienes & Mclatschie, 2018; Kruschke & Liddell, 2018; Nicenboim & Vasishth, 2016; van de Schoot et al., 2021) and is, therefore, possibly more suitable for analysing smaller datasets. Sequential analysis designs make it possible to economise on the number of participants because they allow assessing the support for the hypothesis after each participant so that more data can be added if the support is inconclusive and data collection can be stopped once the desired level of statistical support is reached (Elsey, 2021; Lakens, 2014; Mani et al., 2021; Schönbrodt et al., 2017). Especially in the context of field-based research, where it is often difficult and expensive to obtain data, these approaches may help to successfully conduct and publish studies.
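The logic of a sequential design can be sketched in a few lines. The example below is a minimal illustration under simplifying assumptions. It is not the procedure of any of the studies cited.

Each participant is reduced to a single binary outcome, hypothetically whether that participant’s responses go in the predicted direction. The Bayes factor compares two hypotheses about the probability θ of such an outcome:

- H1: θ has a uniform Beta(1, 1) prior.
- H0: θ = 0.5.

With k such outcomes among n participants, BF10 = B(k + 1, n − k + 1) × 2^n, where B is the beta function. The stopping thresholds of 10 and 1/10 and the optional maximum sample size are illustrative choices. Published sequential Bayes factor designs (e.g., Schönbrodt et al., 2017) typically use Bayes factors for t-tests and calibrate such settings by simulation before data collection.

```python
import math


def bf10_sign_test(k: int, n: int) -> float:
    """Bayes factor BF10 for k 'successes' among n participants.

    H1: theta ~ Beta(1, 1) (uniform); H0: theta = 0.5.
    Marginal likelihood of the observed sequence under H1 is B(k+1, n-k+1);
    under H0 it is 0.5 ** n. Computed on the log scale for stability.
    """
    log_beta = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    return math.exp(log_beta + n * math.log(2.0))


def sequential_design(outcomes, upper=10.0, lower=1 / 10, max_n=None):
    """Add participants one at a time and stop once BF10 crosses a threshold.

    `outcomes` yields 0/1 per participant in testing order. Returns the
    number of participants tested, the final BF10, and a decision label.
    """
    n, k, bf = 0, 0, 1.0
    for n, outcome in enumerate(outcomes, start=1):
        k += outcome
        bf = bf10_sign_test(k, n)
        if bf >= upper:
            return n, bf, "support for H1"
        if bf <= lower:
            return n, bf, "support for H0"
        if max_n is not None and n >= max_n:
            break  # resources exhausted before either threshold was reached
    return n, bf, "inconclusive"


if __name__ == "__main__":
    # Hypothetical data: every participant shows the predicted pattern.
    print(sequential_design([1] * 30))
    # Hypothetical data: no consistent pattern, capped at 20 participants.
    print(sequential_design([1, 0] * 50, max_n=20))
```

With these settings, a run of consistent outcomes reaches the upper threshold after seven participants (BF10 = 2^7/8 = 16). An evenly split sample is still inconclusive after 20 participants. This illustrates that gathering support for H0 usually takes more data than detecting a clear effect.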

10.6 Future directions

Given the historical focus on a small set of languages (Blasi et al., 2022; Jaeger & Norcliffe, 2009; Kidd & Garcia, 2022), any effort to collect data from understudied languages can add new knowledge to the field. Eventually, such efforts will allow psycholinguistics and the language sciences to give cross-linguistic evidence the role it deserves in advancing our understanding of the cognitive processes underlying language processing (Majid & Levinson, 2010).

Note
1 See https://docs.google.com/spreadsheets/d/1AS6NFJad5pqg0gY9g8R8AgGEixTyx0KbGtJ-SLEGG84/edit?usp=sharing for a running database of researchers working on languages underrepresented in psycholinguistic research. It was actively maintained at least as of December 2022. Inclusion is largely self-selected, however, so the database is likely to be incomplete.

Further readings
Bowern, C. (2008). Linguistic fieldwork: A practical guide. Palgrave Macmillan.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–135.
Stoll, S. (2015). Studying language acquisition in different linguistic and cultural settings. In N. Bonvillain (Ed.), The Routledge handbook of linguistic anthropology (pp. 140–158). Routledge.
Wagers, M., & Chung, S. (2019). Language processing experiments in the field. Manuscript. Retrieved from https://escholarship.org/uc/item/5p5552vk
Whalen, D. H., & McDonough, J. (2015). Taking the laboratory into the field. Annual Review of Linguistics, 1, 395–415.


Experimental research in cross-linguistic psycholinguistics

Related topics
Analysing the time course of language comprehension; analysing spoken language comprehension with eye tracking; controlling social factors in experimental linguistics; new directions in statistical analysis for experimental linguistics; experimental methods to study cultural differences in linguistics

References
Anand, P., Chung, S., & Wagers, M. (2010). Widening the net: Challenges for gathering linguistic data in the digital age. Response to NSF SBE.
Aoshima, S., Phillips, C., & Weinberg, A. (2004). Processing filler-gap dependencies in a head-final language. Journal of Memory and Language, 51, 23–54.
Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less American. American Psychologist, 63, 602–614.
Barthel, M., & Sauppe, S. (2019). Speech planning at turn transitions in dialog is associated with increased processing load. Cognitive Science, 43(7), e12768.
Bates, E., Devescovi, A., & Wulfeck, B. (2001). Psycholinguistics: A cross-language perspective. Annual Review of Psychology, 52, 369–396.
Bickel, B., & Nichols, J. (2009). Case marking and alignment. In A. L. Malchukov & A. Spencer (Eds.), The Oxford handbook of case (pp. 304–321). Oxford University Press.
Bickel, B., Witzlack-Makarevich, A., Choudhary, K. K., Schlesewsky, M., & Bornkessel-Schlesewsky, I. (2015). The neurophysiology of language processing shapes the evolution of grammar: Evidence from case marking. PLoS ONE, 10(8), e0132819.
Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., & Majid, A. (2022). Over-reliance on English hinders cognitive science. Trends in Cognitive Sciences, 26(12), 1153–1170.
Bock, K., & Ferreira, V. S. (2014). Syntactically speaking. In M. Goldrick, V. S. Ferreira, & M. Miozzo (Eds.), The Oxford handbook of language production (pp. 21–46). Oxford University Press.
Bowern, C. (2008). Linguistic fieldwork: A practical guide. Palgrave Macmillan.
Bridges, D., Pitiot, A., MacAskill, M., & Peirce, J. (2020). The timing mega-study: Comparing a range of experiment generators, both lab-based and online. PeerJ, 8, e9414.
Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2(1), 16.
Brysbaert, M., & Cortese, M. J. (2011). Do the effects of subjective frequency and age of acquisition survive better word frequency norms? The Quarterly Journal of Experimental Psychology, 64(3), 545–559.
Carminati, M. N. (2002). The processing of Italian subject pronouns. University of Massachusetts Amherst.
Chelliah, S. L., & de Reuse, W. J. (2011). Handbook of descriptive linguistic fieldwork. Springer.
Choudhary, K. K., Schlesewsky, M., Roehm, D., & Bornkessel-Schlesewsky, I. (2009). The N400 as a correlate of interpretively relevant linguistic rules: Evidence from Hindi. Neuropsychologia, 47(11), 3012–3022.
Chung, S., Borja, M. F., & Wagers, M. (2012). Bridging methodologies: Experimental syntax in the Pacific. In Linguistic Society of America 86th Annual Meeting. Portland, OR.
Clahsen, H. (2016). Contributions of linguistic typology to psycholinguistics. Linguistic Typology, 20(3), 599–614.
Crump, M. J. C., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLoS ONE, 8(3), e57410.
Cutler, A. (1985). Cross-language psycholinguistics. Linguistics, 25(5), 659–668.
Cutler, A. (2012). Native listening: Language experience and the recognition of spoken words. MIT Press.
Dienes, Z., & Mclatchie, N. (2018). Four reasons to prefer Bayesian analyses over significance testing. Psychonomic Bulletin & Review, 25(1), 207–218.
Dryer, M. S. (2013a). Expression of pronominal subjects. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Max Planck Institute for Evolutionary Anthropology. Retrieved from https://wals.info/chapter/101
Dryer, M. S. (2013b). Order of subject, object and verb. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Max Planck Institute for Evolutionary Anthropology. Retrieved from https://wals.info/chapter/81
Egurtzegi, A., Blasi, D. E., Laka, I., Bornkessel-Schlesewsky, I., Meyer, M., Bickel, B., & Sauppe, S. (2022). Cross-linguistic differences in case marking shape neural power dynamics and gaze behavior during sentence planning. Brain and Language, 230, 105127.


Elsey, J. (2021). Powerful sequential designs using Bayesian estimation: A power analysis tutorial using brms, the tidyverse, and furrr. Preprint from PsyArXiv. https://doi.org/10.31234/osf.io/kt4pz
Evans, N., & Levinson, S. C. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32, 429–492.
Fallon, M., Peelle, J. E., & Wingfield, A. (2006). Spoken sentence processing in young and older adults modulated by task demands: Evidence from self-paced listening. Journal of Gerontology B: Psychological Sciences, 61(1), P10–P17.
Ferreira, F., & Swets, B. (2002). How incremental is language production? Evidence from the production of utterances requiring the computation of arithmetic sums. Journal of Memory and Language, 46, 57–84.
Ferreira, V. S., & Yoshita, H. (2003). Given-new ordering effects on the production of scrambled sentences in Japanese. Journal of Psycholinguistic Research, 32(6), 669–692.
Flecken, M., von Stutterheim, C., & Carroll, M. (2014). Grammatical aspect influences motion event perception: Findings from a cross-linguistic non-verbal recognition task. Language and Cognition, 6(1), 45–78.
Garcia, R., & Kidd, E. (2020). The acquisition of the Tagalog symmetrical voice system: Evidence from structural priming. Language Learning and Development, 16, 399–425.
Garcia, R., Roeser, J., & Kidd, E. (2022). Online data collection to address language sampling bias: Lessons from the COVID-19 pandemic. Linguistics Vanguard. https://doi.org/10.1515/lingvan-2021-0040
Garrido Rodriguez, G., Norcliffe, E., Huettig, F., Brown, P., & Levinson, S. C. (2023). Anticipatory processing in a verb-initial Mayan language: Eye-tracking evidence during sentence comprehension in Tseltal. Cognitive Science, 47(1), e13292, 1–29.
van Gijn, E. (2016). Switch reference: An overview. In E. van Gijn & J. Hammond (Eds.), Switch reference 2.0 (pp. 1–54). John Benjamins.
Gries, S. T. (2012). Corpus linguistics, theoretical linguistics, and cognitive/psycholinguistics: Towards more and more fruitful exchanges. In J. Mukherjee & M. Huber (Eds.), Corpus linguistics and variation in English: Theory and description (pp. 41–63). Rodopi.
Griffin, Z. M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science, 11, 274–279.
Hellwig, B. (2019). Linguistic diversity, language documentation and psycholinguistics: The role of stimuli. In A. Lahaussois & M. Vuillermet (Eds.), Methodological tools for linguistic description and typology (pp. 5–30). University of Hawai’i Press.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–83.
Huettig, F., Rommers, J., & Meyer, A. S. (2011). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137, 151–171.
Husain, S., Vasishth, S., & Srinivasan, N. (2015). Integration and prediction difficulty in Hindi sentence comprehension: Evidence from an eye-tracking corpus. Journal of Eye Movement Research, 8, 1–12.
Hwang, H., & Kaiser, E. (2014). The role of the verb in grammatical function assignment in English and Korean. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 1363.
Iggesen, O. A. (2013). Number of cases. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online (pp. 1–15). Max Planck Institute for Evolutionary Anthropology.
Inoue, A., & Fodor, J. D. (1995). Information-paced parsing of Japanese. In R. Mazuka & N. Nagai (Eds.), Japanese sentence processing (pp. 19–48). Psychology Press.
Isasi-Isasmendi, A., Andrews, C., Flecken, M., Laka, I., Daum, M. M., Meyer, M., Bickel, B., & Sauppe, S. (2023). The agent preference in visual event apprehension. Open Mind. https://doi.org/10.1162/opmi_a_00083
Jaeger, T. F., & Norcliffe, E. J. (2009). The cross-linguistic study of sentence production. Language and Linguistics Compass, 3(6), 866–887.
Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49(2), 133–156.
Kidd, E., & Garcia, R. (2022). How diverse is child language acquisition research? First Language, 42(6), 703–735.
Kieslich, P. J., Henninger, F., Wulff, D. U., Haslbeck, J. M. B., & Schulte-Mecklenbeck, M. (2019). Mouse-tracking: A practical guide to implementation and analysis. In M. Schulte-Mecklenbeck, A. Kuehberger, & J. G. Johnson (Eds.), A handbook of process tracing methods (2nd ed., pp. 111–130). Routledge.
Koizumi, M., Takeshima, Y., Tachibana, R., Asaoka, R., Saito, G., Niikuni, K., & Gyoba, J. (2020). Cognitive loads and time courses related to word order preference in Kaqchikel sentence production: An NIRS and eye-tracking study. Language, Cognition and Neuroscience, 35(2), 137–150.


Konieczny, L., Hemforth, B., Scheepers, C., & Strube, G. (1997). The role of lexical heads in parsing: Evidence from German. Language and Cognitive Processes, 12(3), 307–348.
Konopka, A. E. (2019). Encoding actions and verbs: Tracking the time-course of relational encoding during message and sentence formulation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(6), 1486–1510.
Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 25(1), 178–206.
Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(5), 701–710.
Levy, R. P., & Keller, F. (2013). Expectation and locality effects in German verb-final structures. Journal of Memory and Language, 68, 199–222.
Lloyd-Fox, S., Papademetriou, M., Darboe, M. K., Everdell, N. L., Wegmuller, R., Prentice, A. M., Moore, S. E., & Elwell, C. E. (2014). Functional near infrared spectroscopy (fNIRS) to assess cognitive function in infants in rural Africa. Scientific Reports, 4, 4740.
MacWhinney, B., & Bates, E. (1989). The crosslinguistic study of sentence processing. Cambridge University Press.
Majid, A., & Levinson, S. C. (2010). WEIRD languages have misled us, too. Behavioral and Brain Sciences, 33, 193.
Mani, N., Schreiner, M. S., Brase, J., Köhler, K., Strassen, K., Postin, D., & Schultze, T. (2021). Sequential Bayes Factor designs in developmental research: Studies on early word learning. Developmental Science, 24(5), e13097.
de Marneffe, M.-C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal dependencies. Computational Linguistics, 47(2), 255–308.
Matar, S., Pylkkänen, L., & Marantz, A. (2019). Left occipital and right frontal involvement in syntactic category prediction: MEG evidence from Standard Arabic. Neuropsychologia, 135, 107230.
Meyer, A. S. (1996). Lexical access in phrase and sentence production: Results from picture-word interference experiments. Journal of Memory and Language, 35(4), 477–496.
Minagawa, Y., & Cristia, A. (2019). Shedding light on language function and its development with optical brain imaging. In G. I. de Zubicaray & N. O. Schiller (Eds.), The Oxford handbook of neurolinguistics (pp. 154–185). Oxford University Press.
Mishra, R. K., Singh, N., Pandey, A., & Huettig, F. (2012). Spoken language-mediated anticipatory eye-movements are modulated by reading ability: Evidence from Indian low and high literates. Journal of Eye Movement Research, 5(3), 1–10.
Mitsugi, S. (2017). Incremental comprehension of Japanese passives: Evidence from the visual-world paradigm. Applied Psycholinguistics, 38(5), 953–983.
Momma, S., & Ferreira, V. S. (2019). Beyond linear order: The role of argument structure in speaking. Cognitive Psychology, 114, 101228.
Momma, S., Slevc, L. R., & Phillips, C. (2016). The timing of verb selection in Japanese sentence production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(5), 813–824.
Mulak, K. E., Sarvasy, H. S., Tuninetti, A., & Escudero, P. (2021). Word learning in the field: Adapting a laboratory-based task for testing in remote Papua New Guinea. PLoS ONE, 16, e0257393.
Myachykov, A., Scheepers, C., Garrod, S., Thompson, D., & Fedorova, O. (2013). Syntactic flexibility and competition in sentence production: The case of English and Russian. The Quarterly Journal of Experimental Psychology, 66(9), 1601–1619.
Næss, Å. (2015). The Äiwoo verb phrase: Syntactic ergativity without pivots. Journal of Linguistics, 51, 75–106.
Næss, Å. (2017). Documenting Äiwoo. Endangered Languages Archive. Retrieved from http://hdl.handle.net/2196/00-0000-0000-000F-BF44-A
Næss, Å. (2021). Voice and valency morphology in Äiwoo. Oceanic Linguistics, 60, 160–198.
Næss, Å., & Sauppe, S. (under review). Bringing psycholinguistics to the field: Experiences from Solomon Islands.
Nakano, Y., Felser, C., & Clahsen, H. (2002). Antecedent priming at trace positions in Japanese long-distance scrambling. Journal of Psycholinguistic Research, 31(6), 531–571.
Nicenboim, B., & Vasishth, S. (2016). Statistical methods for linguistic research: Foundational ideas — part II. Language and Linguistics Compass, 10(11), 591–613.


Nielsen, M., Haun, D. B. M., Kärtner, J., & Legare, C. H. (2017). The persistent sampling bias in developmental psychology: A call to action. Journal of Experimental Child Psychology, 162, 31–38.
Norcliffe, E., Harris, A. C., & Jaeger, T. F. (2015a). Cross-linguistic psycholinguistics and its critical role in theory development: Early beginnings and recent advances. Language, Cognition and Neuroscience, 30(9), 1009–1032.
Norcliffe, E., & Konopka, A. E. (2015). Vision and language in cross-linguistic research on sentence production. In R. K. Mishra, N. Srinivasan, & F. Huettig (Eds.), Attention and vision in language processing (pp. 77–96). Springer.
Norcliffe, E., Konopka, A. E., Brown, P., & Levinson, S. C. (2015b). Word order affects the time course of sentence formulation in Tzeltal. Language, Cognition and Neuroscience, 30(11), 1187–1208.
Nordlinger, R., Garrido Rodriguez, G., & Kidd, E. (2022). Sentence planning and production in Murrinhpatha, an Australian ‘free word order’ language. Language, 98(1), 187–220.
Pinti, P., Tachtsidis, I., Hamilton, A., Hirsch, J., Aichelburg, C., Gilbert, S., & Burgess, P. W. (2020). The present and future use of functional near-infrared spectroscopy (fNIRS) for cognitive neuroscience. Annals of the New York Academy of Sciences, 1464(1), 5–29.
Pizarro-Guevara, J. S., & Wagers, M. (2020). The predictive value of Tagalog voice morphology in filler-gap dependency formation. Frontiers in Psychology, 11, 517.
Rice, K. (2006). Ethical issues in linguistic fieldwork: An overview. Journal of Academic Ethics, 4(1), 123–155.
Rubio-Fernandez, P., & Jara-Ettinger, J. (2020). Incrementality and efficiency shape pragmatics across languages. Proceedings of the National Academy of Sciences of the United States of America, 117(24), 13399–13404.
Runner, J. T., & Ibarra, A. (2016). Information structure effects on null and overt subject comprehension in Spanish. In A. Holler & K. Suckow (Eds.), Empirical perspectives on anaphora resolution (pp. 87–112). De Gruyter.
Saah, K. K. (1995). Studies in Akan syntax, acquisition, and sentence processing (Doctoral dissertation). University of Ottawa, Canada.
Sakarias, M., & Flecken, M. (2019). Keeping the result in sight and mind: General cognitive principles and language-specific influences in the perception and memory of resultative events. Cognitive Science, 43(9), e12708.
Saleem, S. (2010). Argument optionality: A new library for the Grammar Matrix customization system (Doctoral dissertation). University of Washington.
Sarvasy, H. S. (2014). A grammar of Nungon: A Papuan language of the Morobe province, Papua New Guinea (Doctoral dissertation). James Cook University.
Sarvasy, H. S., Morgan, A. M., Yu, J., Ferreira, V. S., & Momma, S. (2022). Cross-clause planning in Nungon (Papua New Guinea): Eye-tracking evidence. Memory & Cognition, 1–15.
Sauppe, S. (2016). Verbal semantics drives early anticipatory eye movements during the comprehension of verb-initial sentences. Frontiers in Psychology, 7, 95.
Sauppe, S. (2017a). Symmetrical and asymmetrical voice systems and processing load: Pupillometric evidence from sentence production in Tagalog and German. Language, 93(1), 288–313.
Sauppe, S. (2017b). Word order and voice influence the timing of verb planning in German sentence production. Frontiers in Psychology, 8, 1648.
Sauppe, S., Choudhary, K. K., Giroud, N., Blasi, D. E., Norcliffe, E., Bhattamishra, S., Gulati, M., Egurtzegi, A., Bornkessel-Schlesewsky, I., Meyer, M., & Bickel, B. (2021). Neural signatures of syntactic variation in speech planning. PLOS Biology, 19(8), e3001038.
Sauppe, S., Næss, Å., Roversi, G., Meyer, M., Bornkessel-Schlesewsky, I., & Bickel, B. (under revision). An agent-first preference in a patient-first language during sentence comprehension.
Sauppe, S., Norcliffe, E. J., Konopka, A. E., Van Valin, R. D., & Levinson, S. C. (2013). Dependencies first: Eye tracking evidence from sentence production in Tagalog. In Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 1265–1270). Cognitive Science Society.
Sauter, M., Draschkow, D., & Mack, W. (2020). Building, hosting and recruiting: A brief introduction to running behavioral experiments online. Brain Sciences, 10, 251.
Schönbrodt, F. D., Wagenmakers, E. J., Zehetleitner, M., & Perugini, M. (2017). Sequential hypothesis testing with Bayes factors: Efficiently testing mean differences. Psychological Methods, 22, 322–339.
van de Schoot, R., Depaoli, S., King, R., Kramer, B., Märtens, K., Tadesse, M. G., Vannucci, M., Gelman, A., Veen, D., Willemsen, J., & Yau, C. (2021). Bayesian statistics and modelling. Nature Reviews Methods Primers, 1, 1.


Schriefers, H., Teruel, E., & Meinshausen, R. M. (1998). Producing simple sentences: Results from picture-word interference experiments. Journal of Memory and Language, 39, 609–632.
Seifart, F., Evans, N., Hammarström, H., & Levinson, S. C. (2018). Language documentation twenty-five years on. Language, 94, e324–e345.
Sekerina, I. A. (1997). The syntax and processing of scrambling constructions in Russian. City University of New York.
Sekerina, I. A. (2003). Scrambling and processing: Dependencies, complexity, and constraints. In S. Karimi (Ed.), Word order and scrambling (pp. 301–324). John Wiley & Sons.
Smith, M., & Wheeldon, L. (1999). High level processing scope in spoken sentence production. Cognition, 73, 205–246.
Smith, P. L., & Little, D. R. (2018). Small is beautiful: In defense of the small-N design. Psychonomic Bulletin & Review, 25, 2083–2101.
Speed, L., Wnuk, E., & Majid, A. (2017). Studying psycholinguistics out of the lab. In A. M. B. de Groot & P. Hagoort (Eds.), Research methods in psycholinguistics and the neurobiology of language: A practical guide (pp. 190–207). Wiley-Blackwell.
Spivey, M. J., & Dale, R. (2006). Continuous dynamics in real-time cognition. Current Directions in Psychological Science, 15(5), 207–211.
Stewart, N., Chandler, J., & Paolacci, G. (2017). Crowdsourcing samples in cognitive science. Trends in Cognitive Sciences, 21(12), 736–748.
Stoll, S. (2009). Crosslinguistic approaches to language acquisition. In E. L. Bavin (Ed.), The Cambridge handbook of child language (pp. 89–104). Cambridge University Press.
Stoll, S., & Schikowski, R. (2020). Child-language corpora. In M. Paquot & S. T. Gries (Eds.), A practical handbook of corpus linguistics (pp. 305–327). Springer.
Suzuki, T. (2013). Children’s on-line processing of scrambling in Japanese. Journal of Psycholinguistic Research, 42(2), 119–137.
Tamaoka, K., Sakai, H., Kawahara, J., Miyaoka, Y., Lim, H., & Koizumi, M. (2005). Priority information used for the processing of Japanese sentences: Thematic roles, case particles or grammatical functions? Journal of Psycholinguistic Research, 34(3), 281–332.
Thieberger, N. (Ed.). (2012). The Oxford handbook of linguistic fieldwork. Oxford University Press.
Thompson, G. L., & Desrochers, A. (2009). Corroborating biased indicators: Global and local agreement among objective and subjective estimates of printed word frequency. Behavior Research Methods, 41(2), 452–471.
Traxler, M. J., Morris, R. K., & Seely, R. E. (2002). Processing subject and object relative clauses: Evidence from eye movements. Journal of Memory and Language, 47(1), 69–90.
Vasishth, S., Mertzen, D., Jäger, L. A., & Gelman, A. (2018). The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103, 151–175.
van de Velde, M., Meyer, A. S., & Konopka, A. E. (2014). Message formulation and structural assembly: Describing “easy” and “hard” events with preferred and dispreferred syntactic structures. Journal of Memory and Language, 71, 124–144.
Vitevitch, M. S., & Luce, P. A. (2016). Phonological neighborhood effects in spoken word perception and production. Annual Review of Linguistics, 2, 75–94.
Vos, M., Minor, S., & Ramchand, G. C. (2022). Comparing infrared and webcam eye tracking in the Visual World Paradigm. Glossa: Psycholinguistics, 1(1), 1–37.
Wagers, M., Borja, M. F., & Chung, S. (2015). The real-time comprehension of WH-dependencies in a WH-agreement language. Language, 91(1), 109–144.
Wagers, M., & Chung, S. (2019). Language processing experiments in the field. Manuscript. Retrieved from https://escholarship.org/uc/item/5p5552vk
Wagers, M. W., Borja, M. F., & Chung, S. (2018). Grammatical licensing and relative clause parsing in a flexible word-order language. Cognition, 178, 207–221.
Waters, G. S., & Caplan, D. (2004). Verbal working memory and on-line syntactic processing: Evidence from self-paced listening. The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 57(1), 129–163.
Waters, G. S., Caplan, D., & Yampolsky, S. (2003). On-line syntactic processing under concurrent memory load. Psychonomic Bulletin & Review, 10(1), 88–95.
Whalen, D. H., & McDonough, J. (2015). Taking the laboratory into the field. Annual Review of Linguistics, 1, 395–415.


Yamashita, H. (1997). The effects of word-order and case marking information on the processing of Japanese. Journal of Psycholinguistic Research, 26(2), 163–188.
Yang, X., & Krajbich, I. (2021). Webcam-based online eye-tracking for behavioral research. Judgment and Decision Making, 16(6), 1485–1505.
Yasunaga, D., Yano, M., Yasugi, Y., & Koizumi, M. (2015). Is the subject-before-object preference universal? An event-related potential study in the Kaqchikel Mayan language. Language, Cognition and Neuroscience, 30(11), 1209–1229.


11 EXPERIMENTAL METHODS TO STUDY DISTRIBUTED COMPREHENSION OF ACTION LANGUAGE
Carol Madden-Lombardi and William Dupont

11.1 Introduction and historical perspective

Understanding the human capacity to comprehend language has occupied researchers for decades, even centuries. Throughout the 20th century, research on “verbal learning” and memory for text gave rise to descriptive theories that often invoked language modules processing amodal symbols, separate from other cognitive and perceptual-motor systems. Even in the late 20th century, language was thought to be confined to brain regions such as Broca’s and Wernicke’s areas (Fodor, 1983; Pylyshyn, 1984; Mahon & Caramazza, 2008). Over the past few decades, however, cognitive psychology and cognitive neuroscience have moved away from symbolic, amodal cognitive systems. They have turned instead towards embodied theories of cognition that explicitly integrate the body and its perceptual-motor systems into conceptual models (see Barsalou, 1999). This idea quickly took root in the study of language comprehension, where incoming words and sentences are hypothesised to partially reactivate traces of previous experience (see Zwaan & Madden, 2005). According to this embodied framework of language comprehension, language representations are instantiated in the same systems that are used for perception and action. This theoretical shift has in turn changed the way researchers study language comprehension in several respects: (1) the type of questions that are addressed, (2) the type of language that is studied, and (3) the types of methodologies used. We briefly discuss the first two topics and then turn to our main concern: the methodologies employed within the distributed framework to investigate language comprehension, with a particular focus on action language.

11.2 Critical issues and concepts

11.2.1 The type of questions addressed

First and foremost, if language and other cognitive systems indeed share representational resources with perceptual-motor systems, it follows that sensorimotor processing may affect language processing and vice versa. By sensorimotor processing, we refer to both behaviours and neural processes that pertain to sensory and motor function. This was one of the fundamental questions addressed as theories of embodied language comprehension gained momentum. It led to numerous studies investigating the facilitative and/or detrimental effects of perceptual-motor systems on language comprehension and vice versa. Early behavioural studies indeed found such effects in both directions (e.g., Klatzky et al., 1989; Solomon & Barsalou, 2001; Glenberg & Kaschak, 2002; Zwaan et al., 2002; Kaschak et al., 2005; Madden & Zwaan, 2006; Zwaan & Taylor, 2006). For example, in Glenberg and Kaschak’s (2002) study, participants judged whether sentences such as “Close the drawer” were sensible by moving their hand to a button that was either close to or far from their body. The direction of this response movement could be compatible or incompatible with the direction described in the sentence. Performance was facilitated (shorter reaction times) when the two directions were compatible. Note also that the direction of the effect (facilitative or detrimental) depends on the latency between the processing of the action language and the motor response (Chersi et al., 2010). Thus, evidence was mounting that language representations are indeed “embodied” or “distributed”. On this view, the meanings of words are processed not only in traditional language areas of the brain but also in the specific sensorimotor regions relevant to them. For example, a sentence about leaves rustling might engage auditory areas that process the sound of the leaves as well as visual areas that process what the leaves look like. In this sense, language representations may share (at least in part) the systems of representation used for perception and action. They can thus affect and be affected by perception and action.

DOI: 10.4324/9781003392972-13
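In compatibility paradigms such as Glenberg and Kaschak’s (2002), the dependent measure is a difference in reaction times between conditions. The sketch below shows one simple way to compute it. It assumes hypothetical trial records of the form (participant, condition, reaction time in ms). Mean RTs are computed per participant and condition, and the effect is the incompatible mean minus the compatible mean, averaged over participants.

A positive value indicates facilitation for compatible trials. A negative value would indicate interference, which, as noted above, can arise depending on the latency between language processing and the motor response (Chersi et al., 2010). This is only an illustrative summary and does not reproduce the original study’s analysis. Trial-level data are usually analysed with inferential statistical models rather than raw means.

```python
from collections import defaultdict
from statistics import mean


def compatibility_effect(trials):
    """Action-sentence compatibility effect from hypothetical trial records.

    `trials` is an iterable of (participant_id, condition, rt_ms) tuples with
    condition either "compatible" or "incompatible"; every participant must
    contribute trials in both conditions. Returns a dict of per-participant
    effects (incompatible minus compatible mean RT, in ms) and the mean of
    those effects across participants.
    """
    rts = defaultdict(lambda: {"compatible": [], "incompatible": []})
    for participant, condition, rt_ms in trials:
        rts[participant][condition].append(rt_ms)

    per_participant = {
        participant: mean(conds["incompatible"]) - mean(conds["compatible"])
        for participant, conds in rts.items()
    }
    return per_participant, mean(list(per_participant.values()))


# Hypothetical data for two participants (not from any published study).
example_trials = [
    ("p1", "compatible", 610), ("p1", "compatible", 630),
    ("p1", "incompatible", 660), ("p1", "incompatible", 680),
    ("p2", "compatible", 700), ("p2", "compatible", 720),
    ("p2", "incompatible", 715), ("p2", "incompatible", 745),
]

effects, grand_mean = compatibility_effect(example_trials)
print(effects, grand_mean)  # positive values: faster responses when compatible
```

The effects are averaged per participant first rather than pooled over all trials. This keeps a participant with many trials from dominating the summary.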
Embodied or distributed language representations are often referred to as "simulations" because they simulate our real perceptual-motor experience with the described concepts. As these early behavioural studies led to questions about how such distributed systems are instantiated in the brain, they were quickly followed by imaging studies confirming that neural systems typically used for perception and action are activated during language processing (Pulvermüller, 1999, 2005; Pulvermüller et al., 2001; Hauk et al., 2004; Jirak et al., 2010; Pulvermüller & Fadiga, 2010; Planton & Démonet, 2012; Courson & Tremblay, 2020). For example, Hauk and colleagues (2004) found somatotopic activation in the primary motor and premotor areas when readers processed verbs describing hand (e.g., pick), face (e.g., lick), or foot (e.g., kick) movements. As evidence mounted for activation in brain areas once thought to lie outside the realm of language processing, researchers began to accept the idea that reading or hearing certain types of language engages distributed sensorimotor representations. However, another question quickly took hold of the language comprehension community and remains under debate, namely the role of this distributed activation during language comprehension. Indeed, a main criticism of the embodied or distributed perspective is that the nature, content, and neurological underpinnings of these distributed language representations remain underspecified, as does the precise nature of the interaction between these representations and those acquired through direct experience (Patterson et al., 2007; Mahon & Caramazza, 2008; Lambon Ralph et al., 2009, 2010; Mahon, 2015, 2020). Is distributed activation automatic? Is it essential for comprehension? Is it merely epiphenomenal? These are some of the questions arising from the distributed language comprehension framework.
We discuss several methodologies that attempt to address these questions in Section 11.3, but first we consider how this new framework has influenced the kind of language used in investigations.


Methods to study distributed comprehension of action language

11.2.2

The type of language studied

As discussed above, the idea that language shares representational resources with perceptual-motor systems raises questions about the influence of sensorimotor processing on language processing and vice versa. To address such questions, investigations have used language that is pertinent to these systems, namely concrete concepts that can activate sensorimotor representations. Consequently, the emergence of the embodied theory has led to a surge of research using language that refers to perceptual and motor concepts (Barsalou, 1999, 2008; Pulvermüller, 1999, 2005; Pulvermüller et al., 2001; Zwaan et al., 2002; Barsalou et al., 2003; Zwaan & Yaxley, 2004; Gallese & Lakoff, 2005; Clark, 2006; Connell, 2007; Fischer & Zwaan, 2008; Pecher et al., 2009; Pulvermüller & Fadiga, 2010; Richter & Zwaan, 2010; Gallese & Sinigaglia, 2011; Glenberg & Gallese, 2012; Kiefer & Pulvermüller, 2012; Zwaan & Pecher, 2012; Rommers et al., 2013; Van Weelden et al., 2014; de Koning et al., 2017; Hoeben Mannaert et al., 2017, 2020), focusing heavily on the visual aspects (but also on auditory and other sensory domains) as well as on the motor aspects of word meanings. For example, in Pecher and colleagues' (2009) study, participants first read a list of sentences mentioning objects and were then shown pictures in a surprise recognition memory task. Each sentence implied a particular shape or orientation for the object it mentioned. Recognition was better when the shape or orientation of the object in the test picture matched the one implied by the preceding sentence, suggesting that details of sensorimotor simulations formed during reading are retained over time.
Before the idea of embodied cognition arrived, researchers in the field had already been investigating the content of language representations for decades, giving rise to various theories of language comprehension (e.g., Johnson-Laird, 1983; van Dijk & Kintsch, 1983; Gernsbacher, 1990; for a review before embodied ideas took hold, see Zwaan & Radvansky, 1998). However, these earlier comprehension theories typically focused on language and memory systems, ignoring possible contributions from perceptual and motor systems. The new distributed framework pushed investigations to focus on the specific content that could engage sensorimotor processing in measurable ways. Initial work investigated whether visual features of concepts in words and sentences, such as shape, orientation, or colour, would prime the actual visual processing of those features, and vice versa. For instance, Zwaan and colleagues used sentences implying particular perceptual constraints, such as the shape (Zwaan et al., 2002; Madden & Zwaan, 2006) and orientation of objects (Stanfield & Zwaan, 2001), and showed that comprehenders were faster to verify pictures that matched the perceptual (visual) constraints of the preceding sentence. Paralleling these results in the visual domain, research on motor performance showed that processing words and sentences about particular actions could prime movements compatible with the described actions, using action sentences such as "Close the drawer" in the study by Glenberg and Kaschak (2002) described above. Imaging studies also showed that action language activates motor areas of the brain. For instance, Hauk and colleagues (2004) carefully selected 150 action verbs implying movements of the face, hand, and foot to selectively engage corresponding somatotopic activation in the primary motor and premotor areas.
Such experiments illustrate the shift in language materials driven by the new framework: with the arrival of the distributed framework, many investigations of language comprehension turned their focus to the sensory and motor features of words and sentences. This sensorimotor language has been used in a variety of methodologies, which are discussed in Section 11.3 with a focus on the link between language and the motor system.



11.3

Main research methods

Frameworks of embodied or distributed language comprehension postulate a relation between perceptual-motor and language processes (Barsalou, 1999, 2008; Barsalou et al., 2003; Gallese & Lakoff, 2005; Decety & Grèzes, 2006; Jeannerod, 2006; Zwaan & Taylor, 2006; Fischer & Zwaan, 2008; Glenberg & Gallese, 2012). Testing these ideas requires creative techniques that probe both perceptual-motor and traditional language systems. Below we highlight several methodologies that have been employed to this end in the domain of action language, drawn from behavioural, neuroimaging, and neurophysiological research.

11.3.1

Behavioural methods

Behavioural methods have played an early and central role in demonstrating the link between language and motor systems. Specifically, a range of behavioural methodologies has been proposed to demonstrate the influence of action language on actual movements. Indeed, reading about actions has been shown to affect motor performance, as demonstrated by behavioural measures such as response time (Glenberg & Kaschak, 2002; Taylor & Zwaan, 2008; Andres et al., 2015; Klepp et al., 2019), reaching and grasping movements (Gentilucci & Gangitano, 1998; Gentilucci et al., 2000; Glover & Dixon, 2002; Glover et al., 2004), and strength during complex (Rabahi et al., 2012, 2013) and isometric movements (Frak et al., 2010; Nazir et al., 2017; Da Silva et al., 2018). In terms of movement-based response times, Taylor and Zwaan (2008) employed a creative methodology in which participants turned a knob to make the words of action sentences appear, such that the faster the reader turned the knob, the faster the words appeared on the screen. They showed that when the direction of knob-turning matched the direction of the action described in the sentence (e.g., turning clockwise for "turned up the volume"), participants read faster (i.e., turned the knob faster) than when the knob-turning and the described action went in opposite directions. Reading action verbs has also been shown to influence reaching and grasping movements (Gentilucci et al., 2000; Gentilucci, 2002, 2003; Glover & Dixon, 2002; Glover et al., 2004). For instance, Gentilucci et al. (2000) asked participants to reach out and grasp an object printed with words such as "near/far" or "small/large".
The kinematics (e.g., direction, velocity) of reaching and grasping were affected by the meaning of the printed words, and facilitation was present only if the linguistic content was compatible with the motor task and presented at a specific moment (Boulenger et al., 2006; Dalla Volta et al., 2009; Chersi et al., 2010). This is another instance of the action compatibility effect described by Glenberg and Kaschak (2002), which has been demonstrated many times since (Gentilucci et al., 2000; Taylor & Zwaan, 2008; Dalla Volta et al., 2009; Glenberg & Gallese, 2012; Zwaan et al., 2012; Van Dam & Desai, 2017). Such effects of language on motor performance have also been shown with complex movements such as the squat vertical jump, where processing a specific action verb just before the movement (e.g., passively reading the word "jump") can improve participants' performance. Another innovative behavioural methodology demonstrating the influence of language on human movement is mouse-tracking, in which the x, y coordinates of continuous goal-directed mouse movements are recorded, allowing assessment of the cognitive availability or attractiveness of information present on the display, for instance, multiple options for a click response. Spivey and colleagues (2005) first employed this technique in language processing, having participants listen to words and move the mouse to the corresponding image on the screen. The shape of the mouse trajectories revealed sensitivity to acoustic–phonetic input and competition between lexical representations on a fine temporal scale.
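Trajectory curvature in mouse-tracking is often summarised with a single index. One common choice, sketched here under our own assumptions, is maximum deviation: the largest perpendicular distance between the recorded cursor path and the straight line from its start point to its end point, with larger values suggesting stronger attraction towards a competing option on the display. The function name `max_deviation` and the toy coordinates are hypothetical, and this is not necessarily the measure used by Spivey and colleagues (2005); published analyses typically also time-normalise trajectories and sign the deviation towards the competitor, which this sketch omits.

```python
import math


def max_deviation(xs, ys):
    """Largest perpendicular distance of a cursor path from the straight
    line joining its first and last samples (same units as the input)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need matching x/y samples and at least two points")
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("start and end points coincide")
    # |2-D cross product| / base length = distance of (x, y) from the start-end line
    return max(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        for x, y in zip(xs, ys)
    )


# Hypothetical trajectories from a start point at (0, 0) to a target at (0, 10).
direct = max_deviation([0, 0, 0], [0, 5, 10])  # straight path
curved = max_deviation([0, 3, 0], [0, 5, 10])  # bends 3 units sideways
print(direct, curved)
```

Because the index is measured relative to each trial's own start and end points, trajectories towards targets in different screen positions can be compared on the same scale.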


Behavioural methods have also been employed to investigate the link between movement and action language in the reverse direction, that is, the influence of movement on the comprehension of action language. Indeed, comprehension of action language has been shown to be influenced by repeating movements during motor training (Glenberg et al., 2008; Locatelli et al., 2012; Trevisan et al., 2017). Locatelli et al. (2012), for example, employed a three-week programme of manual action training. Before and after training in origami folding, participants performed a sentence-picture judgement task in which they judged whether sentence-image pairs about actions were semantically congruent. Performance on this semantic task improved significantly after training. This suggests that gains in sensorimotor expertise during manual training may improve semantic processing during action reading, demonstrating a functional link between language and motor systems. In a little over two decades, behavioural methodologies have made a clear case that language processing and motor performance can influence each other; understanding an action word seems to immediately and automatically activate a representation of the action to which it refers. However, behavioural data alone cannot establish that language and action rely on overlapping systems. Methodologies from neuroscience have therefore played a key role in uncovering the large distributed network involved in representing action language.
To investigate the effects of action language on the motor system more directly, researchers have used imaging techniques such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), as well as recordings of the brain's electrophysiological activity such as electroencephalography (EEG) and magnetoencephalography (MEG). These methodologies are discussed below, before we turn to a neurophysiological technique that has recently been applied to the topic of action language, namely transcranial magnetic stimulation.

11.3.2

Neuroimaging methods

Neuroimaging methods such as fMRI and PET provide indirect measures of neural activity: fMRI tracks changes in blood oxygenation, and PET tracks changes in blood flow or metabolism using radioactive tracers. Both allow indirect identification of the cerebral networks engaged during processing. Their temporal resolution is relatively low, because the haemodynamic response unfolds over several seconds after neural activity occurs in a brain area, but their spatial resolution is high. Imaging studies have demonstrated that processing action language generally engages a large distributed network encompassing fronto-parietal and temporo-occipital cortical regions (Hauk et al., 2004; Aziz-Zadeh et al., 2006; Jirak et al., 2010; Planton & Démonet, 2012; Courson & Tremblay, 2020). As mentioned above, Hauk et al. (2004) used fMRI to identify the brain regions activated while participants read verbs implicating hand, face, or foot movements, reporting somatotopic activation in the premotor and primary motor cortices. Within a couple of years, this somatotopic activation was confirmed during listening to sentences involving movements of the mouth, hand, and leg (Tettamanti et al., 2005) and during reading of action words related to the hand, foot, and mouth (Aziz-Zadeh et al., 2006). EEG and MEG are also non-invasive techniques for exploring cortical activity; they measure at the scalp the tiny electrical potentials (EEG) or magnetic fields (MEG) generated by the brain's electrical currents. In contrast to the imaging methodologies described above, both EEG and MEG offer high temporal resolution but lower spatial resolution. They have been used extensively to record electrocortical activity while participants read about actions (Pulvermüller & Mohr, 1996; Pulvermüller et al., 1996; Pulvermüller et al., 1999, 2001; Kellenbach et al., 2002; Hauk et al., 2006; Beres, 2017). For example, Pulvermüller and colleagues (1996) had participants read concrete nouns or action verbs while EEG was recorded. Concrete nouns elicited stronger responses at recording sites over the visual cortices of the occipital lobes, whereas action verbs elicited stronger activity at sites over the motor cortex. Several years later, Pulvermüller and colleagues (2001) used a reading task with action verbs referring to the arm (to press), the leg (to kick), or the mouth (to bite), and observed somatotopic activity in the motor cortex and the adjacent frontal cortex. More precisely, leg-related verbs produced the strongest inward currents near the cortical leg representation, whereas face-related verbs produced the strongest inward currents near the cortical representation of the articulators. Thanks to the high temporal resolution of EEG, these researchers could observe these changes in brain activity from 250 ms after word onset. Pulvermüller and colleagues (2005) subsequently used high-density MEG while participants listened to words referring to face or leg actions, and once again observed cortical somatotopy of the actions signified by the words, in temporal and frontocentral cortices, as early as about 150 ms.
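The difference in temporal resolution between haemodynamic and electrophysiological measures can be illustrated with a simple model of the haemodynamic response. The sketch below uses a double-gamma function, a positive response minus a smaller, later undershoot, with shape parameters of 6 and 16 and an undershoot ratio of 1/6, values commonly used for the canonical response in the SPM software. This parameterisation is our own illustrative choice, not a model taken from the studies reviewed here, and the function names are hypothetical. Even for a brief neural event at time zero, the modelled response peaks around 5 s later, whereas the EEG and MEG effects described above are measured within a few hundred milliseconds of word onset.

```python
import math


def gamma_pdf(t, shape):
    """Gamma probability density with unit scale, for t > 0."""
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1 / 6):
    """Modelled BOLD response t seconds after a brief neural event
    (assumed SPM-style canonical parameters, unit time scale in seconds)."""
    if t <= 0:
        return 0.0
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


# Sample the modelled response every 0.1 s for 30 s after the event.
times = [i / 10 for i in range(301)]
response = [double_gamma_hrf(t) for t in times]
peak_time = times[response.index(max(response))]
print(f"Modelled response peaks about {peak_time:.1f} s after the event")
```

The negative values late in the sampled window correspond to the post-stimulus undershoot; in fMRI analyses a function of this kind is typically convolved with the sequence of stimulus onsets to predict the measured signal.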

11.3.3

Transcranial magnetic stimulation

In contrast to these neuroimaging techniques, which measure activity generated by the brain itself, transcranial magnetic stimulation (TMS) delivers a brief magnetic pulse through a coil placed over the head, inducing an electric current that stimulates the brain area below. For instance, when the coil is placed over the hand area of the primary motor cortex, the induced current stimulates neurons in this region, and the resulting activity can be measured as an electromyographic response in hand muscles (see Figure 11.1). With a single TMS pulse, we can measure a motor-evoked potential (MEP), which is a marker of corticospinal excitability at the time of stimulation. Moreover, paired-pulse TMS protocols allow us to assess the contribution of inhibitory and facilitative intracortical mechanisms (Chen, 2004; Rothwell et al., 2009). In this approach, a first "conditioning" stimulation over the primary motor cortex modulates the response to a second "test" stimulation delivered a few milliseconds later. The ability to quantify inhibitory and facilitative mechanisms, together with its high temporal resolution, makes this neurophysiological tool very useful for investigating motor system involvement during action language processing (Pulvermüller et al., 2005; Tomasino et al., 2008; Papeo et al., 2009, 2013, 2015; Labruna et al., 2011; Innocenti et al., 2014; Papitto et al., 2021; Dupont et al., 2022). For example, Dupont and colleagues (2022) had participants silently read manual action verbs or non-manual verbs, and single TMS pulses were delivered shortly after verb presentation.
They observed an increase in corticospinal excitability after reading manual action verbs compared to non-manual verbs, suggesting that the hand area of the motor system is engaged during the processing of manual action verbs (see also Papeo et al., 2009, 2013, 2015, 2016; Labruna et al., 2011; Innocenti et al., 2014; but see Buccino et al., 2005, who observed a decrease after listening to action sentences). Using TMS in a different way (offline low-frequency repetitive TMS), Vitale and colleagues (2021) temporarily disrupted the inferior frontal gyrus, a region associated with inhibition, to examine its role during the reading of affirmative and negative action sentences. Without TMS, affirmative action sentences led to greater corticospinal excitability than their negative counterparts, which are thought to recruit inhibition from the inferior frontal gyrus. However, when this region was disrupted by TMS, no difference was observed between the two types of sentences, suggesting that the stimulation effectively suppressed the inhibitory effect of negation. Such studies demonstrate the usefulness of the TMS methodology in language research, not only for studying the intersection of language and motor processing but also for shedding light on various linguistic mechanisms, such as negation.

Figure 11.1

Schematic illustration of transcranial magnetic stimulation. When applied over the primary motor cortex, TMS activates pyramidal neurons. The resulting signal travels along the corticospinal pathway and reaches the targeted contralateral muscle via the spinal cord and peripheral nerves. In the electromyographic recording, the peak-to-peak amplitude of the motor-evoked potential quantifies corticospinal excitability (Dupont, 2022).
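The TMS measures described in this section reduce to simple arithmetic on the electromyographic trace. In the sketch below, MEP amplitude is taken as the peak-to-peak value (maximum minus minimum) of the EMG signal within a post-stimulus window, as in Figure 11.1, and paired-pulse effects are expressed as the ratio of the mean conditioned MEP to the mean MEP evoked by the test pulse alone, so that values below 1 indicate inhibition and values above 1 facilitation. Expressing the conditioned response relative to the test-alone response is a common convention, but the function names, the assumption that the response window has already been extracted, and all amplitudes are hypothetical illustrations rather than the pipeline of any study cited here.

```python
from statistics import mean


def peak_to_peak(emg_window):
    """MEP amplitude: maximum minus minimum EMG value (e.g., in mV) within
    a post-stimulus window that is assumed to be already extracted."""
    if not emg_window:
        raise ValueError("empty EMG window")
    return max(emg_window) - min(emg_window)


def paired_pulse_ratio(conditioned_meps, test_alone_meps):
    """Mean conditioned MEP divided by mean test-alone MEP.

    Below 1: the conditioning pulse inhibited the test response;
    above 1: it facilitated it.
    """
    if not conditioned_meps or not test_alone_meps:
        raise ValueError("need MEP amplitudes for both trial types")
    return mean(conditioned_meps) / mean(test_alone_meps)


# Hypothetical EMG samples (mV) from one trial's response window.
mep = peak_to_peak([0.02, -0.45, 0.10, 0.85, -0.05])

# Hypothetical MEP amplitudes (mV) from a block of paired-pulse trials.
ratio = paired_pulse_ratio([0.40, 0.60, 0.50], [1.00, 1.10, 0.90])
print(f"MEP = {mep:.2f} mV; paired-pulse ratio = {ratio:.2f}")
```

Normalising to the test-alone response in this way lets conditioned MEPs be compared across participants whose baseline excitability differs.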

11.4

Recommendations for practice

Thanks to a growing set of available methodologies, researchers have compiled substantial evidence that processing action language engages a large distributed network including fronto-parietal and temporo-occipital regions as well as subcortical structures (Pulvermüller, 1999, 2005; Pulvermüller & Fadiga, 2010). We have highlighted various methodologies that have been instrumental in demonstrating the engagement of the motor system during the comprehension of words and sentences about actions. Nevertheless, the question remains whether this activation of the motor cortex is necessary for understanding action language (Wurm & Caramazza, 2019). Researchers have observed that disrupting motor areas with repetitive TMS during language processing can decrease performance (Willems et al., 2011; Repetto et al., 2013; Vukovic et al., 2017; Courson et al., 2018). This suggests that the motor system could play a role in understanding action language, perhaps by optimising this process, refining linguistic understanding and making it more efficient when necessary (Dupont et al., 2022). However, while comprehension might indeed be diminished or less efficient when motor areas are artificially disrupted through TMS, it is not completely blocked; comprehension can still proceed when the primary motor cortex is disrupted. The question of the role of distributed activation during language comprehension therefore remains open, and future research will be essential to shed light on it. The continued development (and combination) of creative techniques for probing the motor and language systems should help reveal the mechanisms underlying their interaction. Because every available methodology has both strengths and weaknesses, cross-methodological work will be key to addressing such difficult questions as the role of the motor system in understanding action language.

11.5

Limitations and future directions

Current investigations of language and the motor system have several limitations. For instance, investigations of distributed language are always constrained by our ability to measure differences in specific sensory or motor domains. This is why so many studies have focused on concrete concepts that can engage visual, auditory, motor, or other sensorimotor representations (typically one at a time). While some advances regarding abstract concepts have been made within the framework of distributed language comprehension (see Borghi et al., 2017 for a review), most of the evidence in favour of distributed language processing stems from concrete concepts, and further research on abstract concepts is required. In addition to this bias towards concrete concepts, the need to measure differences in specific sensory or motor domains also creates a bias towards unidimensional representations. It is difficult to probe multiple modalities at the same time, especially with response-based behavioural paradigms, and also with TMS, where only one region is stimulated at a given time. For example, an MEP recorded in a hand muscle after TMS over the hand area of the motor cortex can only tell us about the manual-motor aspect of a language representation, regardless of the richness of the sentence stimuli presented (emotional, visual, auditory, etc.). This may not be representative of our language simulations, which are likely distributed over multiple modalities, even though we can often measure only one modality at a time. The investigation of multimodal language representations is an important issue for future research, and cross-modal and cross-methodological investigations will likely lead the way in uncovering the mechanisms of distributed language comprehension.

Further reading

Arrington, C. N., Ossowski, A. E., Baig, H., Persichetti, E., & Morris, R. (2022). The impact of transcranial magnetic stimulation on reading processes: A systematic review. Neuropsychology Review, 33(1), 255–277.
Chen, R. (2004). Interactions between inhibitory and excitatory circuits in the human motor cortex. Experimental Brain Research, 154(1), 1–10.
Di Lazzaro, V., Rothwell, J., & Capogna, M. (2018). Noninvasive stimulation of the human brain: Activation of multiple cortical circuits. Neuroscientist. doi:10.1177/1073858417717660
Kobayashi, M., & Pascual-Leone, A. (2003). Transcranial magnetic stimulation in neurology. Lancet Neurology. doi:10.1016/S1474-4422(03)00321-1
Papeo, L., Pascual-Leone, A., & Caramazza, A. (2013). Disrupting the brain to validate hypotheses on the neurobiology of language. Frontiers in Human Neuroscience. doi:10.3389/fnhum.2013.00148
Rothwell, J. C. (1997). Techniques and mechanisms of action of transcranial stimulation of the human motor cortex. Journal of Neuroscience Methods, 74, 113–122. doi:10.1016/S0165-0270(97)02242-5
Rothwell, J. C., Day, B. L., Thompson, P. D., & Kujirai, T. (2009). Short latency intracortical inhibition: One of the most popular tools in human motor neurophysiology. Journal of Physiology. doi:10.1113/jphysiol.2008.162461



Related topics

New directions in statistical analysis for experimental linguistics; analysing language using brain imaging; contrasting online and offline measures: examples from experimental research on linguistic relativity; controlling social factors in experimental linguistics

References

Andres, M., Finocchiaro, C., Buiatti, M., & Piazza, M. (2015). Contribution of motor representations to action verb processing. Cognition, 134, 174–184.
Aziz-Zadeh, L., Wilson, S. M., Rizzolatti, G., & Iacoboni, M. (2006). Congruent embodied representations for visually presented actions and linguistic phrases describing actions. Current Biology, 16(17), 1818–1823.
Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22(4), 577–609.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.
Barsalou, L. W., Simmons, W. K., Barbey, A. K., & Wilson, C. D. (2003). Grounding conceptual knowledge in modality-specific systems. Trends in Cognitive Sciences, 7(2), 84–91.
Beres, A. M. (2017). Time is of the essence: A review of electroencephalography (EEG) and event-related brain potentials (ERPs) in language research. Applied Psychophysiology and Biofeedback, 42(3), 247–255.
Borghi, A. M., Binkofski, F., Castelfranchi, C., Cimatti, F., Scorolli, C., & Tummolini, L. (2017). The challenge of abstract concepts. Psychological Bulletin, 143(3), 263–292.
Boulenger, V., Roy, A. C., Paulignan, Y., Deprez, V., Jeannerod, M., & Nazir, T. A. (2006). Cross-talk between language processes and overt motor behavior in the first 200 msec of processing. Journal of Cognitive Neuroscience, 18(11), 1607–1615.
Buccino, G., Riggio, L., Melli, G., Binkofski, F., Gallese, V., & Rizzolatti, G. (2005). Listening to action-related sentences modulates the activity of the motor system: A combined TMS and behavioral study. Cognitive Brain Research, 24(2), 355–363.
Chen, R. (2004). Interactions between inhibitory and excitatory circuits in the human motor cortex. Experimental Brain Research, 154(1), 1–10.
Chersi, F., Thill, S., Ziemke, T., & Borghi, A. M. (2010). Sentence processing: Linking language to motor chains. Frontiers in Neurorobotics, 4, 4. https://www.frontiersin.org/articles/10.3389/fnbot.2010.00004/full
Clark, A. (2006). Language, embodiment, and the cognitive niche. Trends in Cognitive Sciences, 10(8), 370–374.
Connell, L. (2007). Representing object colour in language comprehension. Cognition, 102(3), 476–485.
Courson, M., Macoir, J., & Tremblay, P. (2018). A facilitating role for the primary motor cortex in action sentence processing. Behavioural Brain Research, 336, 244–249.
Courson, M., & Tremblay, P. (2020). Neural correlates of manual action language: Comparative review, meta-analysis and ROI analysis. Neuroscience & Biobehavioral Reviews, 120, 1–28.
Dalla Volta, R., Gianelli, C., Campione, G. C., & Gentilucci, M. (2009). Action word understanding and overt motor behavior. Experimental Brain Research, 196(4), 403–412.
Da Silva, R. L., Labrecque, D., Caromano, F. A., Higgins, J., & Frak, V. (2018). Manual action verbs modulate the grip force of each hand in unimanual or symmetrical bimanual tasks. PLoS One, 13(5), e0192320.
Decety, J., & Grèzes, J. (2006). The power of simulation: Imagining one's own and other's behavior. Brain Research, 1079(1), 4–14.
de Koning, B. B., Wassenburg, S. I., Bos, L. T., & Van der Schoot, M. (2017). Size does matter: Implied object size is mentally simulated during language comprehension. Discourse Processes, 54(7), 493–503.
Dupont, W. (2022). Le sens des mots d'action dans le cortex moteur [The meaning of action words in the motor cortex]. Retrieved from http://www.theses.fr/s228057
Dupont, W., Papaxanthis, C., Lebon, F., & Madden-Lombardi, C. (2022). Does the motor cortex want the full story? The influence of sentence context on corticospinal excitability in action language processing. Neuroscience, 506, 58–67.
Fischer, M. H., & Zwaan, R. A. (2008). Embodied language: A review of the role of the motor system in language comprehension. Quarterly Journal of Experimental Psychology, 61(6), 825–850.
Fodor, J. A. (1983). The Modularity of Mind. MIT Press.
Frak, V., Nazir, T., Goyette, M., Cohen, H., & Jeannerod, M. (2010). Grip force is part of the semantic representation of manual action verbs. PLoS One, 5(3), e9728.


Gallese, V., & Lakoff, G. (2005). The brain's concepts: The role of the sensory-motor system in conceptual knowledge. Cognitive Neuropsychology, 22(3–4), 455–479.
Gallese, V., & Sinigaglia, C. (2011). What is so special about embodied simulation? Trends in Cognitive Sciences, 15(11), 512–519.
Gentilucci, M. (2002). Object motor representation and reaching-grasping control. Neuropsychologia, 40(8), 1139–1153.
Gentilucci, M. (2003). Object motor representation and language. Experimental Brain Research, 153(2), 260–265.
Gentilucci, M., Benuzzi, F., Bertolani, L., Daprati, E., & Gangitano, M. (2000). Language and motor control. Experimental Brain Research, 133(4), 468–490.
Gentilucci, M., & Gangitano, M. (1998). Influence of automatic word reading on motor control. European Journal of Neuroscience, 10(2), 752–756.
Gernsbacher, M. A. (1990). Language Comprehension as Structure Building. Lawrence Erlbaum Associates.
Glenberg, A. M., & Gallese, V. (2012). Action-based language: A theory of language acquisition, comprehension, and production. Cortex, 48(7), 905–922.
Glenberg, A. M., & Kaschak, M. P. (2002). Grounding language in action. Psychonomic Bulletin & Review, 9(3), 558–565.
Glenberg, A. M., Sato, M., & Cattaneo, L. (2008). Use-induced motor plasticity affects the processing of abstract and concrete language. Current Biology, 18(7), R290–R291.
Glover, S., & Dixon, P. (2002). Semantics affect the planning but not control of grasping. Experimental Brain Research, 146, 383–387.
Glover, S., Rosenbaum, D. A., Graham, J., & Dixon, P. (2004). Grasping the meaning of words. Experimental Brain Research, 154(1), 103–108.
Hauk, O., Davis, M. H., Ford, M., Pulvermüller, F., & Marslen-Wilson, W. D. (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. NeuroImage, 30(4), 1383–1400.
Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41(2), 301–307.
Hoeben Mannaert, L. N., Dijkstra, K., & Zwaan, R. A. (2017). Is color an integral part of a rich mental simulation? Memory & Cognition, 45(6), 974–982.
Hoeben Mannaert, L. N., Dijkstra, K., & Zwaan, R. A. (2020). Is color continuously activated in mental simulations across a broader discourse context? Memory & Cognition, 49(1), 127–147.
Innocenti, A., De Stefani, E., Sestito, M., & Gentilucci, M. (2014). Understanding of action-related and abstract verbs in comparison: A behavioral and TMS study. Cognitive Processing, 15(1), 85–92.
Jeannerod, M. (2006). Motor Cognition: What Actions Tell the Self (Vol. 42). Oxford University Press.
Jirak, D., Menz, M. M., Buccino, G., Borghi, A. M., & Binkofski, F. (2010). Grasping language: A short story on embodiment. Consciousness and Cognition, 19(3), 711–720.
Johnson-Laird, P. N. (1983). Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Harvard University Press.
Kaschak, M. P., Madden, C. J., Therriault, D. J., Yaxley, R. H., Aveyard, M., Blanchard, A. A., & Zwaan, R. A. (2005). Perception of motion affects language processing. Cognition, 94(3), B79–B89.
Kellenbach, M. L., Wijers, A. A., Hovius, M., Mulder, J., & Mulder, G. (2002). Neural differentiation of lexico-syntactic categories or semantic features? Event-related potential evidence for both. Journal of Cognitive Neuroscience, 14(4), 561–577.
Kiefer, M., & Pulvermüller, F. (2012). Conceptual representations in mind and brain: Theoretical developments, current evidence and future directions. Cortex, 48(7), 805–825.
Klatzky, R. L., Pellegrino, J. W., McCloskey, B. P., & Doherty, S. (1989). Can you squeeze a tomato? The role of motor representations in semantic sensibility judgments. Journal of Memory and Language, 28(1), 56–77.
Klepp, A., van Dijk, H., Niccolai, V., Schnitzler, A., & Biermann-Ruben, K. (2019). Action verb processing specifically modulates motor behaviour and sensorimotor neuronal oscillations. Scientific Reports, 9(1), 15985.
Labruna, L., Fernández-Del-Olmo, M., Landau, A., Duqué, J., & Ivry, R. B. (2011). Modulation of the motor system during visual and auditory language processing. Experimental Brain Research, 211(2), 243–250.
Lambon Ralph, M. A., Pobric, G., & Jefferies, E. (2009). Conceptual knowledge is underpinned by the temporal pole bilaterally: Convergent evidence from rTMS. Cerebral Cortex, 19(4), 832–838.
Lambon Ralph, M. A., Sage, K., Jones, R. W., & Mayberry, E. J. (2010). Coherent concepts are computed in the anterior temporal lobes. Proceedings of the National Academy of Sciences, 107(6), 2717–2722.


Locatelli, M., Gatti, R., & Tettamanti, M. (2012). Training of manual actions improves language understanding of semantically related action sentences. Frontiers in Psychology, 3, 547.
Madden, C. J., & Zwaan, R. A. (2006). Perceptual representation as a mechanism of lexical ambiguity resolution: An investigation of span and processing time. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(6), 1291–1303.
Mahon, B. Z. (2015). What is embodied about cognition? Language, Cognition and Neuroscience, 30(5), 420–429.
Mahon, B. Z. (2020). Brain Mapping: Understanding the Ins and Outs of Brain Regions. Current Biology, 30(6), R414–R416.
Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology Paris, 102(1–3), 59–70.
Nazir, T. A., Hrycyk, L., Moreau, Q., Frak, V., Cheylus, A., Ott, L., ... & Delevoye-Turrell, Y. (2017). A simple technique to study embodied language processes: The grip force sensor. Behavior Research Methods, 49(1), 61–73.
Papeo, L., Hochmann, J. R., & Battelli, L. (2016). The default computation of negated meanings. Journal of Cognitive Neuroscience, 28(11), 1980–1986.
Papeo, L., Lingnau, A., Agosta, S., Pascual-Leone, A., Battelli, L., & Caramazza, A. (2015). The origin of word-related motor activity. Cerebral Cortex, 25(5), 1668–1675.
Papeo, L., Pascual-Leone, A., & Caramazza, A. (2013). Disrupting the brain to validate hypotheses on the neurobiology of language. Frontiers in Human Neuroscience, 7, 148.
Papeo, L., Vallesi, A., Isaja, A., & Rumiati, R. I. (2009). Effects of TMS on different stages of motor and non-motor verb processing in the primary motor cortex. PLoS One, 4(4), e4508.
Papitto, G., Lugli, L., Borghi, A. M., Pellicano, A., & Binkofski, F. (2021). Embodied negation and levels of concreteness: A TMS study on German and Italian language processing. Brain Research, 1767, 147523.
Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience, 8, 976–987.
Pecher, D., Van Dantzig, S., Zwaan, R. A., & Zeelenberg, R. (2009). Language comprehenders retain implied shape and orientation of objects. Quarterly Journal of Experimental Psychology, 62, 1108–1114.
Planton, S., & Démonet, J. F. (2012). Neurophysiologie du langage: Apports de la neuro-imagerie et état des connaissances [Neurophysiology of language: Contributions of neuroimaging and the state of knowledge]. Revue de Neuropsychologie, 4(4), 255–266.
Pulvermüller, F. (1999). Words in the brain’s language. Behavioral and Brain Sciences, 22(2), 253–279.
Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6(7), 576–582.
Pulvermüller, F., & Fadiga, L. (2010). Active perception: Sensorimotor circuits as a cortical basis for language. Nature Reviews Neuroscience, 11(5), 351–360.
Pulvermüller, F., Härle, M., & Hummel, F. (2001). Walking or talking? Behavioral and neurophysiological correlates of action verb processing. Brain and Language, 78, 143–168.
Pulvermüller, F., Hauk, O., Nikulin, V. V., & Ilmoniemi, R. J. (2005). Functional links between motor and language systems. European Journal of Neuroscience, 21, 793–797.
Pulvermüller, F., Lutzenberger, W., & Preissl, H. (1999). Nouns and verbs in the intact brain: Evidence from event-related potentials and high-frequency cortical responses. Cerebral Cortex, 9(5), 497–506.
Pulvermüller, F., & Mohr, B. (1996). The concept of transcortical cell assemblies: A key to the understanding of cortical lateralization and interhemispheric interaction. Neuroscience and Biobehavioral Reviews, 20(6), 557–566.
Pulvermüller, F., Preissl, H., Lutzenberger, W., & Birbaumer, N. (1996). Brain rhythms of language: Nouns versus verbs. European Journal of Neuroscience, 8(8), 937–941.
Pulvermüller, F., Shtyrov, Y., & Ilmoniemi, R. (2005). Brain signatures of meaning access in action word recognition. Journal of Cognitive Neuroscience, 17(5), 884–892.
Pylyshyn, Z. W. (1984). Computation and Cognition: Toward a Foundation for Cognitive Science. MIT Press.
Rabahi, T., Fargier, P., Rifai Sarraj, A., Clouzeau, C., & Massarelli, R. (2013). Effect of action verbs on the performance of a complex movement. PLoS One, 8(8), e68687.
Rabahi, T., Sarraj, A. R., Fargier, P., Clouzeau, C., & Massarelli, R. (2012). Action verb and motor performance. Kinesitherapie, 12(125), 42–46.
Repetto, C., Colombo, B., Cipresso, P., & Riva, G. (2013). The effects of rTMS over the primary motor cortex: The link between action and language. Neuropsychologia, 51(1), 8–13.


Richter, T., & Zwaan, R. A. (2010). Integration of perceptual information in word access. Quarterly Journal of Experimental Psychology, 63(1), 81–107.
Rommers, J., Meyer, A. S., & Huettig, F. (2013). Object shape and orientation do not routinely influence performance during language processing. Psychological Science, 24(10), 2218–2225.
Rothwell, J. C., Day, B. L., Thompson, P. D., & Kujirai, T. (2009). Short latency intracortical inhibition: One of the most popular tools in human motor neurophysiology. Journal of Physiology, 587(6), 11–12.
Solomon, K. O., & Barsalou, L. W. (2001). Representing properties locally. Cognitive Psychology, 43(2), 129–163.
Spivey, M. J., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward phonological competitors. Proceedings of the National Academy of Sciences of the United States of America, 102(28), 10393–10398.
Stanfield, R. A., & Zwaan, R. A. (2001). The effect of implied orientation derived from verbal context on picture recognition. Psychological Science, 12(2), 153–156.
Taylor, L. J., & Zwaan, R. A. (2008). Motor resonance and linguistic focus. Quarterly Journal of Experimental Psychology, 61(7), 896–904.
Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., Fazio, F., Rizzolatti, G., Cappa, S. F., & Perani, D. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of Cognitive Neuroscience, 17(2), 273–281.
Tomasino, B., Fink, G. R., Sparing, R., Dafotakis, M., & Weiss, P. H. (2008). Action verbs and the primary motor cortex: A comparative TMS study of silent reading, frequency judgments, and motor imagery. Neuropsychologia, 46(8), 1915–1926.
Trevisan, P., Sedeño, L., Birba, A., Ibáñez, A., & García, A. M. (2017). A moving story: Whole-body motor training selectively improves the appraisal of action meanings in naturalistic narratives. Scientific Reports, 7(1), 1–10.
Van Dam, W. O., & Desai, R. H. (2017). Embodied simulations are modulated by sentential perspective. Cognitive Science, 41(7), 1613–1628.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Language, 59(2), 313–344.
Van Weelden, L., Schilperoord, J., & Maes, A. (2014). Evidence for the role of shape in mental representations of similes. Cognitive Science, 38(2), 303–321.
Vitale, F., Monti, I., Padrón, I., Avenanti, A., & de Vega, M. (2021). The neural inhibition network is causally involved in the disembodiment effect of linguistic negation. Cortex, 147, 72–82.
Vukovic, N., Feurra, M., Shpektor, A., Myachykov, A., & Shtyrov, Y. (2017). Primary motor cortex functionally contributes to language comprehension: An online rTMS study. Neuropsychologia, 96, 222–229.
Willems, R. M., Labruna, L., D’Esposito, M., Ivry, R., & Casasanto, D. (2011). A functional role for the motor system in language understanding: Evidence from theta-burst transcranial magnetic stimulation. Psychological Science, 22(6), 849–854.
Wurm, M. F., & Caramazza, A. (2019). Distinct roles of temporal and frontoparietal cortex in representing actions across vision and language. Nature Communications, 10(1), 1–12.
Zwaan, R. A., & Madden, C. J. (2005). Embodied sentence comprehension. In D. Pecher & R. A. Zwaan (Eds.), Grounding Cognition: The Role of Perception and Action in Memory, Language, and Thinking (pp. 224–245). Cambridge University Press.
Zwaan, R. A., & Pecher, D. (2012). Revisiting mental simulation in language comprehension: Six replication attempts. PLoS One, 7(9), e45296.
Zwaan, R. A., & Radvansky, G. A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123(2), 162–185.
Zwaan, R. A., Stanfield, R. A., & Yaxley, R. H. (2002). Language comprehenders mentally represent the shapes of objects. Psychological Science, 13(2), 168–171.
Zwaan, R. A., & Taylor, L. J. (2006). Seeing, acting, understanding: Motor resonance in language comprehension. Journal of Experimental Psychology: General, 135(1), 1–11.
Zwaan, R. A., van der Stoep, N., Guadalupe, T., & Bouwmeester, S. (2012). Language comprehension in the balance: The robustness of the action-compatibility effect (ACE). PLoS One, 7(9), e31204.
Zwaan, R. A., & Yaxley, R. H. (2004). Lateralization of object-shape information in semantic processing. Cognition, 92(3), B23–B30.


PART II

Focus on experimental methods

12 ELICITING SPONTANEOUS LINGUISTIC PRODUCTIONS

Jennifer Arnold

12.1 Introduction

There is one human skill that is shared by all communities worldwide: the ability to communicate by speaking. This cognitive achievement is remarkable. Speakers must manage information in multiple ways, and very rapidly. For example, imagine that I walk into the kitchen, see that you have baked a cake, and say: “Wow, look at that cake!” This simple sentence required me to visually recognize the cake and activate the word for it (“cake”). I had to decide what to say about the cake, considering the social context; my excited comment might, for instance, signal that I’m hoping for a slice. I also had to choose words that are appropriate for the context. I couldn’t say “look at it” as my first sentence, although I could say “look at that”, perhaps while pointing to the cake. I had to put the words in order, following the grammatical rules of the language. I had to create a phonological plan for the utterance, and then program the motor movements to produce it.

How do humans do this? Language scholars have examined this question in numerous ways. Some studies analyse corpora of naturally produced language (e.g., Bell et al., 2009; Gahl, 2008). This chapter instead focuses on experimental research and reviews methods used to study the cognitive mechanisms of language production. Much of the work in this area has taken an information-processing approach, aiming to understand what representations are involved in language production.

One major challenge to studying language production is that the input to the production event is an idea. That is, people communicate because they want to say something about an idea, and this triggers the production of linguistic units like spoken words, manual signs, or written language. But while it is easy to record the produced language, it is hard to experimentally observe an idea. Below we will review different approaches to addressing this challenge.
Another challenge is that communication typically takes place in a social context. This means that speakers make linguistic choices based on multiple overlapping goals, for example, both the need to communicate information and social goals such as being polite. Such social goals also require representations, and some studies seek to understand these representations and processes by experimentally constraining the social context.

Language takes place in several modalities. The basic form of communication is spoken language, which includes both orally produced languages (e.g., Russian, Vietnamese) and manual (signed) languages like American Sign Language or Nicaraguan Sign Language. Signers use manual signs that have a linguistic structure similar to that of words in spoken languages, and similar cognitive processes apply in both cases. This chapter focuses on spoken language, including manual languages. These contrast with written language, which has very different information-processing properties from spoken language use.

Experimental studies typically focus on one piece of the production process at a time to identify the representations and mechanisms that are active for that particular process. Each section below describes the research questions and methods used to study production processes for (a) word retrieval, (b) sentence formulation, (c) language in a social context, and (d) language in a discourse context.

This chapter examines research that seeks to understand language as a human ability, focusing on the mature language production system in adult speakers. It is assumed that all humans share a basic cognitive architecture that guides language production. Fully evaluating this assumption requires testing models across numerous languages and populations. Nevertheless, much of the work reviewed here focuses on English and other Western languages. This is not because these languages offer an especially ideal testing domain, but rather because Western languages have been studied more frequently in this literature.

DOI: 10.4324/9781003392972-15

12.2 Word production

One of the basic units of language is the word. Adult speakers typically know over 27,000 words (Brysbaert et al., 2016). These words are stored in memory and retrieved during language planning. Knowing a word consists of knowing that such a word exists in your language, its meaning, and its form (e.g., the sounds or sign for it). Much research on word production focuses on content words, which are the words that carry meaning (nouns, verbs, adjectives, etc.). Function words (e.g., determiners like “a” or “the”, prepositions like “to” or “with”) instead play more of a role during syntactic formulation, when speakers decide on the structure of their utterances.

One key source of evidence about how language works is the kind of mistakes people make. For example, you might select the wrong word, saying “chair” instead of “table”. This would be an example of a semantic error because you selected a word that was meaningfully related to the intended word. Or you might mispronounce a word, saying “lemon” instead of “melon” (an error that also happens to be semantically related to the target; see Dell & Reich, 1981). Experimentalists have developed several techniques for systematically eliciting speech errors.

One common experimental technique is the use of tongue twisters, which are particularly good for testing phonological errors. For example, Li et al. (2017) examined how word production is influenced by bilingualism, and in particular the claim that bilinguals may be subtly disadvantaged in specific language production processes (Bialystok, 2009). Gollan and Goldrick (2012), for instance, found that even highly proficient bilinguals made more sound-based errors in a tongue twister task. Li et al. (2017) tested two possible sources for this disadvantage. The competition hypothesis suggests that it stems from increased competition between words for bilinguals, given that both languages may be activated during the task.
By contrast, the frequency lag account suggests that it stems from the fact that bilinguals speak each of their languages less frequently than a monolingual does, so all words in each language are effectively lower-frequency words for them than they are for monolinguals. Li et al. tested these hypotheses with three participant groups: Mandarin-English bilinguals, Spanish-English bilinguals, and English monolinguals. People repeated nonsense-word tongue twisters in one of two conditions. In the overlapping condition, the nonsense words used syllables that occur in all three languages (e.g., tuni, puni, puni, and tuni), whereas in the nonoverlapping condition, the nonsense words were only word-like in English (e.g., staev, spaev, spaev, and staev); for example, the clusters “st” and “sp” don’t occur in Mandarin or Spanish. Results showed that while monolinguals were equally accurate in the two conditions, bilinguals made many more errors in the nonoverlapping condition. If the disadvantage stemmed from greater competition, bilinguals should have had particular difficulty in the overlapping condition, where the sequences were consistent with both of their languages, but this is not what Li et al. found. They concluded that the disadvantage may instead stem from the lower frequency of all words in each language, or from the difficulty of acquiring two distinct phonological systems.

Tongue twisters provide a straightforward and adaptable research technique. For example, they have been used to show that attention to a particular word decreases errors in that word while increasing errors in surrounding words (Nozari & Dell, 2012).

Other techniques examine the effect of semantic and lexical constraints on word production. In the “SLIP” technique (Baars et al., 1975), participants silently read pairs of words, e.g., “barred dorm” and “bought dog”. They are then cued to read aloud a pair like “darn bore”; the preceding word pairs interfere with the target, resulting in errors (“barn door”) on about 30% of trials. This technique has been used to show that errors are more frequent when the slip results in real words (darn bore → barn door) than when it does not (dart board → bart doard), and that errors are more frequent when the resulting words are neutral (tool carts → cool tarts) than when they would be taboo (as for tool kits; Motley et al., 1981, 1982).

Instead of examining errors, another approach is to test the conditions that facilitate the fast production of words. The assumption is that in speeded tasks, faster responses reflect faster cognitive processing.
In the context of language production, researchers test the facilitation or inhibition of planning processes prior to articulation. One such technique is the Picture-Word Interference task, or PWI task (Schriefers et al., 1990). This was developed to test the hypothesis that word production involves activating a nonlinguistic message, which is then used to select the right word (Levelt, 1989). This selection was hypothesized to occur in two stages: first selecting a word based on its semantics, and then building its phonological representation (e.g., Dell & Reich, 1981). Schriefers et al. asked Dutch-speaking participants to name line drawings of common things, e.g., a desk (“bureau” in Dutch). They also presented auditory distractor words that were either phonologically or semantically related to the target (e.g., for the target “bureau”, the semantic competitor was “kast” [closet], while the phonological competitor was “buurman” [neighbor]); these were compared to a neutral distractor (e.g., “muts” [cap]). Critically, they also varied the timing of the distractor, which occurred either immediately before the picture, at the same time as the picture, or immediately after it. They found that semantically related distractors slowed production of the target word, but only when they were presented before the picture. By contrast, phonologically related distractors sped up target production, but only when presented later in the trial. This supports the hypothesis that semantic and phonological processes during word production are separate and ordered.

12.3 Sentence formulation

A core question about language production is how people put together sentences. To test this, researchers need to both allow participants some freedom in how they produce a sentence and constrain the sentence enough that productions are comparable. One frequent approach is to ask participants to describe simple pictures with a single sentence. For example, Bock (1986) asked participants to describe pictures of simple events that could be described in more than one way. A picture of lightning hitting a church, for instance, could be described as “The church was hit by lightning” (passive) or “Lightning hit the church” (active). She used this task to test whether speakers tended to repeat structures they had recently heard. She organized the task so that the target trials were preceded by “priming” trials where participants heard a sentence and repeated it. These primes exposed participants to either a passive structure (The referee was punched by one of the fans) or an active structure (One of the fans punched the referee). She found that participants were more likely to use an active structure following an active prime than following a passive prime. This effect occurs even though the prime and target sentences contain different words, showing that people store the abstract syntactic structure. The effect has been replicated numerous times (Mahowald et al., 2016; Pickering & Ferreira, 2008). Priming also provides a tool for testing the categorization of syntactic structures. For example, Ziegler and Snedeker (2018) used priming to test whether destination and recipient roles are categorized together, but found that they do not prime each other, suggesting separate representations.

Priming also affects structural choices in signed languages. Hall et al. (2015) demonstrated this by testing American Sign Language. Participants described pictures of coloured objects. For example, a green bird can be described with signs in two orders, either “green bird” or “bird green” (which is analogous to saying “the bird that is green”). These description tasks were preceded by videos of a signer describing objects with either the colour-first or colour-second structure. Participants were more likely to use the colour-first structure when they had just seen a colour-first prime. This priming effect occurred for deaf native signers, deaf non-native signers, and hearing second-language signers (see Cleland & Pickering, 2003, for a similar effect in English).

Picture description has also been combined with eye-tracking methods.
Griffin and Bock (2000) used eye-tracking to identify the planning processes that precede speech. Participants described simple pictures with sentences like “The mailman is being chased by the dog”, “The dog is chasing the mailman”, or “The mailman is chasing the dog”. They found that participants initially inspected the picture without favouring any region. They then shifted to an incremental planning process, in which they fixated each character roughly 900 ms before mentioning it. This pattern occurred for both active and passive utterances, so it was not driven by the semantic role or the animacy of the referent. Eye-tracking is useful because it provides insight into planning processes. It has been used to demonstrate, for example, that speakers spend less time fixating objects that were previously mentioned, and less time fixating objects that they will refer to with a pronoun (which also signals that the object is probably given; see Section 12.5; van der Meulen et al., 2001).

12.4 Production in a social context

While language use is a complex cognitive behaviour, it is also critically a behaviour used for a social purpose: to communicate. A key observation about language production is that it cannot be described simply as a job for the speaker. Instead, language is typically part of what Herb Clark calls a “joint action” (Clark, 1996), the result of the speaker and one or more interlocutors working together to achieve a goal. This raises questions about the goals and representations that are necessary to produce language in complex situations, and answering them requires interactive language elicitation tasks. For example, speakers must choose words that will be interpretable to their addressee, given the current visual and informational context, and given what information they share with their addressee. Researchers ask questions about how these processes unfold. How does the visual and informational context constrain production? To what extent do speakers consider the knowledge of their addressees? How do speakers respond to feedback? These questions require researchers to elicit productions within a controlled social context. Here we review some of the experimental techniques that have been used to do this.


One dimension of language is that speakers need to decide how much information to provide. Do I say “the large plate” or just “the plate”? What drives this choice? Several studies have used a referential communication task to examine how speakers design referring expressions. Typically, this sort of study asks people to refer to an object in a scene, where the other objects in the scene are manipulated to test how properties of the context influence the speaker’s choices. For example, Graf et al. (2016) had people describe pictures (e.g., a dalmatian) in the context of two other pictures. The context pictures determined whether the target picture could be successfully identified with a superordinate term (animal) or the basic-level term (dog), or whether the subordinate term (dalmatian) was necessary. They found a large effect of context; for example, the subordinate term was used most when it was necessary, e.g., when there was another dog in the context. But this was not the only constraint; people also favoured shorter expressions, and those that were more typical for the target picture. Similarly, multiple studies have shown that people tend to use modifiers (e.g., the big plate) in the presence of a contrasting object (e.g., Brown-Schmidt & Tanenhaus, 2006; Frank & Goodman, 2012), although they sometimes provide too much information (e.g., Engelhardt et al., 2006; Koolen et al., 2011; Pogue et al., 2016).

Referential communication tasks have also been used to examine how children develop the ability to refer in contextually appropriate ways. For example, Deutsch and Pechmann (1982) asked Dutch-speaking adults and children to describe objects on a table to an experimenter. The objects were similar to each other (e.g., a big green ball, a small green ball, a big blue ball, a small blue ball, a big blue candle, a small blue candle, a big red candle, a small red candle).
Adults nearly always (94%) produced unambiguous expressions on the first try, whereas children’s success varied by age (78% for 9-year-olds, 50% for 6-year-olds, and 13% for 3-year-olds).

This kind of task is useful because experimenters can manipulate the types of competing objects in the context to identify which ones influence speakers’ decisions. For example, Ferreira et al. (2005) asked participants to describe pictures of objects that included an ambiguous target object, e.g., a bat (the animal). The context included either no contrasting object, a conceptually similar object (e.g., a smaller animal bat), or an object that was conceptually different but linguistically similar (e.g., a baseball bat). They found that speakers were more likely to use a modifier in the context of a conceptually similar competitor than in the context of a linguistically similar but conceptually different competitor. They concluded that speakers may not notice the linguistic similarity, since doing so would require them to generate the names of competitor objects.

One major question in this area is whether speakers design their speech with their addressee’s needs in mind. This process is known as audience design. For example, speakers might say “the big plate” only when there is another plate of a different size, specifically because they know that this will facilitate comprehension. Alternatively, they may select modifiers automatically when there is a contrasting object in the context, regardless of whether this information is actually useful to the addressee. A typical approach to testing this is to manipulate whether the contrasting object is known only to the speaker, which is called “privileged ground”, or has been established as mutual knowledge, known as “common ground” (Clark & Marshall, 1981). For example, Nadig and Sedivy (2002, exp.
1) had 5–6-year-old children and adults sit on one side of a shelf with four compartments, while a confederate sat on the other side. Three of these compartments displayed objects that both people could see (common ground), but one was blocked off so that only the participant could see the object inside it (privileged ground). Participants were signalled to mention one object, for example, a tall glass. The critical question was whether they would describe it using a modified expression, e.g., “the tall glass”, when another glass was present in the display. When the target was the only glass in the display, children almost never used modifiers and adults never did. If there was a second glass in common ground, adults always produced a modifier, and children frequently did (75% of trials). The interesting condition was the one where the second glass was present, but in privileged ground. If speakers produced modifiers strictly when these are useful to their addressee, they should not do so when the contrasting glass is visible only to themselves. Instead, both adults and children produced modifiers in this condition on about half the trials. This was less often than in the condition where both people could see the competitor object, showing that both children and adults are sensitive to the distinction between common ground and privileged ground. But it also shows that producing a modifier isn’t driven entirely by the addressee’s knowledge. Speakers are also influenced by their own knowledge, and perhaps use this to estimate what is appropriate for the current situation, even when there is evidence that the information is not shared with the addressee. (For similar results, see also Wardlow Lane et al., 2006; Wardlow Lane & Ferreira, 2008.)

Referential communication tasks have also been used to test whether choices driven by audience design are influenced by the constraints of the task. For example, Horton and Keysar (1996) asked participants to describe moving shapes to their partner (a confederate, whom participants believed to be another participant), e.g., “I see a small circle moving from my side to yours”. They sat at a computer monitor that was divided with a foam divider, such that the participant and the confederate could each see only half of the monitor. The partner’s job was to verify that the object described was the same one they saw moving onto their side of the screen. The object either appeared in the context of another similar object (e.g., a large circle) or not.
Critically, half the participants were told that their partner could see the same context shape, putting that information in common ground, whereas the other half were told that their partner could not see the context shape, meaning that this information was in privileged ground. Horton and Keysar also manipulated whether participants had to produce the utterance within a time limit. They found that when there were no time constraints, speakers were more likely to use contextually appropriate adjectives in the common-ground context than in the privileged-ground context. But when people had to respond quickly, the difference between common ground and privileged ground disappeared. They concluded that audience design is a time-consuming process.

The above-mentioned studies suggest that speakers sometimes “fail” to use their partner’s perspective in interactive tasks, although notably the failure usually results in speakers providing extra information that is unlikely to severely hurt communication. By contrast, Yoon et al. (2012) hypothesized that people pay attention to common ground specifically when it is necessary for their communicative goals. Using a task like Nadig and Sedivy’s, they asked Korean speakers to describe objects on a display to a partner, but they manipulated the purpose of communication. In the “request” condition, speakers asked their partner to move the objects (e.g., “Can you move the plate to the left?”). In the “inform” condition, the experimenter moved the objects, and the speaker merely described the movement. Speakers were more sensitive to the common versus privileged ground manipulation when making a request, presumably because the details from their partner’s perspective were important for carrying out the request. In the “inform” condition, it was sufficient to provide a general description.

Other tests of audience design instead focus on the shared information that is developed between two individuals.
For example, Brennan and Clark (1996) examined how prior experience guides decisions about whether to use specific referring expressions (e.g., “the loafer”) versus more general ones (“the shoe”). They hypothesized that interlocutors develop shared perspectives on a particular object, called a “conceptual pact”. If you and I always call my shoes “loafers”, we are likely to re-use that name even when it isn’t necessary to distinguish my shoes from some other type of shoe. They had pairs of participants play a matching game where each had a set of cards depicting objects. The director’s cards were pre-arranged; their job was to instruct the matcher to put

Eliciting spontaneous linguistic productions

their cards in the same order. The question was how directors would refer to the target cards, which depicted common objects (e.g., loafer/shoe; retriever/dog). The cards were designed so that the first sets included these pictures in unique contexts, where the general term was sufficient (shoe/dog). The second sets included these pictures along with similar competitors, i.e., other shoes or other dogs, which required a more specific term (loafer/retriever). In the final, critical set, these same pictures appeared in unique contexts again. Participants tended to stick with the more specific term, even though it was no longer necessary, especially when they had seen more trials with that picture in the non-unique context than in the unique context. This preference was also specific to their addressee: when participants had entrained on a specific term (loafer) with one addressee and were then asked to describe the same picture in a context with no competitors, they were more likely to use the basic term (shoe) if they were speaking to a new person, while they continued to use the specific term when talking to the same addressee. Interactive tasks are also used to demonstrate how language production is the result of collaboration between the speaker and the addressee. In a classic referential study, Clark and Wilkes-Gibbs (1986) used a card-ordering task with cards showing abstract tangram figures. The tangrams were designed to look vaguely human-like, but also to be consistent with multiple perspectives. One example given in their paper is the description “the person standing on one leg with the tail”, which was then changed by the director to “the ice skater”. Directors gave matchers instructions to put the cards in order, and they repeated the task six times with the same cards but in different orders. Clark and Wilkes-Gibbs demonstrated that the descriptions given by directors required complex social negotiation. 
Directors initially proposed a conceptualization, and matchers provided feedback to either accept or modify that perspective. Once the conceptual perspective was established, on subsequent trials the descriptions of the card became more and more efficient, using fewer words. This task demonstrates that referring is the result of collaboration. Production choices are also influenced by the types of feedback available in different situations. Clark and Krych (2004) examined language use within an interactive setting by asking participants to collaboratively build a Lego structure from a prototype. They manipulated interactivity. In the interactive condition, participants worked in pairs. The director saw a prototype of the intended structure (hidden from the builder) and gave instructions to the builder to recreate the structure with loose Lego blocks. They sat at a table, and for half the participants (the workspace-visible condition), the director could watch the builder’s actions. For the other half (the workspace-hidden condition), the director could not see the builder’s blocks. In the non-interactive conditions, the directors instead recorded their instructions for a builder to follow in the future. The builders later listened to the instructions on tape and were allowed to rewind the tape as much as needed. One advantage of this task is that it held the task constant across participants without restricting the language itself. This elicited contextualized language, and participants naturally responded to the actions of their partners when they could. For example, one of the interactive pairs produced this sequence:

Director: And then you’re gonna take a blue block of four.
Builder: M-hm.
Director: And you’re gonna put it on top of the four blocks – four yellow blocks farthest away from you.
Builder: Which are the ones closest to the green.
Director: Yeah.

Jennifer Arnold

Builder: Okay. But the green’s still not attached.
Director: Yeah. And then …

(Clark & Krych, 2004, p. 66).

Clark and Krych use these data to demonstrate several core processes involved in communication. One is grounding – the process by which speakers “work with their partners to reach the mutual belief that their partners have understood them well enough for current purposes” (p. 63). Grounding can be achieved when the director simply observes that the builder has followed the directions, but this option is not available in the workspace-hidden condition. Alternatively, the partners can verify the placement of each block verbally. Clark and Krych analysed the joint building process to identify the “base time” that it took the builder to get a block into its correct position from the start of the director’s instruction, and the “checking time” that it took them to assess the correctness of that action. Both measures were much shorter in the workspace-visible condition than in the workspace-hidden condition, demonstrating that visual evidence is more efficient in this task. By contrast, directors in the non-interactive condition were unable to ground their instructions, leading to a much less efficient process and many more errors. They also analysed the participants’ gestures to demonstrate the detailed exchange of information that occurred for participants in the workspace-visible condition.
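The two timing measures just described can be made concrete with a small sketch. It assumes, hypothetically, that each block placement has been coded with three timestamps (onset of the director’s instruction, correct placement of the block, end of verification) and that checking time runs from placement to the end of verification. The `PlacementEvent` class, its field names, and this operationalization are illustrative assumptions, not Clark and Krych’s (2004) actual coding scheme.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PlacementEvent:
    """One block placement, coded with hypothetical timestamps in seconds."""
    condition: str            # e.g. "workspace-visible" or "workspace-hidden"
    instruction_start: float  # director begins the instruction
    block_placed: float       # block reaches its correct position
    check_end: float          # pair finishes verifying the placement (assumed coding)


def base_time(e: PlacementEvent) -> float:
    """Time from the start of the instruction until the block is correctly placed."""
    return e.block_placed - e.instruction_start


def checking_time(e: PlacementEvent) -> float:
    """Time spent assessing the placement, assumed here to start once the block is placed."""
    return e.check_end - e.block_placed


def condition_means(events: list[PlacementEvent]) -> dict[str, tuple[float, float]]:
    """Return the mean (base time, checking time) for each condition."""
    by_condition: dict[str, list[PlacementEvent]] = {}
    for e in events:
        by_condition.setdefault(e.condition, []).append(e)
    return {
        cond: (mean(base_time(e) for e in evs), mean(checking_time(e) for e in evs))
        for cond, evs in by_condition.items()
    }
```

Comparing the per-condition means returned by `condition_means` corresponds to the contrast described in the text (shorter base and checking times when the workspace is visible); the sketch says nothing about how the timestamps themselves would be coded from recordings.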

12.5

Production in a discourse context

Language used for communication does not consist of isolated words and sentences. Rather, natural language typically takes place within a discourse context, and this context imposes constraints on what speakers do. It also poses challenges for researchers, because testing phenomena that are constrained by discourse context means recreating that context within an experiment. Of particular importance is information status, which refers to what is already known about the information to be produced. For example, information that is already known is termed “given”, while unmentioned information is “new” (e.g., Halliday, 1967). Information can become given if it was already mentioned, but also when it is visually present or associated with something else that was mentioned (Clark & Marshall, 1981; Prince, 1992). To test the role of information status in production, researchers must set up discourse contexts that manipulate given/new status as well as elicit the target utterances. For example, Arnold et al. (2000) tested how givenness influences word order in an experiment that elicited sentences using the verb “give”, where givenness was manipulated through previous mention. “Give” is a transfer verb that can occur in either a double object construction (e.g., give the duck the crayon) or a prepositional construction (e.g., give the crayon to the duck). Previous work had shown that speakers tend to order given information before new (e.g., Birner & Ward, 1998), and also put shorter phrases (e.g., “the crayon”) earlier in the utterance than longer phrases (e.g., “the small green crayon”; Behaghel, 1909; Quirk et al., 1972). But short phrases also tend to be used for given information more than for new, meaning that the two factors are correlated. This raised a question: do both constraints matter, or just one? 
To test this, the experimenters needed to elicit sentences about giving in a discourse context where they could separately manipulate what had already been mentioned, and the length of linguistic expressions used. Arnold et al. recruited participants in pairs, where one person gave the other instructions to move objects. To make the activity somewhat plausible, they provided a cover story: Mrs. Bear is getting older and has decided to give away her belongings to the animals in the forest, but she
needs help distributing them. One participant was the director, who played the part of Mrs. Bear, and one participant was the actor, who distributed the objects. Participants saw a variety of objects on a table, and nine boxes with pictures of animals on them (e.g., a yellow duck, an orange duck, and a magenta duck; a grey squirrel, a magenta squirrel, and a blue squirrel; a blue horse, an orange horse, and a green horse). As a manipulation of expression length, some of the objects occurred in sets (e.g., crayons of different sizes and colours), which required long descriptions, while some were unique (key, melon, screw), allowing shorter descriptions. As a manipulation of givenness, the actor followed pictorial cue cards to ask a question about either the animals or the objects, e.g., “What about the ducks?” The director used pictorial cue cards (visible only to the director) to provide directions. On critical trials, the cue card had the word “GIVE” at the top and illustrated three animals in a set with three objects. This elicited directions like “Give the yellow duck the small green crayon, give the big duck the big green crayon, and give the magenta duck the big yellow crayon”. On filler trials, the cue card illustrated TAKE AWAY or EXCHANGE events. Their findings demonstrated that speakers tended to use constructions that put the animal (the recipient) first more often when the animal was given and when the object description was longer than the animal description. This demonstrated empirically that both givenness and syntactic complexity matter independently. This method illustrates that a complex task is needed to elicit productions within a discourse context that manipulates information status. Information status also affects variability in pronunciation. It is well known that words can be pronounced either with relative emphasis (acoustic prominence) or in a more acoustically reduced form. 
In English, reduced expressions tend to have shorter duration, lower pitch and/or less pitch movement, and lower intensity (Ladd, 1996). This variation is systematic: for example, words tend to be prominent when they haven’t been mentioned before (look at this CAT!) and reduced when they are repeated (The CAT walked in. Then the cat ate; Brown, 1983; Fowler & Housum, 1987; Halliday, 1967). One question is why repeated words are reduced. Is it because of the repetition per se, or because repeated words are predictable? Corpus work (Bell et al., 2009) has shown that pronunciations are shorter when words are predictable. Within a discourse context, previously mentioned objects are more likely to be mentioned again (Arnold, 1998). Lam and Watson (2010, exp. 2) tested the independent effects of repetition and predictability in an experimental task. Speakers saw a display of numerous objects and described the two that moved, for example “the rooster is shrinking, the doll is flashing”, where the target was the second object. In the non-repeated condition, two different objects moved, and in the repeated condition, the same object performed both actions. In addition, a circle appeared on screen around one of the objects; most of the time (92%) it signalled the object that would move, making it predictable. In the expected trials the circled object flashed, and in the unexpected trials some other object flashed. Expectedness and repetition were crossed with each other. They found that both repetition and expectedness resulted in lower intensity, but only repetition led to shorter durations. Using a similar task, Kahn and Arnold (2012) compared pronunciations in contexts where the target was both predictable and linguistically given (participants heard the word) and contexts where the target was predictable only from a visual cue. Pronunciations were reduced in both cases, compared to a non-cued case, but more so when the target had been linguistically mentioned. 
These findings suggest that the discourse context may affect acoustic reduction for multiple reasons. By contrast, some research questions require establishing a story context, for example, questions about what drives speakers’ decisions about how to refer. All instances of referring require a choice, for example, between a description (the girl), a name (Alessandra), or a pronoun (she), where pronouns are typically reserved for situations where the referent is both given and topical or
highly salient in the context (Ariel, 1990; Chafe, 1976; Gundel et al., 1993). Experimentalists investigate which discourse conditions drive this choice by asking people to tell or write pieces of a story within a manipulated discourse context. One popular method is to use pictures to constrain what people say. Arnold and Griffin (2007) tested how pronoun use was influenced by the number of people in a story. They showed people two-panel cartoons that illustrated characters doing something in the first panel. Participants heard a recorded sentence and repeated it, e.g., “Mickey went for a walk with Daisy in the hills one day”. They then continued the story by describing the second panel, which showed Mickey getting tired, e.g., “… and he got tired”. They found that pronouns were more common when the story had only one character (e.g., Mickey went for a walk in the hills one day) than when it had two, regardless of whether the second panel showed one or two characters (for similar techniques see Fukumura et al., 2011; Rosa & Arnold, 2017). One advantage of this task is that it establishes both a visual and a linguistic context to ground the story, and by providing the first sentence, experimenters can fully control the linguistic context. But one challenge is that it is not a natural storytelling event, and participants vary in the degree to which they engage in the task as a story. Zerkle and Arnold (2017) used a similar storytelling task while measuring participants’ eye movements. Participants heard a sentence for the first panel (e.g., “The Duchess handed a painting to the Duke”), and then continued the story by describing a second panel (e.g., “The Duke threw the painting in the closet”). Although the experiment was designed to elicit pronoun use that varied with the details of the context, many participants behaved differently. 
Twenty-four of the 37 participants (dubbed “context-ignorers”) approached the task as a description of the second picture, without considering the story context, and they produced only descriptions (the Duke). The other 13 participants varied their referring expressions, so they were dubbed “context-users”. The context-users also produced connector words to link the sentences (e.g., “and” or “so”) about a third of the time, whereas the context-ignorers almost never did (1.5% of responses). Eye movements revealed that these linguistic behaviours corresponded to different patterns of utterance planning. Context-users were faster to respond overall than context-ignorers. The context-users also spent more time than the context-ignorers looking at the first panel (which established the context) while they listened to the context sentence. This suggests that the context-ignorers were using the time during the first sentence to plan their response based on the picture, rather than thinking about the context. In comparison, a different experiment used the same task but with modified instructions to encourage participants to pay attention to the context (Zerkle & Arnold, 2019). The experimenter modelled two example trials, and in both cases used pronouns and connector words. This led to greater variation in reference form, with about 30% of responses using pronouns and about 25% using coordinated verb phrases (e.g., The Duke received a painting from the Duchess … And threw it in the closet). This variability demonstrates that when speakers choose reference forms, the context doesn’t have an automatic or obligatory effect. Instead, speakers must recognize that the goal of the task is to produce connected discourse, as in a story. The instructions can support this interpretation, but there may still be individual variability in how people do the task. 
Thus, pronoun use is supported by the “connectivity constraint”, i.e., the degree to which the current task requires connected discourse. Support for this idea comes from a production experiment by Arnold and Nozari (2017). They used a related task, where people described moving shapes in sequences of five or more actions (e.g., “The yellow pentagon moves down 1 block; The yellow pentagon flashes; Then it jumps over the yellow square; The pink pentagon loops around the yellow square; … and Ø moves down
1 block”). They examined references to objects that had just moved on the previous action (i.e., “given” references). Speakers were more likely to use a description (“the yellow pentagon”) when the events were disconnected from each other, for example, when there was a pause between the two utterances or when the speaker was disfluent. They were also less likely to use descriptions when they produced connecting expressions like “and then”. In sum, numerous studies have elicited language in the context of carefully controlled discourses for the purpose of studying the effects of information status and other discourse effects. These studies also show that task constraints greatly influence variation in speech form.
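The crossed design of Lam and Watson’s (2010, exp. 2) study, discussed earlier in this section, can be illustrated with a short sketch of how such a trial list might be built. Only the 92% cue validity comes from the text; the `Trial` class, the helper names, the trial count, the random seed, and the choice to balance repetition within each expectedness level are hypothetical and do not reproduce the authors’ materials.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    repeated: bool   # True: the same object performs both actions
    expected: bool   # True: the circled (cued) object is the one that moves


def build_trial_list(n_trials: int, p_valid: float = 0.92, seed: int = 0) -> list[Trial]:
    """Hypothetical helper crossing repetition with expectedness.

    The cue is valid on round(n_trials * p_valid) trials. Within each
    expectedness level, trials alternate between repeated and non-repeated,
    so repetition is balanced (to within one trial) at both levels.
    """
    n_valid = round(n_trials * p_valid)
    trials = [Trial(repeated=(i % 2 == 0), expected=True) for i in range(n_valid)]
    trials += [Trial(repeated=(i % 2 == 0), expected=False)
               for i in range(n_trials - n_valid)]
    random.Random(seed).shuffle(trials)  # randomize presentation order
    return trials


def cell_counts(trials: list[Trial]) -> Counter:
    """Count trials in each (repeated, expected) cell of the 2 x 2 design."""
    return Counter((t.repeated, t.expected) for t in trials)
```

For example, `cell_counts(build_trial_list(100))` gives 46 trials in each of the two expected cells and 4 in each of the two unexpected cells. The imbalance is deliberate: the cue can only make the target predictable if it is reliable, so unexpected trials must be rare, while balancing repetition within each level keeps the two factors crossed.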

12.6

Future directions

Experiments on language production aim to isolate the cognitive processes behind language generation, including those that support social coordination. Here we reviewed some of the main techniques used to examine word production and sentence formulation, as well as production choices that are affected by the social and discourse context. Experimental techniques in each case are designed to highlight the process of interest. For example, in the Picture-Word Interference task, researchers test the impact of semantic versus phonological competitors within a particular timeframe (Schriefers et al., 1990). This requires tightly timed stimuli and a fairly constrained task: naming a simple picture with a single word. This method is highly effective, but it also lacks the complexity of a socially contextualized discourse. By contrast, studies on task-based dialogue (e.g., Clark & Krych, 2004; Clark & Wilkes-Gibbs, 1986; Yoon et al., 2012) elicit natural language, including disfluencies, fragmented sentences, and open-ended exchanges. These tasks support analyses that demonstrate the complex processes used to navigate meaning while tracking the knowledge and goals of one’s interlocutor. But they provide less information about the time course of lexical retrieval and sentence formulation. Ongoing and future research will continue to refine these experimental methods in ways that will allow researchers to examine information processing within more naturalistic and grounded contexts. One approach is to combine techniques. For example, Brown-Schmidt and Tanenhaus (2008) combined eye-tracking with an open-ended and unscripted referential communication task and demonstrated how the context of the task modulates real-time competition between lexical items. Another approach is to combine language elicitation methods with other measures, such as neural activation. For example, Vanlangendonck et al. 
(2018) put participants in an fMRI scanner while they performed a common-ground/privileged-ground task like that of Nadig and Sedivy (2002), as well as a non-communicative task. They demonstrated that the theory of mind network is activated specifically when participants are engaged in a communicative task and need to consider common ground in order to produce appropriate utterances.

Further reading

Word and sentence production

Pickering, M. J., & Ferreira, V. S. (2008). Structural priming: A critical review. Psychological Bulletin, 134(3), 427–459. https://doi.org/10.1037/0033-2909.134.3.427
Schriefers, H., Meyer, A. S., & Levelt, W. J. M. (2002). Exploring the time course of lexical access in language production: Picture-word interference studies. In G. Altmann (Ed.), Psycholinguistics: Critical Concepts in Psychology (Vol. 5, pp. 168–191). Routledge.

Production in a social context

Arnold, J. E. (2008). Reference production: Production-internal and addressee-oriented processes. Language and Cognitive Processes, 23(4), 495–527. https://doi.org/10.1080/01690960801920099
Brennan, S. E., & Hanna, J. E. (2009). Partner-specific adaptation in dialogue. Topics in Cognitive Science (Special Issue on Joint Action), 1, 274–291.

Production in a discourse context

Arnold, J. E. (2011). Ordering choices in production: For the speaker or for the listener? In E. Bender & J. E. Arnold (Eds.), Language from a Cognitive Perspective: Grammar, Usage, and Processing. Studies in Honor of Thomas Wasow (pp. 199–222). CSLI Publications.
Arnold, J. E., Kaiser, E., Kahn, J., & Kim, L. (2013). Information structure: Linguistic, cognitive, and processing approaches. WIREs Cognitive Science, 4, 403–413. https://doi.org/10.1002/wcs.1234
Arnold, J. E., & Watson, D. G. (2015). Synthesizing meaning and processing approaches to prosody: Performance matters. Language, Cognition and Neuroscience, 30, 88–102. https://doi.org/10.1080/01690965.2013.840733
Arnold, J. E., & Zerkle, S. A. (2019). Why do people produce pronouns? Pragmatic selection vs. rational models. Language, Cognition and Neuroscience, 34, 1152–1175. https://doi.org/10.1080/23273798.2019.1636103

Related topics

Experimental syntax; experimental pragmatics; experimental sociolinguistics; experimental studies in discourse; experimental methods to study child language

References ​­ Ariel, M. (1990). Accessing ­noun-phrase antecedents. Routledge. Arnold, J. E. (1998). Reference form and discourse patterns (Doctoral Dissertation). Stanford University. Arnold, J. E., & Griffin, Z. M. (2007). The effect of additional characters on choice of referring expression: ​­ ­ Everyone counts. Journal of Memory and Language, 56, ­521–536. doi:10.1016/j.jml.2006.09.007 Arnold, J. E., & Nozari, N. (2017). The effects of utterance planning and stimulation of left prefrontal cortex ​­ on the production of referential expressions. Cognition, 160, ­127–144. Arnold, J. E., Wasow, T., Losongco, A., & Ginstrom, R. (2000). Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language, 76, ­28–55. ​­ Baars, B. J., Motley, M. T., & MacKay, D. G. (1975). Output editing for lexical status in artificially elicited slips of the tongue. Journal of Verbal Learning and Verbal Behavior, 14, ­382–391. ​­ Behaghel, O. (1909). Beziehungen zwischen Umfang und Reihenfolge von Satzgliedern. Indogermanische Forschungen, 25, ­110–42. ​­ Bell, A., Brenier, J., Gregory, M., Girand, C., & Jurafsky, D. (2009). Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language, 60, ­92–111. ​­ Bialystok, E. (2009). Bilingualism: The good, the bad, and the indifferent. Bilingualism: Language and Cognition, 12(01), 3. doi:10.1017/S1366728908003477 ­ ­ Birner, B., & Ward, G. (1998). Information status and noncanonical word order in English. John Benjamins. Bock, J. K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18(3), ­ ­355–387. ​­ https://doi-org.libproxy.lib.unc.edu/10.1016/0010-0285(86)90004-6 ­­ ​­ ­ ­­ ​­ ­ ­­ ​­ Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition, 6, ­1482–1493. ​­ Brown-Schmidt, S., & Tanenhaus, M. K. (2006). 
Watching the eyes when talking about size: An investigation of message formulation and utterance planning. Journal of Memory and Language, 54, ­592–609. ​­ Brown-Schmidt, S., & Tanenhaus, M. K. (2008). Real-time investigation of referential domains in unscripted conversation: A targeted language game approach. Cognitive Science, 32(4), https://doi-org. ­ ­643–684. ​­ ­­ ​­ libproxy.lib.unc.edu/10.1080/03640210802066816 ­ ­ Brown, G. (1983). Prosodic structures and the given/new distinction. In D. R. Ladd & A. Cutler (Eds.), Prosody: Models and measurements (pp.67–77). Springer. ­ ­ ​­

198

Eliciting spontaneous linguistic productions Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant’s age. Frontiers in Psychology, 7, 1116. Chafe, W. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In Charles N. ­ ­­  ­25–56). ​­ Li (Ed.), Subject and topic, (pp. Academic Press Inc. Clark, H. H. (1996). Using language. Cambridge University Press. Clark, H. H., & Krych, M. A. (2004). Speaking while monitoring addressees for understanding. Journal of Memory and Language, 50, ­62–81. ​­ Clark, H. H., & Marshall, C. R. (1981). Definite reference and mutual knowledge. In A. K. Joshi, B. L. Web­­  ­10–63). ​­ ber, & I. A. Sag (Eds.), Elements of discourse understanding (pp. Cambridge University Press. ­ ​­ Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1–39. Cleland, A. A., & Pickering, M. J. (2003). The use of lexical and syntactic information in language production: ­ ​­ Evidence from the priming of noun-phrase structure. Journal of Memory and Language, 49, 214–230. Dell, G., & Reich, P. A. (1981). Stages in sentence production: An analysis of speech error data. Journal of Verbal Learning and Verbal Behavior, 20, 611429. Deutsch, W., & Pechmann, T. (1982). Social interaction and the development of definite descriptions. Cogni­ ­159–184. ​­ ­­ ​­ ­ ­­ ​­ ­ ­­ ​­ tion, 11(2), https://doi-org.libproxy.lib.unc.edu/10.1016/0010-0277(82)90024-5 Engelhardt, P. E., Bailey, K. G. D., & Ferreira, F. (2006). Do speakers and listeners observe the Gricean maxim of quantity. Journal of Memory and Language, 54, 554–573. doi:10.1016/j.jml.2005.12.009 Ferreira, V. S., Slevc, L. R., & Rogers, E. S. (2005). How do speakers avoid ambiguous linguistic expres​­ sions? Cognition, 96, ­263–284. Fowler, C., & Housum, J. (1987). 
Talkers’ signalling of ‘new’ and ‘old’ words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language, 26, ­489  –504. ​­ Frank, M. C., & Goodman, N. D. (2012). Predicting pragmatic reasoning in language games. Science, 336, 998. Fukumura, K., van Gompel, R. P., Harley, T., & Pickering, M. J. (2011). How does similarity-based in​­ terference affect the choice of referring expression? Journal of Memory and Language, 65, ­331–344. doi:10.1016/j.jml.2011.06.001 ­ Gahl, S. (2008). “Time” and “thyme” are not homophones: The effect of lemma frequency on word durations ​­ in spontaneous speech. Language, 84, ­474–496. Gollan, T. H., & Goldrick, M. (2012). Does bilingualism twist your tongue?. Cognition, 125(3), ­ ­491–497. ​­ doi:10.1016/j.cognition.2012.08.002 ­ Graf, C., Degen, J., Hawkins, R. D. X., and Goodman, N. D. (2016). Animal, dog, or dalmatian? Level of abstraction in nominal referring expressions. In A. Papafragou, D. Grodner, D. Mirman, and J. C. Trueswell ­ (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society. Cognitive Science Society. Griffin, Z. M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science, 11, ­274–279. ​­ Gundel, J. K., Hedberg, N., & Zacharski, R. (1993). Cognitive status and the form of referring expressions in discourse. Language, 69(2), ­ ­274–307. ​­ Hall, M. L., Ferreira, V. S., & Mayberry, R. I. (2015). Syntactic priming in American Sign Language. PLoS ONE, 10(3), ­ e0119611. doi:10.1371/journal.pone.0119611 ­ Halliday, M. (1967). Notes on transitivity and theme in English 1. Journal of Linguistics, 3, ­199–244. ​­ Horton, W. S., & Keysar, B. (1996). When do speakers take into account common ground? Cognition, 59, ­91–117. ​­ Kahn, J., & Arnold, J. E. (2012). A processing-centered look at the contribution of givenness to durational reduction. Journal of Memory and Language, 67, ­311–325. ​­ Koolen, R., Gatt, A., Goudbeek, M., & Krahmer, E. 
(2011). Factors causing overspecification in definite descriptions. Journal of Pragmatics, 43(13), https://doi-org.libproxy.lib.unc.edu/10.1016/j. ­ ­3231–3250. ​­ ­­ ​­ ­ ­ pragma.2011.06.008 Ladd, R. (1996). Intonational phonology. Cambridge University Press. Lam, T. Q., & Watson, D. G. (2010). Repetition is easy: Why repeated referents have reduced prominence. Memory & Cognition, 38, ­1137–46. doi:10.3758/MC.38.8.1137 ​­ ­ Levelt, W. J. M. (1989). Speaking: From intention to articulation. The MIT Press. Li, C., Goldrick, M., & Gollan, T. H. (2017). Bilinguals’ twisted tongues: Frequency lag or interference? Memory & Cognition, 45(4), https://doi-org.libproxy.lib.unc.edu/10.3758/s13421-017-0688-1 ­ ­600–610. ​­ ­­ ​­ ­ ­­ ­​­­ ­​­­ ​­ Mahowald, K., James, A., Futrell, R., & Gibson, E. (2016). A meta-analysis of syntactic priming in language production. Journal of Memory and Language, 91, ­5–27. ​­

199

Jennifer Arnold Motley, M. T., Camden, C. T., & Baars, B. J. (1981). Toward verifying the assumptions of laboratory induced slips of the tongue: The output-error and editing issues. Human Communication Research, 8, 3–15. ­ ​­ Motley, M. T., Camden, C. T., & Baars, B. J. (1982). Covert formulation and editing of anomalies in speech production: Evidence from experimentally elicited slips of the tongue. Journal of Verbal Learning & Verbal Behavior, 21(5), https://doi-org.libproxy.lib.unc.edu/10.1016/S0022-5371(82)90791-5 ­ ­578–594. ​­ ­­ ​­ ­ ­­ ​­ ­ ­­ ​­ Nadig, A. S., & Sedivy, J. C. (2002). Evidence of perspective-taking constraints in children’s on-line reference resolution. Psychological Science 13(4), ­ ­329–336. ​­ Nozari, N., & Dell, G. S. (2012). Feature migration in time: Reflection of selective attention on speech errors. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(4), https://doi­ 1084–1090. ­ ​­ ­­ ​ ­org.libproxy.lib.unc.edu/10.1037/a0026933 ­ ­ Pickering, M. J., & Ferreira, V. S. (2008). Structural priming: A critical review. Psychological bulletin, 134 (3), https://doi.org/10.1037/0033-2909.134.3.427 ­ 427–459. ­ ​­ ­ ­ ­­ ​­ Pogue, A., Kurumada, C., & Tanenhaus, M. K. (2016). Talker specific generalization of pragmatic inferences based on ­under-or prenominal adjective use. Frontiers in Psychology, 6, 1–18. ​­ ­over-informative ​­ ­ ​­ Prince, E. (1992). The ZPG letter. Subjects, definiteness, and information status. In W. Mann & S. Thompson (Eds.), Discourse Description: Diverse Linguistic Analyses of a Fund-Raising Text (pp. John ­ ­­  ­295–325). ​­ Benjamins. Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1972). A grammar of contemporary English. Longman. Rosa, E. C., & Arnold, J. E. (2017). Predictability affects production: Thematic roles can affect reference form selection. Journal of Memory and Language, 94, ­43–60. doi: 10.1016/j.jml.2016.07.007 ​­ ­ Schriefers, H., Meyer, A. S., & Levelt, W. J. M. (1990). 
Exploring the time course of lexical access in language production: picture-word interference studies. Journal of Memory and Language, 29, ­86–102. ​­ Van der Meulen, F., Meyer, A. S., & Levelt, W. J. M. (2001). Eye movements during the production of nouns and pronouns. Memory & Cognition, 29, ­512–521. ​­ Vanlangendonck, F., Willems, R. M., & Hagoort, P. (2018). Taking common ground into account: Specifying the role of the mentalizing network in communicative language production. PLoS ONE, 13(10), ­ e0202943. https://doi.org/10.1371/journal.pone.0202943 ­ ­ ­ Wardlow Lane, L. & Ferreira, V. S. (2008). Speaker-external versus speaker-internal forces on utterance form: Do cognitive demands override threats to referential success?. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, ­1466–1481. ​­ Wardlow Lane, L., Groisman, M., & Ferreira, V. S. (2006). Don’t talk about pink elephants! Speakers’ control over leaking private information during language production. Psychological Science, 17, ­273–277. ​­ Yoon, S. O., Koh, S., & Brown-Schmidt, S. (2012). Influence of perspective and goals on reference production in conversation. Psychonomic Bulletin & Review, 19, ­699–707. ​­ Zerkle, S., & Arnold, J. E. (2017). Discourse attention during utterance planning affects referential form choice. Linguistics Vanguard, 2(s1). https://doi.org/10.1515/lingvan-2016-0067 ­ ­ ­ ­­ ­​­­ ​­ Zerkle, S., & Arnold, J. E. (2019). Does planning explain why predictability affects reference production? Dialogue and Discourse, 10(2), 34–55. ­ ​­ Ziegler, J., & Snedeker, J. (2018). How broad are thematic roles? Evidence from structural priming. Cognition, 179, ­221–240. ​­


13 ANALYSING SPEECH PERCEPTION
Sandra Schwab and Nathalie Giroud

13.1 Introduction

Speech perception can be defined as the perceptual and cognitive processes leading to the discrimination, identification, and interpretation of speech signals (Matthys, 2013). In other words, speech perception corresponds to the mapping from sounds to internal linguistic representations. Several characteristics of the auditory signal make speech perception challenging. First, unlike written information, the speech signal is time-bound and transient. Since listeners cannot 'listen back' to what they have just heard, speech perception must take place in real time (a property referred to as the 'sequentiality' of the speech signal; e.g., Matthys, 2013). Second, the speech signal is continuous. Because of coarticulation (i.e., the blending of articulatory gestures between adjacent sounds), there are no clear boundaries between adjacent sounds (or words), which constitutes an obstacle for speech perception. Together with sequentiality, the continuity of speech can hinder the classification of the auditory signal into a given phonetic category. For example, the French phoneme /b/ is realized as [b] when followed by a voiced consonant (e.g., oblique, English 'oblique') but as [p] when followed by a voiceless consonant (e.g., observer, English 'observe'). Third, also because of coarticulation, the speech signal is extremely variable. A single acoustic segment contains information about its adjacent sounds (e.g., acoustic traces of [i] are already present at the end of [s] in the English word 'see' [si]). Conversely, the same phoneme can be realized acoustically in different ways depending on the surrounding phonetic context, the speaking rate, and the speaker (e.g., the acoustic properties of [s] differ in the English words 'see' [si] and 'sue' [su]). Consequently, listeners need to adapt their perceptual system to this variability (a problem known as the 'lack of invariance').
In other words, listeners need to somehow normalize the auditory input for differences in context, rate, or speaker in order to identify the speech signal (Pisoni, 1981). Since the processes of perception cannot be observed directly, researchers have developed experimental designs both to examine listeners' perceptual behaviour, by means of behavioural methods (see Section 13.2), and to investigate the neural basis of that behaviour, with neurophysiological methods (see Section 13.3).


DOI: 10.4324/9781003392972-16


13.2 Behavioural methods

In speech perception experiments, researchers present listeners (also called participants) with auditory stimuli and ask them to make different types of judgements about the stimuli. Experiments give the researcher precise control over the stimuli that participants hear and make it possible to measure the accuracy and speed (response time) of their responses. The next sections describe the most common behavioural tasks used in speech perception research: identification, discrimination, and rating tasks. Section 13.4.1 illustrates the use of these tasks in the study of vowel acquisition in a second/foreign language (L2).

13.2.1 Identification tasks

In identification tasks (also called classification, phonetic categorization, or labelling tasks), listeners hear one stimulus at a time and have to label it, either with an open response (e.g., 'write what you hear') or by choosing from a closed set of responses (e.g., 'choose which of these sounds you heard'; a task also called 'forced-choice identification'). Although open responses directly reflect what participants perceived, their main disadvantage is that the experimenter has to define criteria for scoring the different responses as correct or incorrect (e.g., deciding how to score the responses cap and clap given for the stimulus cap). The aim of forced-choice identification tasks is to examine how listeners interpret and classify auditory stimuli into given categories. This kind of design is typically used to study categorical perception (Knight & Hawkins, 2013; McQueen, 1996; Miller & Volaitis, 1989), where listeners have to label speech sounds taken from an acoustically manipulated continuum ranging from one sound to another. For example, a continuum ranging from English [p] to [b] can be constructed by progressively decreasing voice onset time (VOT; the time between the release of a stop consonant and the onset of vocal fold vibration). Identification tasks can also be designed to examine the categorization of speech sounds that are not part of an acoustically constructed continuum. For example, listeners can be presented with naturally produced vowels or consonants and asked, for each stimulus, to indicate which of the given response options (possibly more than two) they perceived (e.g., Carlet & Cebrian, 2022; Iverson et al., 2012).
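As a minimal sketch of how identification data from a continuum are often summarized, the Python code below computes the proportion of /p/ responses at each VOT step and estimates the category boundary as the VOT at which that proportion crosses 50%. The response coding ('p'/'b'), the VOT steps, and the listener data are hypothetical, and linear interpolation between adjacent steps is only one simple option; fitting a logistic function is a common alternative.

```python
from collections import defaultdict


def identification_function(responses):
    """Proportion of 'p' responses per continuum step.

    responses: iterable of (vot_ms, label) pairs, label being 'p' or 'b'
    (a hypothetical coding of a forced-choice /p/-/b/ task).
    """
    counts = defaultdict(lambda: [0, 0])  # vot_ms -> [n_p_responses, n_trials]
    for vot_ms, label in responses:
        counts[vot_ms][1] += 1
        if label == "p":
            counts[vot_ms][0] += 1
    return {vot: n_p / n for vot, (n_p, n) in sorted(counts.items())}


def category_boundary(prop_p):
    """VOT (ms) at which the proportion of 'p' responses crosses 0.5,
    found by linear interpolation between adjacent continuum steps."""
    steps = sorted(prop_p.items())
    for (x0, y0), (x1, y1) in zip(steps, steps[1:]):
        if y0 != y1 and (y0 - 0.5) * (y1 - 0.5) <= 0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None  # the function never crosses 50%


# Hypothetical listener: 10 trials per step; number of 'p' responses per VOT step.
p_counts = {0: 0, 10: 0, 20: 1, 30: 4, 40: 8, 50: 10, 60: 10}
trials = [(vot, "p" if i < n_p else "b") for vot, n_p in p_counts.items() for i in range(10)]

ident = identification_function(trials)
boundary = category_boundary(ident)
print(ident)
print(f"Estimated boundary: {boundary:.1f} ms VOT")
```

A steep rise between two adjacent steps, as in this hypothetical listener, is the pattern usually associated with categorical perception.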
Besides the classification of speech segments (e.g., vowels and consonants), identification tasks can also be used to study the perception of suprasegmental features, such as lexical stress (e.g., Schwab & Dellwo, 2022) or pitch contours (see Prieto, 2012 for a review). Target speech sounds can be presented in isolation, within a syllable, within a pseudoword (i.e., a sequence of sounds that does not form a real word in the language), or within a real word. If real words are used, care has to be taken to avoid the influence of lexical factors (e.g., lexical frequency, word length) on the identification of the target speech sound (Knight & Hawkins, 2013). For example, if listeners hear the English word /stɑrt/ and have to decide whether they heard /ɑ/ or /ɔ/, identification of /ɑ/ will be facilitated by recognition of the whole word start, since /stɑrt/ is a word and /stɔrt/ is not. This issue can be addressed by presenting participants with minimal pairs (i.e., words that differ by only one sound), such as card-cord (/kɑrd/-/kɔrd/), whose members should ideally have similar lexical frequencies. In tasks where participants categorize speech sounds in isolation or in words, response labels can be orthographic (letters or keywords), phonetic (or phonetic-like), or a combination of these (Fouz-González & Mompean, 2020). Orthographic labels are, however, problematic in languages where the correspondence between sounds and spelling is irregular (e.g., English, French). Phonetic symbols (such as those of the International Phonetic Alphabet, IPA), on the other hand, map consistently onto sounds but are often unknown to participants. One solution is to combine phonetic symbols with keywords (i.e., real words illustrating each sound tested) (e.g., Carlet & Cebrian, 2022). In particular cases, such as perceptual assimilation tasks (where non-native listeners hear speech sounds of a foreign language and have to indicate which sound of their native language is closest to each one), the response labels can also correspond to the common spelling of the sounds in the listeners' native language (e.g., French /e/ identified as the vowel of English delay or three; Colantoni et al., 2015). Commonly reported outcome measures for identification tasks are response proportions (for continua), proportions of correct responses, and response times, as well as confusion matrices showing listeners' error patterns. Confusion matrices are two-dimensional tables ('actual sound' × 'perceived sound') that show to what extent the actual sounds were perceived as such (i.e., correct responses) or as other sounds (i.e., errors).
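The confusion matrix described above can be computed in a few lines. The following Python sketch assumes trial-level data stored as hypothetical (actual, perceived) pairs; it normalizes each row so that the diagonal gives the proportion of correct responses for each actual sound and the off-diagonal cells give the error pattern.

```python
from collections import Counter


def confusion_matrix(trials, categories):
    """Row-normalized confusion matrix.

    trials: iterable of (actual, perceived) category labels.
    Returns {actual: {perceived: proportion}}; each row sums to 1
    (or is all zeros if that actual category never occurred).
    """
    counts = Counter(trials)
    matrix = {}
    for actual in categories:
        row_total = sum(counts[(actual, p)] for p in categories)
        matrix[actual] = {
            p: counts[(actual, p)] / row_total if row_total else 0.0
            for p in categories
        }
    return matrix


# Hypothetical responses: 10 tokens of each English vowel.
vowels = ["iː", "ɪ"]
trials = ([("iː", "iː")] * 6 + [("iː", "ɪ")] * 4
          + [("ɪ", "iː")] * 2 + [("ɪ", "ɪ")] * 8)

matrix = confusion_matrix(trials, vowels)
for actual in vowels:
    row = "  ".join(f"{p}: {matrix[actual][p]:.2f}" for p in vowels)
    print(f"actual {actual} -> {row}")
```

Row normalization keeps vowels with different numbers of tokens comparable; raw counts can be kept instead if the inferential analysis needs them.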

13.2.2 Discrimination tasks

In discrimination tasks, listeners generally hear more than one auditory stimulus per trial and have to compare them in order to make a decision (e.g., judge whether the stimuli are the same or different, or decide which of the given items the target item sounds more like). Unlike identification, where the comparison is always with a standard held 'in the listener's head' (Knight & Hawkins, 2012), discrimination requires listeners to compare items that they actually hear. Also, unlike in identification tasks, response labelling is not an issue in discrimination tasks, since participants do not need to know IPA symbols to judge whether two sounds are the same or different. The number of items per trial can vary. In AX discrimination tasks (i.e., indicating whether X is the same as or different from A; also called 'same-different' discrimination tasks), listeners hear two stimuli and have to indicate whether they perceive them as the same or different. In ABX discrimination tasks (i.e., matching-to-sample), three stimuli are presented in each trial and listeners have to indicate which stimulus, A or B, is the same as or most similar to the X stimulus. In other words, participants have to match the X stimulus to the sample A or B. Variations of the ABX design place the target stimulus between A and B (AXB) or before A and B (XAB). Another kind of discrimination task is the odd-one-out (oddity) task: listeners hear three stimuli, two of which are the same and one different, and have to indicate which one is different (i.e., the odd stimulus) (see Gerrits & Schouten, 2004 for other discrimination designs). Like identification tasks, discrimination tasks can be used to examine categorical perception: listeners hear pairs of stimuli from a constructed continuum (e.g., from English [p] to [b]), compare them, and indicate whether they perceive them as the same or different (i.e., an AX design).
In so doing, it is possible to determine to what extent listeners judge acoustically manipulated stimuli as belonging to the same or different phoneme categories (e.g., Kronrod et al., 2016). Moreover, like identification tasks, discrimination tasks can also be used to examine the perception of stimuli that do not stem from a continuum, whether speech sounds (e.g., vowels or consonants; Carlet & Cebrian, 2022; Iverson et al., 2012) or suprasegmentals (e.g., stress patterns or pitch contours; Prieto, 2012; Schwab & Dellwo, 2022; Schwab et al., 2022, described in Section 13.4.3). The target elements (e.g., vowels) can likewise be presented in isolation or within syllables, pseudowords, or real words. As in identification studies, if real words are used, precautions need to be taken regarding possible effects of lexical factors such as lexical frequency. Furthermore, care should be taken that 'same' stimuli (whether in AX, ABX, or oddity designs) are different tokens (e.g., different recordings of the same item), to ensure that discrimination judgements are based on the perception of the target feature (e.g., the vowel) and not on irrelevant cues (e.g., the kind of background noise, or a particular pronunciation of the consonants surrounding the target vowel). Moreover, whatever the discrimination design, the stimuli within a trial are separated by a specified amount of time, the interstimulus interval (ISI). It is important to be aware that manipulating ISI duration influences the type of perceptual processing during discrimination tasks (Fujisaki & Kawashima, 1970; Pisoni, 1973). ISIs shorter than 500 milliseconds (henceforth ms) tend to trigger auditory processing, based mainly on sensory information, whereas longer ISIs tend to favour phonetic processing, based on abstract, categorical representations of the target sounds, as illustrated in Cebrian et al. (2021), presented in Section 13.4.1. The most common measures used in discrimination tasks include accuracy (i.e., proportion of correct responses) and response times. In designs with binary responses (e.g., AX, ABX), sensitivity measures such as d' from Signal Detection Theory (Macmillan & Creelman, 1991) can also be reported, as illustrated in Gerrits and Schouten (2004). Such a sensitivity measure controls for response bias by comparing the proportion of 'hits' (i.e., listeners correctly perceive the stimuli as different) with that of 'false alarms' (i.e., listeners erroneously perceive the stimuli as different when they were actually the same). Imagine an AX experiment in which half of the trials are 'same' and the other half 'different', and a participant always presses the 'different' button. This participant will obtain 50% correct responses and 100% hits, but also 100% false alarms. The equal proportions of hits and false alarms show that this participant has no discrimination ability.
Comparing the proportions of hits and false alarms thus reveals participants' degree of sensitivity (i.e., their discrimination ability).
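The comparison of hits and false alarms can be turned into d' as z(hit rate) minus z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution function. The Python sketch below uses the standard library's statistics.NormalDist. Two assumptions should be flagged: it applies the basic yes/no formula (Macmillan and Creelman describe more specific models for same-different and ABX designs), and it adds 0.5 to each count and 1 to each total (a log-linear correction) so that rates of 0 or 1 do not produce infinite z-scores.

```python
from statistics import NormalDist


def d_prime(hits, n_different, false_alarms, n_same):
    """Sensitivity index d' for a same-different (AX) task.

    'Different' trials are treated as signal trials: a hit is a 'different'
    response on a different trial, a false alarm a 'different' response on a
    same trial. A log-linear correction keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (n_different + 1)
    fa_rate = (false_alarms + 0.5) / (n_same + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# The participant from the example above: always answers 'different'
# on 50 different and 50 same trials.
always_different = d_prime(hits=50, n_different=50, false_alarms=50, n_same=50)

# A hypothetical participant with 45 hits and 5 false alarms.
good_listener = d_prime(hits=45, n_different=50, false_alarms=5, n_same=50)

print(always_different, round(good_listener, 2))
```

Because the hit and false-alarm rates are identical for the participant who always presses 'different', d' is 0 despite 50% correct responses, which is exactly the point made in the example.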

13.2.3 Rating

Rating tasks (also called scaling tasks) ask participants to evaluate stimuli on a numerical scale (e.g., Likert-scale responses) or a visual scale (e.g., slider responses). Rating tasks collect subjective evaluations of one or several dimensions of individual stimuli (e.g., goodness, nativelikeness, speech rate) or of stimulus pairs (e.g., similarity). In these experiments, participants' responses can be considered meta-judgements, since they are conscious, reflective decisions about given dimensions. The target stimuli can be speech sounds, syllables, pseudowords, real words, or utterances. For example, in goodness ratings (e.g., Miller, 1994), participants hear acoustically manipulated syllables (e.g., /pi/ with different VOT durations) and have to judge each exemplar for its goodness as a member of the phoneme category (e.g., the /p/ category) on a numerical rating scale (e.g., from 1 = bad member to 10 = good member). In second language research, rating methods are commonly used to assess the degree of accentedness and the comprehensibility of L2 speech (e.g., Munro, 2008). Rating tasks are also commonly used to study the perception of suprasegmental features, such as rhythm or speech rate (e.g., Dellwo, 2008). Because participants may differ in how they use a scale (e.g., some participants mainly use the upper part of the scale while others use only the middle part), it is often useful to transform the ratings into standardized units, such as z-scores computed separately for each participant.
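A minimal sketch of this standardization in Python, assuming ratings are stored as hypothetical (participant, stimulus, rating) tuples: each rating is expressed relative to that participant's own mean and standard deviation, so that a listener who uses only the upper part of the scale and one who uses only the middle part become comparable. Each participant needs at least two ratings for statistics.stdev to be defined.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_score_by_participant(ratings):
    """Convert raw ratings into within-participant z-scores.

    ratings: list of (participant, stimulus, rating) tuples.
    Returns a list of (participant, stimulus, z) tuples in the same order.
    """
    by_participant = defaultdict(list)
    for participant, _, rating in ratings:
        by_participant[participant].append(rating)
    # Per-participant mean and sample standard deviation.
    params = {p: (mean(r), stdev(r)) for p, r in by_participant.items()}

    standardized = []
    for participant, stimulus, rating in ratings:
        m, sd = params[participant]
        z = (rating - m) / sd if sd > 0 else 0.0  # constant raters get z = 0
        standardized.append((participant, stimulus, z))
    return standardized


# Hypothetical 10-point ratings: P1 uses the upper part of the scale, P2 the middle.
raw = [("P1", "s1", 7), ("P1", "s2", 8), ("P1", "s3", 9),
       ("P2", "s1", 4), ("P2", "s2", 5), ("P2", "s3", 6)]

for row in z_score_by_participant(raw):
    print(row)
```

After standardization both hypothetical participants rank the three stimuli identically, which the raw scores obscured.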

13.3 Neurophysiological methods

Complementing behavioural research on listeners' behaviour during speech perception, neurophysiological research investigates the neural basis of speech perception. Neurophysiological methods make it possible to determine when listeners process the information present in the speech signal (e.g., with EEG) and where in the brain speech processing occurs (e.g., with fMRI). The next sections present the basics of the EEG and fMRI techniques. Sections 13.4.2 and 13.4.3 illustrate the use of these methods with concrete speech perception experiments, on the perception of VOT and on lexical stress processing in a foreign/second language (L2), respectively.

13.3.1 Electroencephalography (EEG)

Electroencephalography (EEG) is a non-invasive technique used to measure electrical brain activity. With electrodes placed on the scalp, voltage fluctuations originating from cortical and subcortical regions of the brain can be recorded. The most important advantage of EEG compared to other neurophysiological methods is its excellent temporal resolution (1 ms or less) (Luck, 2005). EEG is, therefore, the method of choice especially when investigating temporal aspects of speech perception. A disadvantage, however, is its low spatial resolution. The location of the sources of electrical brain activity is difficult to determine from the EEG signal measured on the scalp, a problem referred to as the 'inverse problem'. In other words, the activity recorded at one electrode does not correspond only to the activity of the brain region directly underneath it, but to the activity of multiple, spatially dispersed sources in the cortex. However, the 'inverse problem' can be addressed with newer mathematical procedures that model neural sources within the cortex from the activity measured on the scalp (e.g., LORETA, dipole analysis, beamformer analysis) (Jäncke, 2005). Also, modern EEG systems can use a large number of electrodes, which yields a high spatial resolution on the scalp. Thus, in combination with recently developed, sophisticated space-domain analysis techniques for scalp EEG distributions (e.g., microstates, topographical mapping), the neural correlates of speech perception (and of other aspects of cognition) can be investigated more comprehensively, with excellent temporal resolution and adequate scalp-level spatial information, an approach called 'electrical neuroimaging' (Michel et al., 2009). For EEG recordings, participants wear a cap with electrodes that record their electrical brain activity during speech perception experiments.
Participants sit in a Faraday cage to minimize acoustic and electrical noise during recording. For the same reasons, speech stimuli are usually presented over headphones rather than loudspeakers. Participants need to be instructed to relax and to move as little as possible during the recording, because any muscle tension or movement will affect the electrodes on the scalp and add noise to the EEG data. Furthermore, it is recommended to place at least two electrodes around the eyes to record (and later remove) noise due to eye movements and blinks. After the recording, many different analysis techniques make it possible to analyse temporal, spectral, and spatial aspects of the EEG signal in relation to speech perception. It is important to note that the quality of the analysis strongly depends on the appropriate choice of experimental paradigm and settings. It is therefore recommended to define, implement, and pilot the experimental paradigm thoroughly before starting the recordings with actual study participants. A common analysis of the EEG signal in speech perception experiments is the identification of event-related potentials (ERPs). ERPs are positive and negative deflections (i.e., peaks or valleys) time-locked to events of interest (e.g., a target speech stimulus) at specific latencies and at specific electrodes (Michel et al., 2009). Each target speech stimulus is repeated many times to obtain a large number of neural responses to it. Then, by averaging all neural responses to a given stimulus, ERPs can be isolated from the raw EEG signal. In other words, averaging cancels out EEG activity that is randomly distributed with respect to the stimulus, while the consistent neurophysiological response to the repeated target stimulus (i.e., the ERP) becomes apparent (Jäncke, 2005).
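The logic of averaging can be illustrated with simulated data. In the Python sketch below, which is purely illustrative and not an EEG analysis pipeline, every trial contains the same hypothetical ERP (a negative deflection peaking at 100 ms) plus random Gaussian 'background EEG'; the 250 Hz sampling rate, the noise level, and the number of trials are assumptions. Averaging the epochs leaves the consistent response intact while the random activity shrinks roughly with the square root of the number of trials.

```python
import math
import random

SAMPLING_STEP_MS = 4  # assumed 250 Hz sampling rate
TIMES_MS = [SAMPLING_STEP_MS * i for i in range(60)]  # epoch from 0 to 236 ms


def true_erp(t_ms):
    """Hypothetical ERP: a -4 µV deflection peaking at 100 ms."""
    return -4.0 * math.exp(-(((t_ms - 100.0) / 20.0) ** 2))


def simulate_epochs(n_trials, noise_sd_uv=5.0, seed=1):
    """Each epoch = the same ERP + independent Gaussian noise (stand-in for ongoing EEG)."""
    rng = random.Random(seed)
    return [[true_erp(t) + rng.gauss(0.0, noise_sd_uv) for t in TIMES_MS]
            for _ in range(n_trials)]


def average_epochs(epochs):
    """Sample-by-sample mean across trials (the time-locked average)."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


def rms_error(waveform):
    """Root-mean-square deviation of a waveform from the true ERP."""
    return math.sqrt(sum((w - true_erp(t)) ** 2 for w, t in zip(waveform, TIMES_MS))
                     / len(TIMES_MS))


epochs = simulate_epochs(200)
single_trial_error = rms_error(epochs[0])          # close to the noise SD
average_error = rms_error(average_epochs(epochs))  # about noise SD / sqrt(200)
print(f"single trial: {single_trial_error:.2f} µV, average of 200: {average_error:.2f} µV")
```

This idealized simulation assumes the ERP is identical on every trial and the noise is independent across trials; real EEG only approximates both assumptions.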


Several ERPs have been described in terms of their functional correlates, as elicited by different stimulus features or task demands (Michel et al., 2009). These ERPs (e.g., N1, P2, N400, P600) can be experimentally manipulated in terms of their amplitude and latency. Amplitude indicates the size of the deflection (i.e., its magnitude), which correlates with the number of synchronously active neurons. The presence or absence of a deflection and its magnitude therefore provide information about the functional state of neurons in relation to specific target stimuli (Jäncke, 2005). Latency reflects the timing of the process. Thus, for understanding speech perception, the advantage of the ERP technique lies in its continuous and temporally accurate measurement of the association between a speech stimulus and a neural response reflecting specific perceptual or cognitive processes involved in processing the speech phenomenon under study. The study by Elmer et al. (n.d.), based on Kurthen (2014) and presented in Section 13.4.2, illustrates the use of the EEG technique in the study of VOT processing.
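Peak amplitude and latency are commonly read off an averaged waveform by locating its extreme value within a predefined time window. The Python sketch below does this for a negative component such as the N1 and a positive one such as the P2. The window limits and the synthetic waveform are assumptions for illustration (the 100–160 ms window echoes the N1 latency range mentioned in Section 13.4.2); in practice, windows and electrodes are selected with the procedures adopted in each study.

```python
import math


def peak_in_window(times_ms, amplitudes_uv, start_ms, end_ms, polarity="negative"):
    """Return (latency_ms, amplitude_uv) of the peak within [start_ms, end_ms].

    polarity='negative' picks the most negative sample (e.g., N1);
    polarity='positive' picks the most positive sample (e.g., P2).
    """
    window = [(t, a) for t, a in zip(times_ms, amplitudes_uv) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the requested window")
    pick = min if polarity == "negative" else max
    return pick(window, key=lambda sample: sample[1])


# Synthetic averaged waveform sampled every 4 ms (0-300 ms):
# N1-like trough at 120 ms and P2-like peak at 200 ms.
times = [4 * i for i in range(76)]
erp = [-3.0 * math.exp(-(((t - 120) / 25) ** 2)) + 2.0 * math.exp(-(((t - 200) / 30) ** 2))
       for t in times]

latency, amplitude = peak_in_window(times, erp, 100, 160)
print(f"N1 latency: {latency} ms, amplitude: {amplitude:.2f} µV")
```

Simple peak picking is sensitive to noise in the average; mean amplitude over the window is a frequently used, more robust alternative.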

13.3.2 Functional magnetic resonance imaging (fMRI)

Unlike EEG, functional magnetic resonance imaging (fMRI) is an indirect method of studying brain activity. Changes in neuronal activity are associated with vascular changes in active brain regions and are derived indirectly from changes in the ratio of oxygenated to deoxygenated haemoglobin. This ratio is reflected in the blood-oxygen-level-dependent (BOLD) signal (Ogawa et al., 1990). The BOLD signal is recorded by placing a participant in a magnetic resonance (MR) scanner, which generates a strong permanent magnetic field (typically in the range of 1.5–7 Tesla). Radiofrequency pulses are transmitted, and because oxygenated and deoxygenated haemoglobin have different magnetic properties, the resulting MR signal indicates vascular changes in different brain regions and, therefore indirectly, changes in brain activity. The result is a set of three-dimensional functional maps of the brain with high spatial resolution (in the millimetre range), which is the major advantage of fMRI over EEG. However, the BOLD response increases slowly and peaks only around 7–10 seconds after stimulation (e.g., after speech stimulus onset), which means that the temporal resolution of fMRI (in the range of several seconds) is a serious drawback. Furthermore, when analysing perceptual or cognitive processes related to speech perception, another disadvantage of fMRI is the loud acoustic noise generated by the scanner, which strongly interferes with auditory stimulation during image acquisition. Importantly, new methods have been developed to deal with this issue by taking advantage of the slow hemodynamic response. For example, the 'sparse temporal sampling' approach (Hall et al., 1999) presents acoustic stimuli in quiet and acquires a limited set of images during the pauses between stimuli.
This approach has since been developed further into a 'clustered-sparse' imaging method, which acquires more brain images while still allowing speech to be presented in quiet (Zaehle et al., 2007). The research by Schwab et al. (2022), described in Section 13.4.3, illustrates the combination of fMRI and behavioural methods in the study of lexical stress processing in a foreign/second language (L2).
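The timing logic of sparse temporal sampling can be sketched as follows. In this hypothetical Python example (not the protocol of Hall et al., 1999), one volume is acquired every tr_s seconds and takes ta_s seconds; the stimulus for each trial is placed in the silent gap so that the middle of the next acquisition falls peak_delay_s after stimulus onset. The numbers used (10 s between volumes, 2 s acquisition, 7 s assumed delay to the BOLD peak) are illustrative assumptions only.

```python
def sparse_schedule(n_trials, tr_s, ta_s, stim_dur_s, peak_delay_s):
    """Hypothetical sparse-sampling timeline.

    Volumes start at 0, tr_s, 2*tr_s, ... and each lasts ta_s seconds (scanner noise).
    Stimulus k is played in the silent gap before volume k, timed so that the
    middle of volume k falls peak_delay_s after stimulus onset.
    Returns a list of (stimulus_onset_s, acquisition_onset_s) pairs.
    """
    if ta_s >= tr_s:
        raise ValueError("acquisition time must be shorter than the repetition time")
    schedule = []
    for k in range(1, n_trials + 1):
        acq_onset = k * tr_s
        stim_onset = acq_onset + ta_s / 2 - peak_delay_s
        previous_acq_end = (k - 1) * tr_s + ta_s
        # The whole stimulus must fall inside the silent gap between acquisitions.
        if stim_onset < previous_acq_end or stim_onset + stim_dur_s > acq_onset:
            raise ValueError("stimulus would overlap scanner noise; adjust the timing")
        schedule.append((stim_onset, acq_onset))
    return schedule


for stim, acq in sparse_schedule(3, tr_s=10.0, ta_s=2.0, stim_dur_s=1.5, peak_delay_s=7.0):
    print(f"stimulus at {stim:.1f} s, volume acquired at {acq:.1f} s")
```

The check inside the loop makes the central constraint of the approach explicit: the speech stimulus is only ever heard in silence, and the acquisition is delayed to exploit the slow hemodynamic response.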

13.4 Exemplar studies

13.4.1 Perception of L2 vowels

The study by Cebrian et al. (2021) illustrates concretely the use of the different behavioural methods described in Section 13.2. Their research, situated within the framework of second/foreign language (L2) acquisition, aimed to explore the perception (and production) of the English /iː/-/ɪ/ contrast by native listeners of Spanish/Catalan. Besides studying L2 vowel production (not described here), the authors investigated the relationship between the perceived similarity of L2 English and native (L1) Spanish vowels and the listeners' ability to identify and discriminate L2 English vowels. To this end, they used a perceptual assimilation task with goodness ratings, an identification task, and a discrimination task. They focussed their analyses on the English vowels /iː, ɪ, ε, ɜː/. Participants were Catalan/Spanish bilinguals who were first-year undergraduate students of English Studies at the Universitat Autònoma de Barcelona (UAB), Spain. The next sections provide, for each task, a description of the experimental design and of the data analysis procedure (without statistical modelling). Note that, since the focus is on the methodological aspects of the experimental designs, the main results of the study are only briefly summarized.

13.4.1.1 Perceptual assimilation task

Method: In the perceptual assimilation task, listeners heard the English vowels /iː, ɪ, ε, eɪ, aɪ, æ, ɑː, ɜː, ʌ/ in /bVt/ words (e.g., bit, beat, bet) produced by three male speakers of Standard Southern British English (SSBE). Participants were asked to identify each English vowel as one of several possible L1 (Spanish) sounds by clicking on one of the response labels presented on a computer screen. Besides selecting the L1 sound, listeners had to give a goodness rating on a 7-point scale (1 = a poor example of the selected vowel; 7 = a good, native-like example). The response labels consisted of the most common spelling of each Spanish vowel and diphthong together with a Spanish word illustrating the vowel (e.g., di for /i/, de for /e/). Data analysis: The assimilation percentage was obtained by computing the percentage of times each English target vowel (e.g., /ɪ/) was identified as each of the labelled Spanish vowels (e.g., 67% of /ɪ/ tokens identified as Spanish <i> and 32% as Spanish <e>). Average goodness ratings were also calculated (e.g., /ɪ/ identified as Spanish <i> with a goodness rating of 5.9 and as <e> with a goodness rating of 5.4), as well as a fit index (FI) score. This score was obtained by multiplying the assimilation proportion by the goodness rating (e.g., /ɪ/ identified as Spanish <i> with an FI of 4 and as <e> with an FI of 1.6) and made it possible to distinguish cases with similar assimilation scores but different goodness ratings. Main results: English /iː/ was strongly assimilated to Spanish <i> (FI = 6.1), while English /ɪ/ showed weaker assimilation to Spanish <i> (FI = 4). Spanish <e> was more strongly associated with English /ε/ (FI = 5.3) than with English /ɜː/ (FI = 3.1).
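The assimilation percentage, mean goodness rating, and fit index can be computed from trial-level data as sketched below in Python. The data format, the Spanish labels, and the ratings are hypothetical, and the sketch pools all trials, whereas a study may compute these scores per listener before averaging. Following the worked example above (0.67 × 5.9 ≈ 4), the fit index multiplies the assimilation proportion (0–1) by the mean goodness rating.

```python
from collections import defaultdict


def assimilation_summary(trials):
    """Assimilation proportion, mean goodness, and fit index per L2 vowel and L1 label.

    trials: iterable of (l2_vowel, l1_label, goodness) tuples, goodness on a 1-7 scale.
    Returns {l2_vowel: {l1_label: (proportion, mean_goodness, fit_index)}}.
    """
    ratings = defaultdict(lambda: defaultdict(list))
    for l2_vowel, l1_label, goodness in trials:
        ratings[l2_vowel][l1_label].append(goodness)

    summary = {}
    for l2_vowel, by_label in ratings.items():
        n_total = sum(len(r) for r in by_label.values())
        summary[l2_vowel] = {}
        for l1_label, r in by_label.items():
            proportion = len(r) / n_total
            mean_goodness = sum(r) / len(r)
            summary[l2_vowel][l1_label] = (proportion, mean_goodness, proportion * mean_goodness)
    return summary


# Hypothetical responses to English /ɪ/: four 'i' choices and one 'e' choice.
trials = [("ɪ", "i", 6), ("ɪ", "i", 6), ("ɪ", "i", 5), ("ɪ", "i", 7), ("ɪ", "e", 4)]

for label, (prop, good, fi) in assimilation_summary(trials)["ɪ"].items():
    print(f"/ɪ/ -> {label}: {prop:.0%} chosen, goodness {good:.1f}, FI {fi:.2f}")
```

Weighting the proportion by goodness is what lets the fit index separate two vowels that are assimilated equally often but judged to be better or worse exemplars.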

13.4.1.2 Identification task

Method: In the identification task, participants listened to high-frequency monosyllabic English words. These words formed minimal pairs differing only in the target vowel (e.g., bead, bid, bed, bird) and were produced by two native speakers of SSBE (one male and one female). Participants were asked to identify the vowel in each word using one of seven response labels appearing on the computer screen. The labels consisted of a phonetic symbol together with two common words illustrating each vowel (i.e., /æ/ ash, mass; /ʌ/ sun, thus; /ɪ/ fish, his; /iː/ cheese, leaf; /ɜː/ earth, first; /ε/ less, west; /ɑː/ arm, palm). Data analysis: The proportion of correct identifications was computed for each target vowel, as well as the proportions of misidentifications (i.e., a confusion matrix to identify error patterns). Note that the inferential statistical analyses reported in the paper (not described here) were run on binary correct/incorrect responses, not on percentages.

Main results: English /ε/ was the most accurately identified (90% correct) followed by /ɪ/ (72%), /iː/ (58%) and /ɜː/ (47%). Regarding the pattern of errors, English /iː/ was most frequently misperceived as /ɪ/ (39%) and /ɪ/ as /iː/ (21%), while /ɜː/ was most often misidentified as /ɑː/ (19%) or /ʌ/ (23%).

13.4.1.3 Discrimination task

Method: The discrimination task was an AX same/different task. The material was the same as for the identification task (i.e., words produced by two speakers), except that the words were presented in pairs (e.g., bead-bid, bead-bead). For each vowel pair under study (i.e., /iː/-/ɪ/ and /ε/-/ɜː/), both possible speaker orders (speaker 1-speaker 2, speaker 2-speaker 1) and both vowel orders (e.g., /iː/-/ɪ/ and /ɪ/-/iː/) were presented. Half of the trials contained the same vowel category (same-category trials; e.g., /iː/-/iː/), and half were different-category trials (e.g., /iː/-/ɪ/). The interstimulus interval (ISI) was 1.15 seconds, so that participants would rely on phonetic information stored in long-term memory rather than on sensory memory. After hearing each word pair, participants indicated whether the two words were the same or different by clicking on the corresponding label on a computer screen. Data analysis: The proportion of correct discriminations was calculated for each vowel pair in the 'different' condition. Note again that the inferential statistical analyses reported in the paper were run on binary correct/incorrect responses, not on percentages. Main results: The /ε/-/ɜː/ pair was discriminated more successfully than the /iː/-/ɪ/ pair, with mean correct discrimination rates of 87% for /ε/-/ɜː/ and 66% for /iː/-/ɪ/.
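A balanced AX trial list of the kind described above can be generated by crossing vowel order with speaker order. The Python sketch below is a hypothetical illustration (one word per vowel, taken from the examples in the text, and a single repetition of each combination), not the authors' actual stimulus list. Because A and X always come from different speakers, 'same' trials never pair two identical recordings, in line with the recommendation in Section 13.2.2.

```python
from itertools import product


def build_ax_trials(vowel_pairs, words, speakers=("speaker 1", "speaker 2")):
    """Balanced AX trial list.

    vowel_pairs: list of (v1, v2) contrasts; words: mapping vowel -> word.
    Returns (trial_type, (word_A, speaker_A), (word_X, speaker_X)) tuples.
    """
    speaker_orders = [(speakers[0], speakers[1]), (speakers[1], speakers[0])]
    trials = []
    for v1, v2 in vowel_pairs:
        # Different-category trials: both vowel orders x both speaker orders.
        for (a, x), (spk_a, spk_x) in product([(v1, v2), (v2, v1)], speaker_orders):
            trials.append(("different", (words[a], spk_a), (words[x], spk_x)))
        # Same-category trials: each vowel x both speaker orders (same number of trials).
        for v, (spk_a, spk_x) in product([v1, v2], speaker_orders):
            trials.append(("same", (words[v], spk_a), (words[v], spk_x)))
    return trials


words = {"iː": "bead", "ɪ": "bid", "ε": "bed", "ɜː": "bird"}
trials = build_ax_trials([("iː", "ɪ"), ("ε", "ɜː")], words)
for trial in trials:
    print(trial)
```

In an actual experiment, the list would typically be repeated and randomized before presentation.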

13.4.1.4 Cross-task correlational analysis

Conducting several perception tasks with the same participants makes cross-task comparisons possible and allows a better understanding of interindividual differences. For example, some Spanish listeners might assimilate both English /iː/ and /ɪ/ to the same Spanish vowel and therefore be unable to discriminate the two vowels, whereas other listeners might associate the two English vowels with different Spanish vowels and thus be able to distinguish them. Method: The relationship between listeners' perceived similarity of L1 and L2 sounds (i.e., perceptual assimilation results) and their L2 perception accuracy (i.e., identification and discrimination results) was examined. Data analysis: For each vowel, Spearman correlations were computed between assimilation measures (i.e., assimilation percentage, goodness rating, and fit index score) and perception measures (i.e., percentage of correct identifications and percentage of correct discriminations). Main results: The results showed very little evidence of a relationship between listeners' crosslinguistic perceived similarity and their L2 performance.
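Spearman's correlation is Pearson's correlation computed on ranks, which makes it suitable for measures such as fit index scores and percentages that need not be normally distributed and may be related only monotonically. The Python sketch below shows the computation, with tied values receiving average ranks; the listener data are hypothetical. It returns only the coefficient; scipy.stats.spearmanr additionally provides a p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the rank positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined if either variable is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-listener scores for one vowel.
fit_index = [4.1, 3.2, 5.0, 2.8, 4.1, 3.9]   # assimilation measure
identification = [70, 55, 80, 60, 65, 75]     # % correct identification
print(round(spearman(fit_index, identification), 3))
```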

13.4.1.5 Main methodological take-home message

The research conducted by Cebrian et al. (2021) perfectly illustrates the use of different behavioural tasks in a single study, whether within an experiment (i.e., perceptual assimilation together with goodness ratings) or across experiments (i.e., identification and discrimination). The combined use of several tasks allows cross-task comparisons that can lead to a better understanding of interindividual variability, which is especially pronounced in L2 perception. Needless to say, this kind of design, with multiple behavioural tasks, is equally suitable for the study of L1 speech perception.

13.4.2 EEG study on VOT

The EEG study published by Elmer et al. (n.d.), based on Kurthen (2014), aimed to determine to what extent VOT duration influences the source localization of the N1 ERP component. To help understand this study, some basic phonetic and neurophysiological notions are first briefly introduced.

13.4.2.1 Basic notions

Voice onset time (VOT): VOT is defined as the time between the release of a stop consonant and the onset of vocal fold vibration. In a consonant-vowel (CV) syllable like /ta/ for example, VOT is the time between the release burst of /t/ to the onset of the vowel /a/. VOT duration can range between 10 and 150 ms (without considering prevoicing), depending on the language, rate of speech, and stop consonant (Allen et al., 2003; Kessinger & Blumstein, 1997). It is well known that voiced consonants have shorter VOT values than do voiceless consonants (at least in English), and that listeners rely on this difference to identify consonants as voiced or voiceless (Miller & Volaitis, 1989). Hypothesis ‘Asymmetric Sampling in Time’ (AST): According to the recent neurobiological framework called ‘Asymmetric Sampling in Time’ (AST), VOT can be considered as an example of rapidly changing speech cues. The AST hypothesis, which focusses on neural mechanisms of acoustic processing, differentiates rapidly changing acoustic cues (i.e., short VOTs or formant transitions) which are integrated into short temporal chunks from slowly changing cues (i.e., word stress or intonation contour) processed in long temporal integration windows (Poeppel, 2003). The AST hypothesis also assumes distinct hemispheric preferences for these different temporal resolutions: the left auditory-related areas preferentially extract information over short temporal integration windows (i.e., 25 ms) and the right auditory areas over longer integration windows (i.e., 250 ms). It should be noted that the AST hypothesis assumes a continuum of preferences ranging from rapid (left hemisphere) to slow (right hemisphere) modulations which means that basically both the left and the right auditory-related cortices accommodate neural ensembles that are proficient at decoding acoustic changes at different temporal levels. 
N1 ERP component: As mentioned in Section 13.3.1, several ERPs (e.g., N1, N2, P3) have been described in terms of their functional correlates. The N1 is the component of interest in the study by Elmer et al. (n.d.) because it occurs around 100–160 ms after stimulus onset and reflects basic auditory analysis in speech processing (Meyer et al., 2006; Näätänen & Picton, 1987). Regarding localization, source estimation studies have located the source of the N1 in left temporal areas, more specifically in the left supratemporal plane (Godey et al., 2001; Liem et al., 2012).

13.4.2.2

EEG experiment

Background: Within the framework of the AST hypothesis, Elmer et al. (n.d.) used the experiment designed by Kurthen (2014) to assess the functional lateralization of acoustic processing as a function of VOT duration. In line with the AST hypothesis, they predicted stronger left-hemisphere neural responses in auditory-related areas for shorter VOTs than for longer VOTs.

Method: German-speaking Swiss participants performed an AX discrimination task while EEG data were recorded with a 128-channel BioSemi system. They listened to two syllables and had to indicate whether they were the same or different by clicking the corresponding (left/right) mouse button. The material consisted of seven acoustically manipulated German /da/–/ta/ syllables with VOTs ranging from 10 to 70 milliseconds (ms) in 10 ms steps. Of these stimuli, only three served as target stimuli, namely those with VOT durations of 10 ms, 30 ms, and 70 ms. Each target stimulus was followed, after an interstimulus interval (ISI) of 1,000 ms, by a stimulus with either the same VOT duration or a different one (i.e., a VOT difference of at least 20 ms). Half of the trials were ‘same’ trials (N = 54 for each target stimulus) and the other half were ‘different’ trials (N = 54 for each target stimulus).

Data analysis: As a quick reminder, brain signals (i.e., ERPs) are recorded as waves that show voltage change over time. In the present study, the ERPs were time-locked to the onset of the first stimulus of each trial. First, signal components not associated with brain activity (e.g., eye blinks) were removed. Then, standard methods were applied to determine the time window for analysing the N1 (the component of interest in this study) and to identify the electrode showing the most activity for this component. Finally, for each wave (10 ms VOT, 30 ms VOT, 70 ms VOT), the N1 peak amplitude and latency were extracted within the determined time window at the identified electrode (see Figure 13.1). Besides extracting the amplitude and latency of the waves, the authors also localized the brain sources of the N1 activation (using standard methods such as sLORETA). They then extracted the amount of electric current (i.e., brain activity) from each identified brain source for each stimulus type (10 ms VOT, 30 ms VOT, 70 ms VOT).

Main results: The ERPs for each stimulus type (10, 30, and 70 ms VOT duration) are shown in Figure 13.1. N1 amplitude did not vary as a function of VOT duration, whereas N1 latency was longer for the 10 ms VOT than for the 30 and 70 ms VOTs.
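The peak-extraction step described above can be sketched as follows: because the N1 is a negative-going deflection, its peak is taken as the most negative voltage within the analysis window at one electrode, and its latency is the time of that sample. This is a minimal sketch, not the authors' pipeline; the function name, the toy waveform values, and the default 100–160 ms window (borrowed from the typical N1 latency cited earlier, not the window actually used in the study) are all assumptions.

```python
from typing import Sequence, Tuple


def n1_peak(times_ms: Sequence[float],
            amplitudes_uv: Sequence[float],
            window: Tuple[float, float] = (100.0, 160.0)) -> Tuple[float, float]:
    """Return (latency_ms, amplitude_uv) of the most negative sample in `window`.

    The N1 is a negative-going deflection, so its peak is the minimum
    voltage inside the analysis window at a single electrode.
    """
    if len(times_ms) != len(amplitudes_uv):
        raise ValueError("times and amplitudes must have the same length")
    start, end = window
    samples = [(t, a) for t, a in zip(times_ms, amplitudes_uv) if start <= t <= end]
    if not samples:
        raise ValueError("no samples fall inside the analysis window")
    # min() over (time, amplitude) pairs keyed on amplitude gives the negative peak.
    return min(samples, key=lambda sample: sample[1])


# Toy averaged waveform at one electrode (invented values, not study data).
times = [0, 50, 100, 110, 120, 130, 140, 150, 160, 200]
erp = [0.0, 1.2, -1.5, -3.0, -4.1, -3.6, -2.0, -1.0, 0.2, 1.0]

latency, amplitude = n1_peak(times, erp)
print(f"N1 peak: {amplitude} uV at {latency} ms")
```

Applied once per condition (10, 30, and 70 ms VOT), this yields the amplitude and latency values that are then compared statistically.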

Figure 13.1

ERPs for VOT of 10 ms (black), 30 ms (red), and 70 ms (blue) at electrode Cz (i.e., located at the centre of the skull).


Regarding source localization, results showed brain activation in the superior and middle temporal gyri for all three stimulus types (10, 30, and 70 ms VOT). Interestingly, activity in the left hemisphere was increased for the 10 ms and 30 ms VOT stimuli compared with the 70 ms VOT stimuli. These stronger left-hemisphere neural responses for shorter VOT durations supported the predictions of the AST hypothesis.

13.4.2.3 

Main methodological take-home message

The study conducted by Elmer et al. (n.d.) provides a typical example of the use of EEG methods in the study of speech perception. On the one hand, the research focusses on the N1, the ERP component reflecting auditory processing; on the other hand, it provides information not only about the amplitude and latency of this component but also about its source localization in the brain.

13.4.3

L2 lexical stress perception: Combining neuroimaging and behavioural methods

In their pre/post-training study, Schwab et al. (2022) combined neuroimaging (fMRI) and behavioural methods to examine the perception of stress contrasts in a foreign/second language (L2). In some languages, such as English or Spanish, words composed of the same sounds can have different stress patterns, and hence different meanings. For example, the English words ‘IMport’ and ‘imPORT’ comprise the same sounds but show different stress patterns (the former is stressed on the first syllable, the latter on the final syllable), and consequently have different meanings (‘importation’ and ‘to bring in’, respectively). The discrimination of lexical stress contrasts in a second language (e.g., Spanish número versus numero) is a complex task for French-speaking listeners. Since stress contrasts do not exist in their native language, French speakers are not used to processing them and experience difficulties in doing so in a second language. Schwab et al. (2022) aimed to investigate to what extent the listeners’ ability to process L2 stress was related to the amount of neural activation in brain regions specifically involved in L2 stress perception. French-speaking participants with no knowledge of Spanish took part in an fMRI study in Spanish as well as in behavioural tests, also in Spanish. For the sake of simplicity, we only present the fMRI study and the pre-training behavioural task, leaving aside the description of the training experiment and the post-training task.

13.4.3.1

fMRI study: Discrimination task in the fMRI scanner

Method: In the fMRI experiment (i.e., inside the fMRI scanner), participants performed an AX discrimination task. They were presented with pairs of auditory trisyllabic Spanish words produced by a female native speaker of Castilian Spanish. Half of the pairs were ‘same’ pairs and the other half were ‘different’ pairs. The ‘same’ pairs were composed of different tokens (i.e., recordings) of the same word. In the ‘different’ pairs, items differed either in the final vowel (e.g., valoro versus valore) or in the position of word stress (e.g., valoro versus valoró). The interstimulus interval (ISI) was 500 ms and the time between two pairs was 2 seconds. The two conditions (‘vowel’ and ‘stress’) were presented in alternating blocks (i.e., a block design) separated by 8 seconds (to allow brain activation to be recorded in terms of BOLD responses). Participants heard the stimuli through in-ear earphones and wore an anti-noise helmet to attenuate the effect of the scanner noise. After hearing each word pair, participants indicated whether the two words were the same or different by pressing the corresponding button on a response box held in their hand.

Data analysis: The ‘vowel’ condition served as a control condition and was compared with the experimental ‘stress’ condition to identify the regions specifically activated during stress processing. Standard methods were first used to remove signal components not associated with brain activity (e.g., movement artefacts). Then, for each participant, the voxels (i.e., the 3D pixels of the brain image) showing more activation in the ‘stress’ than in the ‘vowel’ condition were identified, which revealed the brain areas more strongly activated for ‘stress’ than for ‘vowel’ processing.

Main result: As can be seen in Figure 13.2, the regions with more activation for ‘stress’ than for ‘vowel’ processing were found in the bilateral inferior frontal gyrus (IFG) and in the right middle/superior temporal gyrus.
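To make the ‘stress’ versus ‘vowel’ contrast concrete, the toy sketch below assumes that each voxel has already been reduced to one mean signal value per block, pairs each ‘stress’ block with a ‘vowel’ block, and keeps voxels whose paired t statistic for stress minus vowel exceeds a threshold. This is a deliberate simplification under stated assumptions: real fMRI analyses fit a general linear model with a modelled hemodynamic response and correct for multiple comparisons. The voxel indices, signal values, and threshold of 3.0 are all invented.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]  # hypothetical (x, y, z) voxel index


def paired_t(stress: List[float], vowel: List[float]) -> float:
    """t statistic for stress-minus-vowel differences across paired blocks."""
    if len(stress) != len(vowel) or len(stress) < 2:
        raise ValueError("need at least two paired blocks of equal length")
    diffs = [s - v for s, v in zip(stress, vowel)]
    sd = stdev(diffs)
    if sd == 0:
        # Identical differences in every block: t is undefined; signal its direction.
        m = mean(diffs)
        return math.copysign(math.inf, m) if m != 0 else 0.0
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def stress_gt_vowel(block_means: Dict[Voxel, Tuple[List[float], List[float]]],
                    t_threshold: float) -> List[Voxel]:
    """Voxels whose stress > vowel t statistic exceeds the (hypothetical) threshold."""
    return sorted(voxel for voxel, (stress, vowel) in block_means.items()
                  if paired_t(stress, vowel) > t_threshold)


# Invented block-mean signal values (arbitrary units) for two voxels.
toy = {
    (10, 20, 30): ([2.1, 2.4, 2.0, 2.3], [1.0, 1.2, 1.1, 0.9]),
    (11, 20, 30): ([1.0, 1.1, 0.9, 1.0], [1.0, 1.0, 1.0, 1.1]),
}
print(stress_gt_vowel(toy, t_threshold=3.0))
```

Only the first toy voxel responds consistently more in ‘stress’ than in ‘vowel’ blocks, so it alone survives the threshold.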

Figure 13.2

Sagittal (top left), coronal (top right), and axial (bottom) views of the brain regions with significant neural activation differences between ‘stress’ and ‘vowel’ conditions. These regions correspond to left and right inferior frontal gyrus and right middle/superior temporal gyrus.

13.4.3.2

Behavioural task: Identification

Method: In the identification task (performed outside the fMRI scanner), participants listened to triplets of trisyllabic Spanish words that differed only in lexical stress: words with stress on the first syllable (e.g., vínculo, ‘link’), on the second syllable (e.g., vinculo, ‘I link’), and on the third syllable (e.g., vinculó, ‘he linked’). All words were produced by two female Spanish speakers with both falling and rising intonation. After hearing each stimulus, participants indicated which syllable was stressed by clicking one of the three response labels displayed on the computer screen (‘First’, ‘Second’, ‘Third’).

Data analysis: The proportion of correct identifications was calculated for each participant.

Main result: The mean proportion of correct identifications was 64.10%, with some interindividual variability (standard deviation = 14.54).
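The scoring described above (a proportion correct per participant, then a group mean and standard deviation across participants) can be sketched as follows. The participant codes and responses are invented, and the use of the sample standard deviation is an assumption about how the reported SD was computed.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Trial = Tuple[str, str]  # (stressed syllable, participant's response)


def percent_correct(trials: List[Trial]) -> float:
    """Percentage of trials on which the response matches the stressed syllable."""
    if not trials:
        raise ValueError("participant has no trials")
    correct = sum(1 for target, response in trials if target == response)
    return 100.0 * correct / len(trials)


def group_summary(data: Dict[str, List[Trial]]) -> Tuple[float, float]:
    """Mean and sample SD of per-participant percent correct."""
    scores = [percent_correct(trials) for trials in data.values()]
    return mean(scores), stdev(scores)


# Invented responses for three hypothetical participants.
data = {
    "p01": [("First", "First"), ("Second", "Third"), ("Third", "Third"), ("Second", "Second")],
    "p02": [("First", "Second"), ("Second", "Second"), ("Third", "First"), ("First", "First")],
    "p03": [("First", "First"), ("Second", "Second"), ("Third", "Second"), ("Third", "Third")],
}
m, sd = group_summary(data)
print(f"mean = {m:.2f}%, SD = {sd:.2f}")
```

Computing the SD over per-participant scores, rather than over all trials pooled, is what makes it a measure of interindividual variability.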

13.4.3.3

Relationship between fMRI and behavioural data

Method: The main objective of Schwab et al. (2022) was to determine whether the interindividual variability observed in the listeners’ ability to identify L2 stress was related to their neural activation while processing L2 stress. To this end, the authors examined the relationship between the participants’ amount of neural activation in the identified regions of interest and their score on the stress identification task.

Data analysis: Pearson correlation analyses were run between neural activation in each region of interest and the proportion of correct identifications.

Main results: A marginal negative correlation was observed between the listeners’ proportion of correct identifications and their neural activation in the left IFG: the greater the difficulty in identifying stress position in the behavioural task, the greater the activation for stress processing in the left IFG (i.e., the more effortful stress processing was). This relationship was observed neither for the right IFG nor for the right middle/superior temporal gyrus. Taken together, the results suggest that the interindividual differences observed in L2 stress processing might be (at least partially) related to neural interindividual differences.
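The correlation step can be sketched as a plain Pearson product-moment coefficient between each participant's activation in a region of interest and their identification score. The values below are invented so that higher activation goes with lower accuracy, mirroring the direction of the reported left-IFG pattern; they are not the study's data. The sketch computes r only: deciding whether a correlation is significant or "marginal" requires a significance test with n − 2 degrees of freedom, which is not shown. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Invented per-participant values: left-IFG activation (arbitrary units)
# and percent correct in the identification task.
left_ifg = [0.82, 0.45, 0.91, 0.30, 0.66]
accuracy = [55.0, 72.0, 48.0, 80.0, 61.0]
print(f"r = {pearson_r(left_ifg, accuracy):.2f}")
```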

13.4.3.4 

Main methodological take-home message

The study designed by Schwab et al. (2022) illustrates the use of experimental methods in the investigation of suprasegmental features (here, lexical stress). First, it exemplifies the combination of fMRI and behavioural methods within the same experiment: fMRI images were acquired while participants performed a discrimination task lying in the scanner. Second, like Cebrian et al. (2021), it illustrates the use of several methods across different experiments, in this case fMRI and behavioural methods, which makes it possible to investigate relations between the amount of brain activation and behavioural performance.

13.5

Future directions

Several unsolved issues could be the object of future research. For example, there is no consensus about the nature of the perceptual processes (i.e., acoustic or phonetic; Pisoni, 1973) involved in identification and discrimination. On the one hand, identification tasks are assumed to involve mainly phonetic processing (i.e., labelling), since they encourage phonetic classification and thus force listeners to use their phonetic memory to label the stimuli they hear. Nevertheless, identification tasks might involve acoustic processing in some circumstances, for example with ab initio learners (i.e., learners without knowledge of the foreign language; see Schwab & Dellwo, 2022 for an example). On the other hand, discrimination tasks are considered to involve mainly acoustic processing. They generate comparisons between two (or more) auditory stimuli and thus make listeners rely on their sensory memory. However, the nature of the perceptual processing involved in discrimination tasks can depend heavily on the interstimulus interval (ISI), the type of design (e.g., AX, ABX), or the cognitive memory load involved in the task (Gerrits & Schouten, 2004; Hossain, 2018).

At a more methodological level, for studies using two-alternative forced-choice tasks (e.g., identification with two response options, or discrimination tasks), results expressed as proportions should be compared with results (from the same data) expressed in sensitivity measures such as d’. Sensitivity measures need to be used not only when participants might be biased towards one response (e.g., a preference for answering ‘yes’ rather than ‘no’) but also with unbalanced datasets (e.g., unequal numbers of same and different stimuli in discrimination tasks). Research systematically comparing proportions with sensitivity measures is thus necessary (see Dellwo & Schwab, 2016 for such a comparison).

Furthermore, a large body of research has considered, and still considers, speech perception in the broader framework of spoken language processing, namely in relation to word recognition and lexical access, with a special focus on how acoustic-phonetic information is used to access lexical items in long-term memory. Although it is well known that fine-grained phonetic details (e.g., VOT) as well as suprasegmental cues (e.g., lexical stress) influence lexical access, further studies are still needed to better understand the interrelationship between speech perception and word recognition, at both the behavioural and the neural level. Finally, this chapter has provided examples of behavioural and neurophysiological methods used with young healthy adults (whether in L1 or L2). These methods can also be employed with other populations, such as elderly participants, for example to better understand the age-related decline in speech perception (e.g., Giroud et al., 2019).
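The contrast between proportions and sensitivity measures discussed above can be sketched with the standard yes/no formula d’ = z(hit rate) − z(false-alarm rate). Here a "hit" is assumed to be a ‘different’ response to a different pair and a "false alarm" a ‘different’ response to a same pair. Two caveats apply. First, for AX same-different designs this yes/no formula is a simplification; Macmillan and Creelman (1991) describe models specific to same-different tasks, which give different d’ values. Second, the optional log-linear correction (adding 0.5 to every cell so rates of exactly 0 or 1 cannot occur) is one of several conventions. The listener counts are invented.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int,
            loglinear: bool = True) -> float:
    """Yes/no sensitivity index d' = z(hit rate) - z(false-alarm rate).

    With loglinear=True, 0.5 is added to every cell so that rates of exactly
    0 or 1 (which would give infinite z-scores) cannot occur.
    """
    if loglinear:
        hits, misses = hits + 0.5, misses + 0.5
        false_alarms, correct_rejections = false_alarms + 0.5, correct_rejections + 0.5
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def pc_from_counts(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Percent correct over all trials, ignoring response bias."""
    total = hits + misses + false_alarms + correct_rejections
    return 100.0 * (hits + correct_rejections) / total


# Two invented listeners, each with 50 'different' and 50 'same' trials.
biased = (45, 5, 20, 30)      # tends to answer 'different'
unbiased = (38, 12, 12, 38)
for name, counts in [("biased", biased), ("unbiased", unbiased)]:
    print(name, pc_from_counts(*counts), round(d_prime(*counts, loglinear=False), 2))
```

With these toy counts the two listeners have similar percent-correct scores but different d’ values, which illustrates why the text recommends comparing both measures on the same data.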

Acknowledgements

The authors would like to thank Joaquim Llisterri for his precious advice during the manuscript preparation, and Nathalie Dherbey Chapuis for her careful reading and helpful comments about a previous version of the manuscript.

Further reading

Abdi, H. (2007). Signal detection theory (SDT). In N. Salkind (Ed.), Encyclopedia of Measurement and Statistics (pp. 886–889). Sage.
Knight, R. A., & Hawkins, S. (2013). Research methods in speech perception. In M. J. Jones & R. A. Knight (Eds.), The Bloomsbury Companion to Phonetics (pp. 21–38). Bloomsbury.
McGuire, G. (2021). Perceptual phonetic experimentation. In M. J. Ball (Ed.), Manual of Clinical Phonetics (pp. 495–506). Routledge.
Michel, C. M., Koenig, T., & Brandeis, D. (2009). Electrical neuroimaging in the time domain. In C. M. Michel, T. Koenig, D. Brandeis, L. R. R. Gianotti & J. Wackermann (Eds.), Electrical Neuroimaging (pp. 111–143). Cambridge University Press.

Related topics

New directions in statistical analysis for experimental linguistics; historical perspectives on the use of experimental methods in linguistics; contrasting online and offline measures: examples from experimental research on linguistic relativity; controlling social factors in experimental linguistics; analysing language using brain imaging


References

Allen, J. S., Miller, J. L., & DeSteno, D. (2003). Individual talker differences in voice-onset-time. Journal of the Acoustical Society of America, 113(1), 544–552. https://doi.org/10.1121/1.1528172
Carlet, A., & Cebrian, J. (2022). The roles of task, segment type and attention in L2 perceptual training. Applied Psycholinguistics, 43(2), 271–299.
Cebrian, J., Gorba, C., & Gavaldà, N. (2021). When the easy becomes difficult: Factors affecting the acquisition of the English /iː/-/ɪ/ contrast. Frontiers in Communication, 6. https://doi.org/10.3389/fcomm.2021.660917
Colantoni, L., Steele, J., & Escudero, P. (2015). Research methodology. In L. Colantoni, J. Steele & P. Escudero (Eds.), Second Language Speech: Theory and Practice (pp. 75–123). Cambridge University Press. https://doi.org/10.1017/CBO9781139087636
Dellwo, V. (2008). The role of speech rate in perceiving speech rhythm. In P. A. Barbosa, S. Madureira & C. Reis (Eds.), Proceedings of 2008 4th International Conference on Speech Prosody (pp. 375–378). Campinas, Brazil.
Dellwo, V., & Schwab, S. (2016). The importance of using SDT measures in speaker identification tasks [Talk presented at Phonetics group meeting]. University of Zurich, Zurich.
Elmer, S., Meyer, M., & Giroud, N. (n.d.). Multidimensional characterization of the neurocognitive architecture underlying age-related temporal speech deficits [In revision, NeuroImage]. University of Zurich.
Fouz-González, J., & Mompean, J. A. (2020). Exploring the potential of phonetic symbols and keywords as labels for perceptual training. Studies in Second Language Acquisition, 43(2), 297–328.
Fujisaki, H., & Kawashima, T. (1970). Some experiments on speech perception and a model for the perceptual mechanism. Annual Report of the Engineering Research Institute, Faculty of Engineering, University of Tokyo.
Gerrits, E., & Schouten, M. E. H. (2004). Categorical perception depends on the discrimination task. Perception & Psychophysics, 66(3), 363–376.
Giroud, N., Keller, M., Hirsiger, S., Dellwo, V., & Meyer, M. (2019). Bridging the brain structure-brain function gap in prosodic speech processing in older adults. Neurobiology of Aging, 80, 116–126. https://doi.org/10.1016/j.neurobiolaging.2019.04.017
Godey, B., Schwartz, D., de Graaf, J. B., Chauvel, P., & Liegeois-Chauvel, C. (2001). Neuromagnetic source localization of auditory evoked fields and intracerebral evoked potentials: A comparison of data in the same patients. Clinical Neurophysiology, 112, 1850–1859.
Hall, D. A., Haggard, M. P., Akeroyd, M. A., Palmer, A. R., Summerfield, A. Q., Elliott, M. R., Gurney, E. M., & Bowtell, R. W. (1999). ‘Sparse’ temporal sampling in auditory fMRI. Human Brain Mapping, 7, 213–223.
Hossain, I. (2018). Percepción del acento léxico en español como lengua extranjera [Doctoral thesis, Pontificia Universidad Católica de Chile]. Pontificia Universidad Católica de Chile.
Iverson, P., Pinet, M., & Evans, B. G. (2012). Auditory training for experienced and inexperienced second-language learners: Native French speakers learning English vowels. Applied Psycholinguistics, 33(1), 145–160. https://doi.org/10.1017/S0142716411000300
Jäncke, L. (2005). Methoden der Bildgebung in der Psychologie und den kognitiven Neurowissenschaften. Kohlhammer.
Kessinger, R. H., & Blumstein, S. E. (1997). Effects of speaking rate on voice-onset time in Thai, French, and English. Journal of Phonetics, 25(2), 143–168. https://doi.org/10.1006/jpho.1996.0039
Knight, R. A., & Hawkins, S. (2013). Research methods in speech perception. In M. J. Jones & R. A. Knight (Eds.), The Bloomsbury Companion to Phonetics (pp. 21–38). Bloomsbury. https://doi.org/10.5040/9781472541895
Kronrod, Y., Coppess, E., & Feldman, N. H. (2016). A unified account of categorical effects in phonetic perception. Psychonomic Bulletin & Review, 23, 1681–1712. https://doi.org/10.3758/s13423-016-1049-y
Kurthen, I. (2014). Neural processing of rapidly changing cues in the speech signal [Master's thesis]. University of Zurich.
Liem, F., Zaehle, T., Burkhard, A., Jäncke, L., & Meyer, M. (2012). Cortical thickness of supratemporal plane predicts auditory N1 amplitude. NeuroReport, 23, 1026–1030. https://doi.org/10.1097/WNR.0b013e32835abc5c
Luck, S. J. (2005). An Introduction to the Event-Related Potential Technique. MIT Press.
Macmillan, N. A., & Creelman, C. D. (1991). Detection Theory: A User’s Guide. Cambridge University Press.
Matthys, S. L. (2013). Speech perception. In D. Reisberg (Ed.), The Oxford Handbook of Cognitive Psychology. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780195376746.013.0026
McGuire, G. (2021). Perceptual phonetic experimentation. In M. J. Ball (Ed.), Manual of Clinical Phonetics (pp. 495–506). Routledge, London. https://doi.org/10.4324/9780429320903
McQueen, J. M. (1996). Phonetic categorisation. In F. Grosjean & U. Frauenfelder (Eds.), A Guide to Spoken Word Recognition Paradigms (pp. 655–664). Psychology Press.
Meyer, M., Baumann, S., & Jäncke, L. (2006). Electrical brain imaging reveals spatio-temporal dynamics of timbre perception in humans. NeuroImage, 32, 1510–1523. https://doi.org/10.1016/j.neuroimage.2006.04.193
Michel, C. M., Koenig, T., & Brandeis, D. (2009). Electrical neuroimaging in the time domain. In C. M. Michel, T. Koenig, D. Brandeis, L. R. R. Gianotti & J. Wackermann (Eds.), Electrical Neuroimaging (pp. 111–143). Cambridge University Press.
Miller, J. L. (1994). On the internal structure of phonetic categories: A progress report. Cognition, 50(1–3), 271–285.
Miller, J. L., & Volaitis, L. E. (1989). Effect of speaking rate on the perceptual structure of a phonetic category. Perception & Psychophysics, 46(6), 505–512.
Munro, M. J. (2008). Foreign accent and speech intelligibility. In J. G. Hansen Edwards & M. L. Zampini (Eds.), Phonology and Second Language Acquisition (pp. 193–218). John Benjamins Publishing Company.
Näätänen, R., & Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24, 375–425.
Ogawa, S., Lee, T. M., Kay, A. R., & Tank, D. W. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. PNAS, 87, 9868–9872.
Pisoni, D. B. (1973). Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception & Psychophysics, 13, 253–260.
Pisoni, D. B. (1981). Some current theoretical issues in speech perception. Cognition, 10(1–3), 249–259. https://doi.org/10.1016/0010-0277(81)90054-8
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as ‘asymmetric sampling in time’. Speech Communication, 41, 245–255. https://doi.org/10.1016/S0167-6393(02)00107-3
Prieto, P. (2012). Experimental methods and paradigms for prosodic analysis. In A. C. Cohn, C. Fougeron & M. K. Huffman (Eds.), The Oxford Handbook of Laboratory Phonology (pp. 528–538). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199575039.013.0019
Schwab, S., & Dellwo, V. (2022). Explicit versus non-explicit prosodic training in the learning of Spanish L2 stress contrasts by French listeners. Journal of Second Language Studies, 5(2), 266–306.
Schwab, S., Mouthon, M., Salvadori, S., Ferreira da Silva, E., Yakoub, I., Giroud, N., & Annoni, J. M. (2022). Neural correlates and L2 lexical stress learning: An fMRI study. Proceedings of the 2022 11th International Conference on Speech Prosody, Lisbon, Portugal.
Zaehle, T., Schmidt, C. F., Meyer, M., Baumann, S., Baltes, C., Boesiger, P., & Jäncke, L. (2007). Comparison of ‘silent’ clustered and sparse temporal fMRI acquisitions in tonal and speech perception tasks. NeuroImage, 37, 1195–1204.


14 CONTRASTING ONLINE AND OFFLINE MEASURES

Examples from experimental research on linguistic relativity

Sayaka Sato and Norbert Vanek

14.1

Introduction and definitions

Researchers in linguistics have increasingly employed experimental methods to study questions pertaining to language phenomena. This surging interest stems from the fact that data collected in empirical experiments, unlike data obtained through introspection and intuition, allow researchers to actively control for confounding variables and to examine latent mechanisms as they unfold in real time. Such an approach allows specific linguistic behaviour to be directly linked to its assumed underlying mechanism. This is crucial because the mental processes underlying language cannot be directly observed, so indirect measures that allow us to test our assumptions objectively are key to our investigations. Against this background, this chapter aims to guide novice as well as more seasoned researchers in understanding the fundamental characteristics, complementary strengths, and uses of two types of measures conventionally employed in experimental linguistics: online and offline measures. Online measures tap into the moment-by-moment processes that occur in real time, allowing researchers to investigate the time course in which specific processes emerge. In contrast, offline measures relate to the outcome or representations constructed after processing has occurred, revealing how specific information was processed or interpreted. Each has its own characteristics, but in many cases the two are implemented within a single study in a complementary manner. Here, we showcase a range of more and less established online measures, including response times¹, eye movement measures, skin conductance responses (SCR), event-related potentials (ERP), and functional magnetic resonance imaging (fMRI), alongside representative offline measures such as categorisation preferences, accuracy scores, similarity ratings, corpus-based word embeddings, and measures outside the lab.
We focus on measures and illustrate their relevance and application across methods, tying individual measures to specific research questions and showing how they are implemented in experiments. To illustrate the concepts introduced in this chapter, we present example experiments that use a variety of measures within one domain, namely linguistic relativity, the notion that the languages we speak affect our cognition in predictable ways (Whorf, 1956). We find this approach particularly informative on two levels. Not only does research on linguistic relativity boast a variety of measures that help scholars focus on different aspects of sensory and linguistic processing, but exploring studies in this domain also invites a critical comparison of the characteristics each measure offers.

DOI: 10.4324/9781003392972-17

14.2

Online measures

14.2.1

Response times

14.2.1.1

Characteristics

One of the most basic and widely used behavioural measures in psycholinguistic research is response time, defined as the time from stimulus onset until the participant’s response is initiated and registered. Response times are noisy, being influenced by various factors relating to the participant (e.g., fatigue, motivation), the experimental context (e.g., distractions, temperature), or the experimental stimuli (e.g., familiarity, frequency) (Baayen & Milin, 2010). Nevertheless, they are generally taken as an observable measure reflecting the cognitive effort needed to process the experimentally manipulated variable. In a typical experimental setup, the stimulus is delivered through one or more sensory modalities (e.g., visually, auditorily), and the response is registered immediately afterwards through an input device (e.g., mouse click, keypress). Participants are usually instructed to respond as quickly and accurately as possible so as to obtain a precise measure of the processes being tested. In most cases, longer response times indicate greater difficulty or complexity in processing the input, whereas shorter response times denote facilitated processing.

14.2.1.2

Strengths and limitations

The merits of response times stem partly from their convenience: numerous participants can be tested easily, quickly, and at relatively low cost, while the measure nonetheless captures cognitive processes as they unfold online. More importantly, response times are flexible and can be adapted to test virtually any kind of cognitive process, as long as the paradigm allows for it. A simple computer or laptop may suffice to run an entire study and collect informative data. The complexity of working with response times, however, lies in their interpretation. For instance, participants may deliberately respond more slowly to increase accuracy, or speed up and sacrifice accuracy (i.e., the speed-accuracy trade-off; Schouten & Bekker, 1967; Wickelgren, 1977). A single response time may therefore reflect the influence of multiple cognitive processes, which can lead to misinterpretations of what it actually reflects and makes it difficult to disentangle and isolate the process of interest.

14.2.1.3

Application across methods and experiments

Given their flexibility, response times can be collected in combination with other experimental methods such as eye-tracking (Hebert et al., 2021; von Stutterheim et al., 2012), ERP (e.g., Boutonnet & Lupyan, 2015; Ellis et al., 2015; Sato et al., 2020), and fMRI (Francken et al., 2015; Siok et al., 2009). However, they can also provide informative evidence on their own. For instance, Gilbert et al. (2006) presented English speakers with a circle of blue and green squares and found that colour discrimination was faster when the squares had different names (i.e., blue and green), but only when they appeared in the right visual field; in the left visual field, response times were not influenced by the colour names. The authors reasoned that because blue and green belong to separate lexical categories, perceptual differences were enhanced particularly in the right visual field, since language processing is more strongly represented in the left hemisphere (Hellige, 1993). In a more recent study examining the impact of grammatical gender categories, Sato and Athanasopoulos (2018) presented French-English bilinguals and monolingual English speakers with picture pairs, in which a gender-laden object (e.g., a necktie) was followed immediately by a male or female face, and instructed participants to decide whether the object made them think of that face. They found that the bilingual participants responded more slowly when the grammatical gender of the object’s name in French mismatched the gender of the face. Because these results emerged only for the bilinguals, grammatical gender information was considered to have been activated unconsciously and recruited online during categorisation. Interestingly, these findings emerged even though language processing was not needed to perform the task, showing that, when employed in appropriate paradigms, response times are sensitive to the unconscious activation of linguistic information.
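The core comparison in a match/mismatch design like the one described above is a difference of mean response times between conditions, computed separately for each group. The sketch below shows only that descriptive step; inferential analyses would additionally need to model participant and item variability, and this is not presented as the analysis used by Sato and Athanasopoulos (2018). The response times are invented, one hypothetical participant per group.

```python
from statistics import mean
from typing import Dict, List


def congruency_effect(rts_ms: Dict[str, List[float]]) -> float:
    """Mean RT on mismatch trials minus mean RT on match trials (in ms).

    A positive value means slower responses when the grammatical gender of
    the object's name conflicts with the gender of the face.
    """
    return mean(rts_ms["mismatch"]) - mean(rts_ms["match"])


# Invented RTs (ms) for one hypothetical participant per group.
groups = {
    "bilingual": {"match": [600, 620, 640], "mismatch": [660, 680, 700]},
    "monolingual": {"match": [610, 630, 650], "mismatch": [615, 630, 645]},
}
for group, rts in groups.items():
    print(f"{group}: mismatch - match = {congruency_effect(rts)} ms")
```

A positive effect in the bilingual group together with no effect in the monolingual group is the pattern that licenses the inference that French grammatical gender was activated.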

14.2.2

Eye movement measures

14.2.2.1

Characteristics

Eye movement measures reflect the moment-to-moment cognitive processes that unfold during information processing (Just & Carpenter, 1980). They are recorded while participants are presented with either pictorial (i.e., static pictures or videos) or linguistic (i.e., sentences or texts) stimuli. In the visual-world paradigm, pictorial stimuli are shown concurrently with auditory stimuli: participants view pictures that are coherent with the content of the auditory input, along with a potential distractor. Because eye movements are closely time-locked to the spoken linguistic input as it is processed (Allopenna et al., 1998; Cooper, 1974), preferential looks towards an object picture or parts of a scene are assumed to reflect activation of the corresponding information (Altmann & Kamide, 2007). Critical measures for analysis include fixation location, namely the proportion of fixations to particular areas of interest, and how it develops over time (Kaiser et al., 2018). Eye movement measures elicited from linguistic stimuli, in turn, can reveal processing difficulty or the time course of information processing, as readers may fixate, skip, or regress to previously read words, phrases, or sentences (Rayner, 2009). Widely analysed measures include first fixation duration (the duration of the first fixation on a predefined region of interest, or ROI) and first-pass reading time (the summed duration of all fixations from first entering the ROI until leaving it), which allow the researcher to examine early reading processes such as word recognition or the anticipation of syntactic structures. In contrast, regression fixation proportion (proportion of sums of fixations on an ROI) and total reading time (the total time spent on an ROI) can reveal later processes, including information integration.
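The three duration-based reading measures defined above can be derived from a chronological list of fixations, each tagged with the ROI it falls in. The sketch below implements one common operationalisation: first-pass measures are left undefined when the ROI was skipped (the reader fixated a later ROI before ever landing on it). Conventions such as minimum fixation durations or the handling of skips vary between labs and software, so treat these rules, the function name, and the fixation data as assumptions. Regression fixation proportion is omitted because its operational definition varies.

```python
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[int, float]  # (ROI index in reading order, duration in ms)


def reading_measures(fixations: List[Fixation], roi: int) -> Dict[str, Optional[float]]:
    """First fixation duration, first-pass reading time and total reading time for one ROI.

    First-pass measures are None when the ROI was skipped on the first pass,
    i.e. the reader fixated a later ROI before ever landing on this one.
    """
    first_entry = None
    for i, (region, _) in enumerate(fixations):
        if region == roi:
            first_entry = i
            break
        if region > roi:
            break  # ROI skipped during first-pass reading

    first_fixation = first_pass = None
    if first_entry is not None:
        first_fixation = fixations[first_entry][1]
        first_pass = 0.0
        j = first_entry
        # Sum consecutive fixations until the eyes leave the ROI (left or right).
        while j < len(fixations) and fixations[j][0] == roi:
            first_pass += fixations[j][1]
            j += 1

    total = sum(duration for region, duration in fixations if region == roi)
    return {"first_fixation": first_fixation, "first_pass": first_pass, "total": total}


# Invented fixation sequence over word ROIs 1-5; ROI 3 is re-read after a regression.
fixations = [(1, 210), (2, 180), (3, 250), (3, 120), (4, 200), (3, 160), (5, 230)]
print(reading_measures(fixations, roi=3))
```

The regression back to ROI 3 adds to total reading time but not to first-pass reading time, which is what lets the two measures separate early from later processing.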

14.2.2.2 Strengths and limitations

An advantage of eye movement measures is that they are generally elicited in experiments that allow participants to inspect the stimuli freely, without requiring judgments prone to conscious strategies (Valdés Kroff et al., 2018). As such, they more closely resemble natural language processing than measures elicited from tasks requiring decision-based responses under strict time constraints. This feature also enables researchers to test babies and

Sayaka Sato and Norbert Vanek

infants who are preliterate or too young to complete certain experimental tasks, since non-linguistic, pictorial stimuli can be presented (Zacks, 2020), as well as signers, who may process language in a modality-specific manner (Lieberman et al., 2013). Additionally, unlike other online measures such as response or reading times, which yield a single measure, multiple eye movement measures can be collected simultaneously. While this gives researchers a layered picture of the different depths and moments of information processing, hypotheses for each measure should nonetheless be specified prior to data collection.

14.2.2.3 Application across methods and experiments

Given their wide applicability, eye movement measures can be collected alongside other measures such as EEG (e.g., Dimigen et al., 2012), accuracy scores, and response times (e.g., Hebert et al., 2022). Eye movement measures have been widely applied in research on the language–thought link, notably in studies of motion event perception. Monitoring eye movements in the motion event domain is particularly fruitful because researchers can present dynamic videos and observe where participants allocate their gaze (Athanasopoulos & Casaponsa, 2020). The rationale of these studies rests on the idea that language-specific patterns, such as whether a language encodes aspect (ongoingness) for motion events, can shape habitual gaze or attention allocation during event perception. For example, Flecken et al. (2014) presented German and Arabic speakers with video clips of entities (e.g., a person) moving towards a specific endpoint (e.g., a playground). Consistent with the characteristics of their languages, speakers of Arabic, which grammaticises verbal viewpoint aspect, fixated less on event endpoints than speakers of German, which does not encode this information. This was true even though the entity had not yet reached the endpoint in the video clips. Recent research has further used eye movements to assess the impact of language on attention capture (e.g., Goller et al., 2020; Sauppe & Flecken, 2021), categorical perception (Al-Rasheed et al., 2014; Franklin et al., 2008), and object search (Hebert et al., 2021).

14.2.3 Skin conductance responses (SCR)

14.2.3.1 Characteristics

Skin conductance responses (SCR) are a psychophysiological measure. They index emotional intensity through autonomic arousal, based on the idea that electrodermal activity increases when emotionally charged stimuli are processed (Harris et al., 2006). Two electrodes are attached to the participant's index and middle fingers. Participants see or hear emotionally charged (critical) stimuli intermixed with neutral (control) stimuli. The amplitude of their phasic skin conductance, i.e., the SCR, is measured by subtracting the participant's response to neutral stimuli from the maximum value recorded during the presentation of the emotionally charged critical stimuli (Harris et al., 2003). Two important considerations for SCR measurement are fastening the electrodes correctly (neither so tight as to restrict blood flow nor so loose as to cause signal loss) and allowing 4 to 8 s after critical trials for the SCR to return to baseline (Hugdahl, 2001).
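The amplitude computation described above (peak during a critical trial minus the participant's neutral response) can be sketched as follows. This is a minimal illustration under stated assumptions: samples are in microsiemens at a fixed sampling rate, the 1 to 4 s post-onset response window is a hypothetical default rather than a value from the text, and the neutral response is taken as the mean peak across neutral trials. The function names are illustrative.

```python
from statistics import mean
from typing import List, Sequence


def response_window(samples: Sequence[float], onset_s: float, rate_hz: float,
                    start_s: float = 1.0, end_s: float = 4.0) -> List[float]:
    """Samples from start_s to end_s after stimulus onset (window bounds are assumptions)."""
    first = int(round((onset_s + start_s) * rate_hz))
    last = int(round((onset_s + end_s) * rate_hz))
    return list(samples[first:last])


def scr_amplitude(critical_window: Sequence[float],
                  neutral_windows: Sequence[Sequence[float]]) -> float:
    """Peak conductance in a critical trial minus the mean peak across neutral trials."""
    neutral_level = mean(max(window) for window in neutral_windows)
    return max(critical_window) - neutral_level


# Usage with made-up values (microsiemens):
neutral = [[2.0, 2.1, 2.2], [2.0, 2.2, 2.0]]
critical = [2.0, 2.3, 2.9, 2.6]
print(round(scr_amplitude(critical, neutral), 2))  # 0.7
```

Because the peak is taken over a whole window, exact stimulus timing matters less here than for response times, which is the point the text makes about SCR designs allowing more freedom in stimulus matching.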

14.2.3.2 Strengths and limitations

One advantage of SCRs is their sensitivity: a very low, imperceptible voltage is applied to monitor the involuntary electrical conductance of the skin. Researchers aiming to examine

Contrasting online and offline measures

unconscious physiological responses to linguistic or other sensory stimuli can benefit from the mechanism underlying SCR, namely that the autonomic nervous system increases perspiration and thereby improves the skin's electrical conductance. Another strength is their wide applicability: SCRs have been used to detect the emotional punch of linguistic stimuli across languages (and within speakers of more than one language) in semantic domains with a positive charge, such as endearments (e.g., Caldwell-Harris et al., 2011), or a negative charge, such as taboo words (e.g., Eilola & Havelka, 2011). Within the context of linguistic relativity, SCRs could be used, for instance, by asking bilingual speakers to rate the emotional intensity of pictures presented with verbal distractors in one of their two languages, randomly assigned, and testing whether their responses differ across languages. SCRs may be superior to ERPs in their sensitivity to the emotional valence of words (Dinn & Harris, 2000), but less suitable than ERPs when high temporal resolution is a research priority. Another hindrance is that repositories of affective norms are currently limited to English words (Bradley & Lang, 1999) and a handful of other languages. If no normed reference exists, ratings for the test stimuli can be collected in a pre-test or while piloting the experiment.

14.2.3.3 Application across methods and experiments

Various methods can be, and have been, used alongside SCR, such as a Stroop task (Eilola & Havelka, 2011), emotionality ratings (Caldwell-Harris et al., 2011; Vanek & Tovalovich, 2021), self-reports using emotion-laden narratives (Jankowiak & Korpal, 2018), or a surprise memory task with facial motor resonance (Baumeister et al., 2017). SCR can make a valuable contribution to studies interested in the relative emotional power of words or phrases within as well as across languages, addressing emotionality-focused research questions related, but not limited, to linguistic relativity. Two areas in which past experiments invite methodological reflection are the matching of stimulus characteristics and control over their contextual richness. While controlling the length and frequency of verbal stimuli is particularly important for response time-based analyses that rely on processing speed (e.g., Larsen et al., 2008), designs with SCR may allow more freedom in this respect, since the measured variable is the peak SCR value in an often generously defined, multi-second time window. As for context, previous SCR-based studies have not infrequently combined single emotion-laden words with more complex emotion-laden phrases (e.g., Harris et al., 2003). Such mixed stimuli complicate interpretation because phrases provide richer context than single words do. To accelerate progress in the field, future studies measuring SCR may find it advantageous to compare the same emotion-laden stimuli with and without contextual embedding.

14.2.4 Event-related potentials (ERP)

14.2.4.1 Characteristics

Event-related potentials (ERPs) are fluctuations in brain voltage, measured at different scalp sites, in response to a sensory stimulus. They are derived by averaging multiple epochs (i.e., segments) of continuous EEG (electroencephalographic) recordings time-locked to a stimulus of interest and are considered to reflect the neural activity of information processing. Sets of voltage changes that reflect specific neural processes, known as ERP components, can then be identified and examined in order to infer specific cognitive mechanisms (Luck, 2014). These ERP components are generally characterised by their latency (timing), polarity (positive or negative), amplitude, and scalp distribution, and can be divided into exogenous and endogenous components. Exogenous components are early ERP waves, peaking around 100–200 ms, that are involuntary and related to the physical characteristics of the presented stimulus (e.g., intensity). Endogenous components, in contrast, are later waves that emerge from participants' interaction with the stimuli (e.g., attention) (Sutton et al., 1965) and stem from processes associated with the task. Commonly, the amplitude and latency of the peak voltage are used to quantify the magnitude and timing of an identified ERP component. That is, a time window of interest is defined, and the amplitude of the most positive or most negative peak within it, together with its latency, is compared across the conditions of interest.
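The averaging and peak-picking steps described above can be illustrated with a short sketch. It assumes a single channel, equal-length epochs that are already baseline-corrected (in microvolts), and a shared list of sample times in milliseconds; real pipelines involve further steps such as filtering and artefact rejection. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_epochs(epochs: Sequence[Sequence[float]]) -> List[float]:
    """Average time-locked epochs sample by sample to obtain the ERP waveform."""
    n = len(epochs)
    return [sum(values) / n for values in zip(*epochs)]


def peak_in_window(erp: Sequence[float], times_ms: Sequence[float],
                   start_ms: float, end_ms: float,
                   polarity: str = "negative") -> Tuple[float, float]:
    """Return (amplitude, latency) of the most negative or most positive point in the window."""
    if polarity not in ("negative", "positive"):
        raise ValueError("polarity must be 'negative' or 'positive'")
    points = [(v, t) for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    if not points:
        raise ValueError("no samples fall inside the time window")
    pick = min if polarity == "negative" else max
    return pick(points, key=lambda point: point[0])


# Usage with two toy epochs sampled every 50 ms:
times = [0, 50, 100, 150, 200]
erp = average_epochs([[0.0, -1.0, -3.0, -1.0, 0.0],
                      [0.0, -2.0, -5.0, -2.0, 0.0]])
print(erp)                                   # [0.0, -1.5, -4.0, -1.5, 0.0]
print(peak_in_window(erp, times, 50, 200))   # (-4.0, 100)
```

The same peak values would then be compared across conditions, as the text describes.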

14.2.4.2 Strengths and limitations

Recording continuous EEG to obtain ERP waves has multiple advantages, especially in linguistic relativity research. Unlike other non-invasive brain imaging techniques (e.g., fMRI), ERP boasts exceptional temporal resolution. This makes it possible to examine cognitive functions as they unfold in real time and can shed light on early cognitive processes (e.g., visual perception) that cannot necessarily be addressed with behavioural measures. Behavioural measures such as response times can only reveal the end result of a myriad of ongoing processes (e.g., stimulus recognition, decision making, button press), not the different stages of processing as they occur. Finally, as recording EEG is physically less constraining for participants than other neuroimaging systems, it can be combined with other data acquisition methods such as eye-tracking or fNIRS (functional near-infrared spectroscopy), offering flexibility in the choice of experimental paradigm. Nonetheless, ERP has its shortcomings, the most characteristic being its poor spatial resolution. Because the signal recorded at each scalp electrode sums activity from multiple neural generators, ERPs provide little insight into which brain area generated a particular activity. Thus, the scalp topography of an ERP may not reflect the neural sources of the corresponding component.

14.2.4.3 Application across methods and experiments

ERPs arguably afford the most efficient approach to testing language effects on nonverbal perception, as they allow the detection of early and unconscious effects of mental processes prior to language access (Thierry, 2016). For example, studies have examined the visual mismatch negativity (vMMN; Czigler, 2014), a negative deflection peaking at 100–250 milliseconds (ms) that is considered to reflect preattentive detection of stimulus change. The logic is that if language-specific terminology for particular concepts reliably influences perception, speakers of languages that make the relevant distinction should exhibit larger vMMN responses than speakers of languages without it. Classic examples of the vMMN effect come from colour perception (e.g., Mo et al., 2011; Thierry et al., 2009; Xia et al., 2019): speakers of Greek, which has distinct terms for darker [ble] and lighter [ghalazio] shades of blue, exhibit enhanced vMMN deflections, indicating that they discriminate these hues better than English speakers, whose language has no comparable terms. Recent research has extended investigations of vMMN effects to other perceptual domains such as object recognition (Jouravlev et al., 2018; Pan & Jared, 2021), face recognition (Yu et al., 2017), and capture of visual attention (Goller et al., 2020).
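The vMMN is commonly quantified on a difference wave obtained by subtracting the ERP to standard stimuli from the ERP to deviant stimuli. The sketch below assumes both ERPs are already averaged for one electrode (or electrode cluster) with shared sample times in milliseconds, and it uses the 100–250 ms window mentioned above as a default. The mean-amplitude measure and the function names are illustrative choices, not taken from the studies cited.

```python
from typing import List, Sequence


def difference_wave(deviant: Sequence[float], standard: Sequence[float]) -> List[float]:
    """Deviant-minus-standard ERP, sample by sample."""
    return [d - s for d, s in zip(deviant, standard)]


def mean_amplitude(wave: Sequence[float], times_ms: Sequence[float],
                   start_ms: float = 100, end_ms: float = 250) -> float:
    """Mean voltage within [start_ms, end_ms]; more negative values mean a larger vMMN."""
    values = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not values:
        raise ValueError("no samples fall inside the time window")
    return sum(values) / len(values)


# Usage with toy ERPs (microvolts):
times = [0, 100, 150, 250, 300]
diff = difference_wave([0.0, -2.0, -3.0, -4.0, 0.0], [0.0, -1.0, -1.0, -1.0, 0.0])
print(diff)                          # [0.0, -1.0, -2.0, -3.0, 0.0]
print(mean_amplitude(diff, times))   # -2.0
```

Under the logic described above, this value would then be compared between speaker groups, with more negative values expected for speakers whose language marks the relevant contrast.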

14.2.5 Functional magnetic resonance imaging (fMRI)

14.2.5.1 Characteristics

Functional magnetic resonance imaging (fMRI) indirectly measures the activity of neurons in different parts of the brain. It serves as an index of which brain regions are implicated in processing linguistic, or other, perceptual stimuli. An fMRI scanner maps neuronal activation by detecting changes in blood oxygenation across brain regions. More active neurons require more oxygen, which allows experimenters to monitor which parts of the brain are more and which are less stimulated. Higher activation when processing various aspects of language, including semantics, syntax, and phonology, has regularly been observed in the left perisylvian cortex, known as the general brain network underlying language function (Friederici, 2011). Attempts to further localise language processing point to the engagement of the left posterior temporal gyrus and the left inferior frontal gyrus, which show activation when complex syntax or semantic violations are processed. An important distinction exists between fMRI, which measures neural activity via haemoglobin oxygenation, and structural MRI, which images the anatomy of the brain, for instance to quantify grey and white matter volume voxel by voxel.

14.2.5.2 Strengths and limitations

The main strength of fMRI is its precision in localising which brain area is activated during the processing of perceptual stimuli (e.g., a picture of a leg), and whether this activation also occurs when processing associated linguistic stimuli (e.g., the verb ‘kick’) (Hauk et al., 2004). Benefits also include the possibility of testing whether cortical activity in a particular brain locus becomes orchestrated as participants learn to bind linguistic and perceptual information (Pulvermüller, 2018). Besides establishing functional distinctions between language areas, fMRI studies report a clear hemispheric dominance in language processing: left-sided in 95% of right-handed and in 70% of left-handed participants (Lurito & Dzemidzic, 2001). The main limitation of this measure is that while fMRI can identify the source of activation in the brain, its temporal resolution is poor, especially compared to ERP. Moreover, fMRI studies are costly to run, and neurolinguistic research has yet to identify the critical areas for particular aspects of language processing, as what counts as the neuroanatomical substrates of language varies substantially across studies (from 100% to as low as 22%; Smits et al., 2006).

14.2.5.3 Application across methods and experiments

Here we briefly survey two fMRI studies linked to linguistic relativity: one combined with accuracy scores to monitor changes in brain connectivity resulting from perceptual learning (Schmidt et al., 2019), and another combined with response times to test whether language–perception interactions are stronger in the language-dominant left hemisphere (Francken et al., 2015). Schmidt et al. (2019) examined the neuronal mechanisms underlying the implicit formation of links between verbal labels and tactile stimuli that were difficult to distinguish and name. Participants underwent 5 days of extensive training to associate heard pseudowords with a range of Braille-like vibrotactile patterns, and fMRI data collected before and after training showed changes in brain connectivity that pointed to a language-driven emergence of auditory-somatosensory neuronal networks. These fMRI results were interpreted as a physiological correlate of how linguistic labelling can influence the ways in which we perceive sensory input. Francken et al. (2015) examined whether language affects the neural locus of motion perception using fMRI data from a visual motion detection task. Participants saw a motion stimulus in the right or the left visual field, preceded by a congruent or incongruent word such as ‘descend’. A congruency advantage and greater activation of a language-related area (left middle temporal gyrus) emerged when the stimuli were presented to the right visual field. This lateralisation effect was interpreted as automatic modulation of motion perception by semantic information retrieved in language regions. When the aim of a study is not to examine the automaticity of linguistic modulation but, instead, to focus on behavioural effects that can arise through conscious strategy use, researchers have a range of offline measures at their disposal.

14.3 Offline measures

14.3.1 Categorisation preferences

14.3.1.1 Characteristics

Categorisation preferences are a behavioural measure showing the extent to which different entities (such as colours, events, objects, sounds, or language features) are thought of as instances of the same kind. For example, scarlet and crimson differ perceptually, but both can be grouped as kinds of red and thus belong to the same category. To uncover which shared properties are important for category membership, designs often include an ABX task, in which observers indicate (usually with a button press) whether they find stimulus A or stimulus B more similar to a reference stimulus X. Categorisation preferences are measured by calculating the proportion of AX versus BX groupings. If AX groupings prevail, this suggests that participants identified more shared properties, or more important ones, between A and X than between B and X. Organising categories around the degree of overlap in properties forms the basis of prototype theory (Goldstone et al., 2013; Rosch, 1975). Prototype-based categorisation reflects the selective channelling of attention to shared perceptual attributes, following the logic that people represent items with reference to the prototype to which they are closest. Under this view, shared perceptual features (such as the overlap in roundedness between the speech sounds [ɔ] and [ʊ]) will be selectively attended to, while category-irrelevant features (e.g., the difference in roundedness between [ɔ] and [ɪ]) will be ignored when the task is to categorise (e.g., to decide whether [ʊ] or [ɪ] is more similar to [ɔ]).
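As a minimal sketch of the ABX measure, the proportion of AX groupings can be computed per participant from a list of trial responses. The response coding ('A', 'B', or any other code for a missed trial) and the function names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def ax_proportion(choices: Iterable[str]) -> float:
    """Proportion of valid ABX trials on which A (rather than B) was grouped with X."""
    valid = [c for c in choices if c in ("A", "B")]  # drop timeouts or other codes
    if not valid:
        raise ValueError("no valid A/B responses")
    return valid.count("A") / len(valid)


def ax_by_participant(responses: Dict[str, List[str]]) -> Dict[str, float]:
    """AX proportion for each participant; values above 0.5 indicate an AX preference."""
    return {pid: ax_proportion(choices) for pid, choices in responses.items()}


# Usage: 'none' marks a trial without a response and is excluded.
print(ax_by_participant({"p01": ["A", "A", "B", "A"], "p02": ["B", "none", "A"]}))
# {'p01': 0.75, 'p02': 0.5}
```

These per-participant proportions could then be compared across language groups or conditions.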

14.3.1.2 Strengths and limitations

Among the strengths of categorisation preferences is their potential to show how accurately (and/or how fast) participants identify the diagnostic features underlying category membership. That is, they indicate how successfully participants abstract prototypical representations (Margolis & Laurence, 1999). A limitation is that, if used on their own, categorisation preferences may not reveal whether participants categorise items based on what makes them similar or on how they differ. One way to address this ambiguity is to collect item similarity ratings before and after categorisation and to test whether the direction of change in perceived similarity significantly predicts categorisation preferences.

14.3.1.3 Application across methods and experiments

Categorisation preferences are readily combinable with a range of online methods, including ERPs (Thierry & Wu, 2007) and eye-tracking (Ji & Papafragou, 2020), as well as with offline methods such as memory-based forced-choice tasks (Sakarias & Flecken, 2019). Here we illustrate two scenarios in which measuring categorisation preferences can advance the agenda of linguistic relativity research. In the domain of smell, Vanek et al. (2021) examined whether odour categorisation is enhanced by greater consistency in odour–label pairing. Using pseudowords as linguistic labels and measuring categorisation choices with feedback and block-to-block learning gains, they found that participants' ability to categorise odours (i.e., to choose whether odour 1 or odour 2 is more similar to odour 3) was significantly better when odours were more consistently paired with labels. This line of research could be extended by testing whether similar facilitation emerges in odour categorisation with consistently paired nonverbal cues (e.g., shapes or sounds), to establish whether or not linguistic labels have a special status (Lupyan & Thompson-Schill, 2012). In the domain of touch, Miller et al. (2018) tested how the ability to recognise hard-to-distinguish sensory input benefits from implicit learning of new associations between linguistic labels and tactile stimuli. After five days of intensive training, significant improvement in how similar tactile patterns were perceived and recognised was found only when the patterns were consistently paired with linguistic labels. Their design was not built to differentiate whether participants formed conceptual abstractions or shallower associations, but a categorisation follow-up would be well placed to show whether labels help participants identify the diagnostic features of tactile categories.
If labelling supports more abstract prototypical representations (Prinz, 2004), repeated exposure could lead participants to infer category-relevant tactile commonalities, which they could selectively activate with the help of verbal labels as diagnostic properties of a given category (e.g., rough) while ignoring category-irrelevant properties (e.g., smooth). In a future follow-up study, a decisive test of whether participants have learned a diagnostic property of a category would be whether they can correctly generalise that category to a new exemplar (e.g., a coral texture) that shares the relevant property. Failure to generalise correctly to new exemplars would indicate that participants had formed shallow label–percept associations rather than a new category.

14.3.2 Accuracy scores

14.3.2.1 Characteristics

Performance accuracy is a standard psycholinguistic measure that expresses the participant's ability to judge whether a tested item meets the criteria set for the given task (Harley, 2001). In comprehension, for instance, accuracy scores can be collected from lexical decisions (i.e., yes–no judgements) asking individuals whether a string of letters is a word. Two types each of ‘Yes’ and ‘No’ answers can be recorded in this and similar contexts. The ‘Yes’ answers comprise hits and false alarms. Hits are accurate ‘Yes’ responses to existing words; they indicate whether a linguistic item is stored in the mental lexicon. False alarms are incorrect ‘Yes’ responses to pseudowords (i.e., pronounceable letter strings without meaning) or nonwords (i.e., unpronounceable or barely pronounceable random letter strings); they indicate the participant's susceptibility to guessing. The ‘No’ answers comprise correct rejections and misses. Correct rejections are accurate ‘No’ responses to pseudowords or nonwords, and misses are incorrect ‘No’ responses to existing words. Accuracy scores are not limited to testing comprehension. They readily extend to tests of working memory or language production, for instance as a measure of the capacity to recall a series of words in the correct sequence or to name pictures. In experimental linguistic studies that use performance accuracy as a window into conceptual representations, a question of particular interest is how hit rates are affected by decision stage (early/perceptual versus late/rule-based; Jacobs & Grainger, 1994), by the presence or absence of feedback (Kersten et al., 2010; Zhang & Vanek, 2021, respectively), or by varying degrees of linguistic involvement (Lupyan et al., 2007).
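The four response types described above follow from crossing item type (word or not) with response (‘Yes’ or ‘No’). The sketch below classifies lexical decision trials and derives hit and false alarm rates; it assumes each trial is recorded as a pair of booleans, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def classify(is_word: bool, said_yes: bool) -> str:
    """Map one lexical decision trial onto the four response types."""
    if said_yes:
        return "hit" if is_word else "false_alarm"
    return "miss" if is_word else "correct_rejection"


def accuracy_summary(trials: Iterable[Tuple[bool, bool]]) -> Dict[str, Optional[float]]:
    """Hit rate (over words) and false alarm rate (over pseudowords/nonwords)."""
    counts = Counter(classify(is_word, said_yes) for is_word, said_yes in trials)
    n_words = counts["hit"] + counts["miss"]
    n_nonwords = counts["false_alarm"] + counts["correct_rejection"]
    return {
        "hit_rate": counts["hit"] / n_words if n_words else None,
        "false_alarm_rate": counts["false_alarm"] / n_nonwords if n_nonwords else None,
    }


# Usage: 3 words (2 accepted) and 5 pseudowords (1 accepted).
trials = [(True, True), (True, True), (True, False),
          (False, True), (False, False), (False, False), (False, False), (False, False)]
print(accuracy_summary(trials))
# {'hit_rate': 0.6666666666666666, 'false_alarm_rate': 0.2}
```

Keeping hit and false alarm rates separate, rather than pooling all correct answers, makes it possible to see when a high overall accuracy is inflated by a tendency to answer ‘Yes’.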

14.3.2.2 Strengths and limitations

Accuracy scores have a vast array of uses across research designs. They have been employed in combination with every online measure surveyed in this chapter. Although their popularity in psycholinguistics has recently declined in favour of online methods (Ferreira & Yang, 2019), they can serve as reliable indicators of recognition and production ability with solid theory-building potential. They are relatively simple to collect and fast to calculate. As with most measures, the explanatory power of accuracy scores in isolation is limited, and possibly fragmentary, if study designs vary in elicitation constraints such as unmatched task demands or uncontrolled training effects. Score-based claims about knowledge types (e.g., implicit versus explicit) or processing mechanisms (e.g., anticipation versus integration) tend to become volatile unless performance accuracy is triangulated with more time-sensitive and less strategic measures, such as online ones (e.g., ERPs or response times). Further challenges to the link between accuracy scores and linguistic knowledge arise from analyses that include outliers (e.g., hits with response times more than 2.5 SDs faster or slower than the mean for the given condition) and from analyses that conflate hits and correct rejections without adjusting for high false alarm rates (e.g., above 20%). Efforts to reduce such risks are essential to ensure that accuracy scores are a rigorous, transparent, and replicable measure.
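The two safeguards mentioned above can be sketched in a few lines: excluding hits whose response times lie more than 2.5 SDs from their condition mean, and flagging false alarm rates above 20%. The thresholds come from the text; the data layout (hit response times in ms, grouped by condition), the single-pass trimming, and the function names are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def trim_outliers(rts: Sequence[float], criterion: float = 2.5) -> List[float]:
    """Keep response times within `criterion` SDs of their mean (needs >= 2 values)."""
    m, sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= criterion * sd]


def trim_by_condition(rts_by_condition: Dict[str, List[float]],
                      criterion: float = 2.5) -> Dict[str, List[float]]:
    """Apply the SD criterion separately within each condition."""
    return {cond: trim_outliers(rts, criterion) for cond, rts in rts_by_condition.items()}


def high_false_alarms(false_alarm_rate: float, threshold: float = 0.20) -> bool:
    """Flag a false alarm rate above the threshold (20% by default)."""
    return false_alarm_rate > threshold


# Usage: one very slow hit (2000 ms) is excluded from the 'congruent' condition.
hits = {"congruent": [500.0] * 9 + [2000.0], "incongruent": [520.0, 540.0, 560.0]}
print(trim_by_condition(hits)["congruent"] == [500.0] * 9)  # True
print(high_false_alarms(0.25))  # True
```

Trimming within each condition, rather than over the pooled data, keeps a genuinely slower condition from having its responses removed as outliers.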

14.3.2.3 Application across methods and experiments

Research with accuracy scores dovetails nicely with studies that aim to zero in on the development, processing, use, and erosion of linguistic or language-modulated knowledge. Differences in accuracy scores from language-related tasks can readily be examined in covariation with individual difference factors such as working memory capacity, language group membership, or age. We briefly showcase two experiments applying accuracy scores in linguistic relativity research: one focusing on changes in accuracy to monitor learning gains (Kersten et al., 2010), the other testing how accurately participants approximate targets in stimulus reproduction (Dolscheid et al., 2013). Kersten et al. (2010) collected accuracy scores in five learning blocks to examine whether English and Spanish speakers differ in motion event categorisation. English speakers were more accurate than Spanish speakers in manner-based categorisation. This difference was attributed to language-specific encoding of manner, a prominent attribute in English but not in Spanish. Dolscheid et al. (2013) compared Dutch and Farsi speakers' mental representations of musical pitch in an elegant reproduction task. Accuracy was measured in terms of how closely the reproduced pitch matched the model. Pitch, typically described as high or low in Dutch and as thin or thick in Farsi, had to be reproduced while participants watched distractors that changed in either height or thickness. Dutch speakers were less accurate with height-based distractors and Farsi speakers were less accurate with thickness-based distractors, which was interpreted as an effect of language-specific space–pitch metaphors.

14.3.3 Similarity ratings

14.3.3.1 Characteristics

Similarity ratings are participants' subjective judgements of the perceived similarity between two or more items. They are frequently employed to investigate categorisation, a mechanism considered to be grounded in similarity, such that items belonging to the same category are perceived as more similar to one another (Nosofsky, 1986). Similarity ratings can also provide information about how word meanings are represented, as related words are considered to be connected or located close together within the semantic network (Collins & Loftus, 1975). To elicit similarity ratings, participants are given two or more items and report on a scale the extent to which they find these items alike (e.g., Athanasopoulos & Bylund, 2013; Lucy & Gaskins, 2003; Phillips & Boroditsky, 2003). In linguistic relativity research, the prediction is that pairs of pictures or videos sharing an underlying linguistic feature that is meaningful in the participant's language will be rated as more similar than pairs without that feature. For instance, Phillips and Boroditsky (2003) found that picture pairs consisting of a person and an object were rated as more similar when the gender of the person was congruent with the grammatical gender of the object's name, indicating that grammatical marking offered a basis for gender categorisation in a non-linguistic task.

14.3.3.2 Strengths and limitations

Measures that reflect similarity are commonly elicited from forced-choice tasks, such as selecting, from a series of options, the version most similar to a target item. Compared to such binary measures, similarity ratings that yield graded responses may, to some degree, offer a more precise picture of the process under investigation (e.g., Speed et al., 2021). However, an issue to consider is the creation of experimental stimuli, which requires the experimenter to construct items with the same perceptual distance between the images to be compared across all experimental items. In this respect, colour perception, where perceptual distances can be adequately controlled, offers a viable testbed, whereas other domains such as object categorisation may make stimulus construction difficult. To counter these issues, researchers may run rigorous pre-tests using property judgment tasks that evaluate experimental stimuli for their perceptual (e.g., nameability, size, colour) or linguistic (e.g., valence, plausibility, familiarity) properties. On the theoretical side, offline measures such as similarity ratings cannot fully rule out the possibility that perceptual effects in these measures arise from participants implicitly accessing language. Controlling for such effects may only be possible through neurophysiological methods.

14.3.3.3 Application across methods and experiments

Studies frequently collect similarity ratings as a way to pretest sets of stimuli to control for category similarity (Noorman et al., 2018) or in conjunction with other online or offline experimental measures. Similarity ratings have also been used as the main experimental measure in studies examining the impact of familiar and novel labels (Suffill et al., 2019), object labels (Masuda et al., 2017), classifier (Speed et al., 2021), and grammatical gender (Phillips & Boroditsky, 2003) categories.

14.3.4 Corpus-based word embeddings

14.3.4.1 Characteristics

Collections of real-life written or spoken language are known as corpora. To better understand how language is used in the real world, researchers can analyse available corpora for linguistic patterns and trends. Apart from word frequency, a corpus-based measure that has recently gained momentum is the examination of word embeddings derived from a corpus with Natural Language Processing (NLP) methods. Word embeddings rest on the assumption that words occurring in similar contexts are likely to be semantically related. On this premise, researchers can estimate the semantic and syntactic similarity between words from the contexts in which they appear. For example, Defranza et al. (2020) examined across 45 languages whether gender prejudice is more prevalent in gendered than in genderless languages. Theoretically grounded in linguistic relativity, their study reported that in gendered languages male words were more strongly associated with positively valenced words than female words were, indicating that gendered languages displayed more gender prejudice than genderless ones.
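Similarity between word embeddings is typically computed as the cosine of the angle between two word vectors. The sketch below shows cosine similarity and a simple association score (mean similarity to positive words minus mean similarity to negative words), in the spirit of embedding association tests; it is not necessarily the procedure used by Defranza et al. (2020). The two-dimensional vectors and placeholder words are entirely hypothetical, whereas real embeddings are learned from corpora and have hundreds of dimensions.

```python
from math import sqrt
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def mean_similarity(word: str, attributes: Sequence[str], emb: Dict[str, Vector]) -> float:
    """Average cosine similarity between `word` and a set of attribute words."""
    return sum(cosine(emb[word], emb[a]) for a in attributes) / len(attributes)


def valence_association(word: str, positive: Sequence[str], negative: Sequence[str],
                        emb: Dict[str, Vector]) -> float:
    """Positive minus negative mean similarity; > 0 means closer to positive words."""
    return mean_similarity(word, positive, emb) - mean_similarity(word, negative, emb)


# Hypothetical toy embeddings with placeholder words (not real data).
emb = {
    "target_a": [0.9, 0.1], "target_b": [0.4, 0.6],
    "pos_word": [1.0, 0.0], "neg_word": [0.0, 1.0],
}
for w in ("target_a", "target_b"):
    print(w, round(valence_association(w, ["pos_word"], ["neg_word"], emb), 3))
```

Comparing such scores for sets of male and female words, across gendered and genderless languages, is one way the kind of question raised in the text could be operationalised.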

14.3.4.2 Strengths and limitations

While participant responses from experimental contexts allow researchers to control for potential confounds, they are elicited in highly constrained, artificial contexts that may not reflect how language actually functions. Corpora, in contrast, are large, representative samples of naturally occurring data that are machine readable (Gilquin & Gries, 2009). Word embeddings thus offer the possibility of quantifying and examining trends in language data as they occur in real-life settings, as well as of surveying languages that may be difficult to access. However, because a model learns only from the contexts in which words occur in its training corpus, it may acquire the biases of that corpus and miss more nuanced links between words.

14.3.4.3

Application across methods and experiments

Combining corpus-based analyses with experimental approaches can compensate for the weaknesses of each approach (Gilquin & Gries, 2009). This is also true when applying word embeddings. One way in which the two are complementary is that a linguistic phenomenon can be tested in an experimental setting and its effect then validated in corpora, or vice versa. For instance, experimental approaches could be used to test whether speakers of different languages show stronger associations between gender and valence, as suggested by the aforementioned study of Defranza et al. (2020). Another frequently used approach is to draw on corpora to extract experimentally adequate stimuli, which can then be tested in strictly controlled experimental settings.

14.3.5

Measures outside the lab

14.3.5.1

Characteristics

Many experimental linguists find it unsatisfactory that the empirical basis against which psycholinguistic theories are currently assessed comes from only a fraction of the world’s languages. For sentence production data, for instance, this fraction was found to be as low as 0.6% (Norcliffe et al., 2015). A welcome step towards balancing the disproportionate emphasis on linguistic experiments that target just a handful of languages and mostly test students at Western universities (also known as the WEIRD sampling bias) is to do research in the field and take measures from diverse language groups. Measures taken outside the lab in real-world situations come with increased ecological validity, but often at the cost of experimental control (Athanasopoulos & Bylund, 2020). It is a challenge to ensure that precise measurements are taken under the same conditions throughout an experimental task. This may be less problematic for offline measures such as untimed categorisation preferences or similarity judgements (Roberson et al., 2000) than for more noise- and time-sensitive online measures such as eye fixations (Huettig & Mishra, 2014). But online measures, as shown with response times (Germine et al., 2012), need not be noisier in the field than in the lab. Careful timing of data collection (e.g., outside the harvest season in farming communities) and adjustments to local conditions in the field (e.g., shorter tests for people not used to experiments) can contribute to valuable methodological triangulation through measures taken outside the lab.

14.3.5.2

Strengths and limitations

Three main motivations for taking measures out of the lab are the diversification of samples from less well-documented languages and populations, the crowdsourcing of vast amounts of data through online experiments, and the boosting of ecological validity through study replications in everyday contexts (Speed et al., 2018). Data from smaller languages and from demographic samples outside urban contexts are particularly informative for psycholinguistics, given that structural crosslinguistic differences can robustly affect general cognitive processes (Majid et al., 2004). Fieldsite logistics may be challenging, but solar power and portable equipment have enabled researchers to venture far into the field and record a variety of psycholinguistic measures, including ERPs (Norcliffe et al., 2015). The potential disadvantages of experimentation outside the lab differ from those of a typical lab experiment (Crowley, 2007). No two fieldsites are the same, and many lack undisturbed testing spaces, so experimental control may be difficult to achieve. Also, testing in remote areas with modern technology can be daunting for participants who have never taken part in an experiment before. Simple designs and task demonstrations can help overcome initial reluctance to participate.

14.3.5.3

Application across methods and experiments

Here we briefly survey two studies using offline measures as examples of best practice when testing outside the lab. The Space Game series (summarised in Levinson & Wilkins, 2006) used a set of designs enabling psycholinguistic investigation outside the lab with diverse populations. In one experiment, researchers measured the spatial conceptualisation preferences of speakers from 12 different language groups using small, simple 3D objects. When participants were rotated 180 degrees and asked to reconstruct object positions they had seen earlier, speakers of languages such as Tzeltal, who typically communicate about small-scale spatial relations using absolute coordinates (e.g., the fork is to the north of the plate), reconstructed object constellations following the absolute frame of reference (i.e., still placing the fork to the north of the plate after being rotated). Conversely, speakers of Dutch, a relative-coordinate language (typically encoding the same scene as the fork is to the left of the plate), tended to use the relative frame of reference also when reconstructing spatial relations. In the second example study, Boroditsky and Ramscar (2002) made use of the rich naturalistic context of a café and measured how preferences in thinking about time differ depending on where people are located in space. In one of the experiments, researchers asked individuals waiting in line for lunch the question ‘Next Wednesday’s meeting has been moved forward two days. What day is the meeting now that it has been rescheduled?’ The answers ‘Friday’ or ‘Monday’ varied depending on people’s position in the line. Those further along the line, having experienced more forward motion, tended to adopt an ego-moving perspective and respond ‘Friday’, as if they had also moved forward in time, whereas those towards the end of the line, who had experienced less forward motion, tended to adopt a time-moving perspective and respond ‘Monday’, as if time were moving towards them. This study made elegant use of the real-world spatial context, which turned out to be a more reliable predictor of how people think about time than people’s estimates of their waiting time.

14.4

Summary and conclusion

In this chapter, we have touched on key experimental measures that are frequently used in experimental psycholinguistics to investigate questions pertaining to linguistic relativity. As we have highlighted, online measures provide a means to examine unconscious and nonstrategic cognitive processes as they unfold in real time. In contrast, offline measures may be better suited to research focusing on speakers’ metalinguistic knowledge or to contrasting different knowledge bases, such as implicit and explicit knowledge. As each type of measure has its strengths and weaknesses, we hope that researchers will draw on a complementary range of measures in their future investigations.

Note

1 Although the term response time is often used interchangeably with the term reaction time, response time can be conceptualised as the sum of the time taken to react to a stimulus and the cognitive and motor processes required to initiate the response. Reaction time, on the other hand, can refer to the reaction to a single stimulus, such as simply perceiving a light. According to Hick’s law (Hick, 1952), response times in choice reaction experiments increase as the amount of information to be processed increases, reflecting greater uncertainty.
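One common formulation of Hick’s law expresses mean choice response time (RT) as a linear function of the information carried by the stimulus set, where $n$ is the number of equally likely alternatives and $a$ and $b$ are constants estimated from data (these symbols are ours, not the chapter’s). The $+1$ is usually read as allowing for uncertainty about whether a stimulus will occur at all. Because the predictor is measured in bits, every doubling of $n+1$ adds the same increment $b$ to the predicted response time:

```latex
\begin{aligned}
\mathrm{RT} &= a + b \log_2(n + 1) \\
n = 1:\quad \mathrm{RT} &= a + b \log_2 2 = a + b \\
n = 3:\quad \mathrm{RT} &= a + b \log_2 4 = a + 2b \\
n = 7:\quad \mathrm{RT} &= a + b \log_2 8 = a + 3b
\end{aligned}
```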

Further reading

de Groot, A. M. B., & Hagoort, P. (Eds.). (2018). Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide. Wiley.
Gillioz, C., & Zufferey, S. (2020). Introduction to Experimental Linguistics (Cognitive Science Series). John Wiley & Sons.
Podesva, R. J., & Sharma, D. (Eds.). (2014). Research Methods in Linguistics. Cambridge University Press.

Related topics

New directions in statistical analysis for experimental linguistics; historical perspectives on the use of experimental methods in linguistics; controlling social factors in experimental linguistics; analysing language using brain imaging.

References

Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38, 419–439. https://doi.org/10.1006/jmla.1997.2558
Altmann, G. T. M., & Kamide, Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language, 57, 502–518.


Gilbert, A. L., Regier, T., Kay, P., & Ivry, R. B. (2006). Whorf hypothesis is supported in the right visual field but not the left. Proceedings of the National Academy of Sciences, 103(2), 489–494. https://doi.org/10.1073/pnas.0509868103
Gilquin, G., & Gries, S. T. (2009). Corpora and experimental methods: A state-of-the-art review. Corpus Linguistics and Linguistic Theory, 5(1), 1–26. https://doi.org/10.1515/CLLT.2009.001
Goldstone, R. L., Kersten, A., & Carvalho, P. F. (2013). Concepts and categorization. In A. F. Healy, R. W. Proctor & I. B. Weiner (Eds.), Handbook of Psychology: Experimental Psychology (pp. 607–630). John Wiley & Sons, Inc.
Goller, F., Choi, S., Hong, U., & Ansorge, U. (2020). Whereof one cannot speak: How language and capture of visual attention interact. Cognition, 194, 104023. https://doi.org/10.1016/j.cognition.2019.104023
Harley, T. (2001). The Psychology of Language: From Data to Theory. Psychology Press.
Harris, C. L., Ayçiçeği, A., & Gleason, J. B. (2003). Taboo words and reprimands elicit greater autonomic reactivity in a first language than in a second language. Applied Psycholinguistics, 24, 561–579.
Harris, C. L., Gleason, J. B., & Ayçiçeği, A. (2006). When is a first language more emotional? Psychophysiological evidence from bilingual speakers. In A. Pavlenko (Ed.), Bilingual Minds: Emotional Experience, Expression, and Representation (pp. 257–283). Multilingual Matters.
Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41, 301–307.
Hebert, K. P., Goldinger, S. D., & Walenchok, S. C. (2021). Eye movements and the label feedback effect: Speaking modulates visual search via template integrity. Cognition, 210, 104587.
Hellige, J. B. (1993). Hemispheric Asymmetry: What’s Right and What’s Left. Harvard University Press.
Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4, 11–26.
Huettig, F., & Mishra, R. K. (2014). How literacy acquisition affects the illiterate mind: A critical examination of theories and evidence. Language and Linguistics Compass, 8, 401–427.
Hugdahl, K. (2001). Psychophysiology: The Mind–Body Perspective. Harvard University Press.
Jacobs, A. M., & Grainger, J. (1994). Models of visual word recognition: Sampling the state of the art. Journal of Experimental Psychology: Human Perception and Performance, 20, 1311–1334.
Jankowiak, K., & Korpal, P. (2018). On modality effects in bilingual emotional language processing: Evidence from galvanic skin response. Journal of Psycholinguistic Research, 47, 663–677.
Ji, Y., & Papafragou, A. (2020). Is there an end in sight? Viewers’ sensitivity to abstract event structure. Cognition, 197, 104197.
Jouravlev, O., Taikh, A., & Jared, D. (2018). Effects of lexical ambiguity on perception: A test of the label feedback hypothesis using a visual oddball paradigm. Journal of Experimental Psychology: Human Perception and Performance, 44, 1842–1855.
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329–354.
Kaiser, E., Podesva, R., & Sharma, D. (2018). Experimental paradigms in psycholinguistics. In R. Podesva & D. Sharma (Eds.), Research Methods in Linguistics (pp. 135–168). Cambridge University Press.
Kersten, A. W., Meissner, C. A., Lechuga, J., Schwartz, B. L., Albrechtsen, J. S., & Iglesias, A. (2010). English speakers attend more strongly than Spanish speakers to manner of motion when classifying novel objects and events. Journal of Experimental Psychology: General, 139(4), 638–653.
Larsen, R. J., Mercer, K. A., Balota, D. A., & Strube, M. J. (2008). Not all negative words slow down lexical decision and naming speed: Importance of word arousal. Emotion, 8, 445–452.
Levinson, S. C., & Wilkins, D. P. (2006). Grammars of Space: Explorations in Cognitive Diversity. Cambridge University Press.
Lieberman, A. M., Hatrak, M., & Mayberry, R. I. (2013). Learning to look for language: Development of joint attention in young deaf children. Language Learning and Development, 10(1), 19–35. https://doi.org/10.1080/15475441.2012.760381
Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique. MIT Press.
Lucy, J. A., & Gaskins, S. (2003). Interaction of language type and referent type in the development of nonverbal classification preferences. In D. Gentner & S. Goldin-Meadow (Eds.), Language in Mind: Advances in the Study of Language and Thought (pp. 465–492). MIT Press.
Lupyan, G., & Thompson-Schill, S. L. (2012). The evocative power of words: Activation of concepts by verbal and nonverbal means. Journal of Experimental Psychology: General, 141, 170–186.


Lupyan, G., Rakison, D. H., & McClelland, J. L. (2007). Language is not just for talking: Redundant labels facilitate learning of novel categories. Psychological Science, 18, 1077–1083.
Lurito, J. T., & Dzemidzic, M. (2001). Determination of cerebral hemisphere language dominance with functional magnetic resonance imaging. Neuroimaging Clinics of North America, 11(2), 355–363.
Majid, A., Bowerman, M., Kita, S., Haun, D. B. M., & Levinson, S. C. (2004). Can language restructure cognition? The case for space. Trends in Cognitive Sciences, 8, 108–114.
Margolis, E., & Laurence, S. (1999). Concepts: Core Readings. Bradford.
Masuda, T., Ishii, K., Miwa, K., Rashid, M., Lee, H., & Mahdi, R. (2017). One label or two? Linguistic influences on the similarity judgment of objects between English and Japanese speakers. Frontiers in Psychology, 8, 1637. https://doi.org/10.3389/fpsyg.2017.01637
Miller, T., Schmidt, T., Blankenburg, F., & Pulvermüller, F. (2018). Verbal labels facilitate tactile perception. Cognition, 171, 172–179.
Mo, L., Xu, G., Kay, P., & Tan, L.-H. (2011). Electrophysiological evidence for the left-lateralized effect of language on preattentive categorical perception of color. Proceedings of the National Academy of Sciences, 108(34), 14026–14030. https://doi.org/10.1073/pnas.1111860108
Noorman, S., Neville, D. A., & Simanova, I. (2018). Words affect visual perception by activating object shape representations. Scientific Reports, 8(1), 1–10.
Norcliffe, E., Harris, A. C., & Jaeger, T. F. (2015). Cross-linguistic psycholinguistics and its critical role in theory development: Early beginnings and recent advances. Language, Cognition and Neuroscience, 30, 1009–1032.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology, 115(1), 39–61.
Pan, X., & Jared, D. (2021). Effects of Chinese word structure on object perception in Chinese–English bilinguals: Evidence from an ERP visual oddball paradigm. Bilingualism: Language and Cognition, 24(1), 111–123.
Phillips, W., & Boroditsky, L. (2003). Can quirks of grammatical gender affect the way you think? Grammatical gender and object concepts. In R. Alterman & D. Kirsch (Eds.), Proceedings of the 25th Annual Conference of the Cognitive Science Society, Boston.
Prinz, J. J. (2004). Gut Reactions: A Perceptual Theory of Emotion. Oxford University Press.
Pulvermüller, F. (2018). Neural reuse of action perception circuits for language, concepts and communication. Progress in Neurobiology, 160, 1–44.
Rayner, K. (2009). The 35th Sir Frederick Bartlett lecture: Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62, 1457–1506.
Roberson, D., Davies, I., & Davidoff, J. (2000). Color categories are not universal: Replications and new evidence from a stone-age culture. Journal of Experimental Psychology: General, 129(3), 369.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192–232.
Sakarias, M., & Flecken, M. (2019). Keeping the result in sight and mind: General cognitive principles and language-specific influences in the perception and memory of resultative events. Cognitive Science, 43(1), e12708.
Sato, S., & Athanasopoulos, P. (2018). Grammatical gender affects gender perception: Evidence for the structural-feedback hypothesis. Cognition, 176, 220–231.
Sato, S., Casaponsa, A., & Athanasopoulos, P. (2020). Flexing gender perception: Brain potentials reveal the cognitive permeability of grammatical information. Cognitive Science, 44, e12884.
Sauppe, S., & Flecken, M. (2021). Speaking for seeing: Sentence structure guides visual event apprehension. Cognition, 206, 104516.
Schmidt, T. T., Miller, T. M., Blankenburg, F., & Pulvermüller, F. (2019). Neuronal correlates of label facilitated tactile perception. Scientific Reports, 9(1), 1609.
Schouten, J. F., & Bekker, J. A. M. (1967). Reaction time and accuracy. Acta Psychologica, 27, 143–153.
Siok, W. T., Kay, P., Wang, W. S., Chan, A. H., Chen, L., Luke, K.-K., & Tan, L. H. (2009). Language regions of brain are operative in color perception. Proceedings of the National Academy of Sciences, 106(20), 8140–8145.
Smits, M., Visch-Brink, E., Schraa-Tam, C. K., Koudstaal, P. J., & van der Lugt, A. (2006). Functional MR imaging of language processing: An overview of easy-to-implement paradigms for patient care and clinical research. RadioGraphics, 26(suppl_1), S145–S158. https://doi.org/10.1148/rg.26si065507
Speed, L. J., Chen, J., Huettig, F., & Majid, A. (2021). Classifier categories reflect but do not affect conceptual organization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47, 625–640.


Speed, L. J., Wnuk, E., & Majid, A. (2018). Studying psycholinguistics out of the lab. In A. de Groot & P. Hagoort (Eds.), Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide. Wiley.
Suffill, E., Branigan, H., & Pickering, M. (2019). Novel labels increase category coherence, but only when people have the goal to coordinate. Cognitive Science, 43(11), e12796.
Sutton, S., Braren, M., Zubin, J., & John, E. R. (1965). Evoked-potential correlates of stimulus uncertainty. Science, 150(3700), 1187–1188. https://doi.org/10.1126/science.150.3700.1187
Thierry, G. (2016). Neurolinguistic relativity: How language flexes human perception and cognition. Language Learning, 66, 690–713. https://doi.org/10.1111/lang.12186
Thierry, G., Athanasopoulos, P., Wiggett, A., Dering, B., & Kuipers, J.-R. (2009). Unconscious effects of language-specific terminology on preattentive color perception. Proceedings of the National Academy of Sciences, 106, 4567–4570. https://doi.org/10.1073/pnas.0811155106
Thierry, G., & Wu, Y. J. (2007). Brain potentials reveal unconscious translation during foreign-language comprehension. Proceedings of the National Academy of Sciences, 104, 12530–12535.
Valdés Kroff, J. R., Guzzardo Tamargo, R. E., & Dussias, P. E. (2018). Experimental contributions of eye-tracking to the understanding of comprehension processes while hearing and reading code-switches. LAB, 8, 98–133.
Vanek, N., Sóskuthy, M., & Majid, A. (2021). Consistent verbal labels promote odor category learning. Cognition, 206, 104485.
Vanek, N., & Tovalovich, A. (2021). Emotionality ratings and electrodermal responses to university-related expressions in a native and a non-native language. International Journal of Bilingual Education and Bilingualism, 1–17. https://doi.org/10.1080/13670050.2021.1978924
von Stutterheim, C., Andermann, M., Carroll, M., Flecken, M., & Schmiedtová, B. (2012). How grammaticized concepts shape event conceptualization in language production: Insights from linguistic analysis, eye tracking data, and memory performance. Linguistics, 50(4). https://doi.org/10.1515/ling-2012-0026
Whorf, B. L. (1956). Language, Thought and Reality: Selected Writings of Benjamin Lee Whorf. MIT Press.
Wickelgren, W. A. (1977). Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica, 41, 67–85.
Xia, T., Xu, G., & Mo, L. (2019). Bi-lateralized Whorfian effect in color perception: Evidence from Chinese Sign Language. Journal of Neurolinguistics, 49, 189–201.
Yu, M., Li, Y., Mo, C., & Mo, L. (2017). Newly learned categories induce pre-attentive categorical perception of faces. Scientific Reports, 7(1), 14006. https://doi.org/10.1038/s41598-017-14104-6
Zacks, J. M. (2020). Event perception and memory. Annual Review of Psychology, 71, 165–191.
Zhang, H., & Vanek, N. (2021). From “No, she does” to “Yes, she does”: Negation processing in negative yes–no questions by Mandarin speakers of English. Applied Psycholinguistics, 42(4), 937–967.


15

COGNITIVE PROCESSES INVOLVED IN TEXT COMPREHENSION

Walking the fine line between passive and strategic validating processes in reading

Anne E. Cook and Edward J. O’Brien

15.1

Introduction and definitions

Reading is often described as a single activity, but it actually comprises many component processes. Before individuals can comprehend a text, they must first perceive, recognize, and decode words, access their meanings, and then parse them into sensible constructions. Once those constructions are compiled into sentences, paragraphs, and texts, the goal of reading (and the topic of this chapter) is comprehension, or understanding the overall meaning conveyed by a text. In fluent expert readers, many of the component processes involved in comprehension are passive, occurring below the level of conscious awareness. This chapter reviews past and present research on passive processes in reading comprehension and attempts to define the general point at which passive processing recedes and strategic processing begins to dominate the comprehension process.

15.2

Historical perspective

Nearly every cognitive psychology model of the processes involved in comprehension can be traced to Kintsch and van Dijk’s (1978) assumption that readers’ long-term memory representations of text have at least two levels: the text base and the situation model. The text base represents the words and the relations between the explicitly stated content in the text, whereas the situation model represents the text’s intended meaning. The situation model includes explicitly stated content from the text, but it must also fill in gaps in the text base with information that is reactivated from readers’ general world knowledge. In Kintsch and van Dijk’s (1978) original theory of comprehension, and in nearly all subsequent models of comprehension, text is broken down into propositions, or meaning units. Propositions tend to contain content words such as verbs and nouns, along with their agents or descriptors. Connections are made between propositions on the basis of argument overlap and causality over successive cycles during reading. Due to the constraints of working memory, only a subset of propositions is held active in memory from one cycle to the next to be integrated with incoming content. The remaining propositions are connected to the overall discourse representation and encoded in long-term memory. In an extension of his earlier model, Kintsch (1988) argued that propositions and related information reactivated from long-term memory are connected in an associative network in working memory during a Construction stage. The network may contain information that is semantically related to the text but that may be either relevant or irrelevant, and connection strengths between nodes in the network may vary. In a subsequent integration stage, activation spreads throughout the network until it stabilizes, and only the strongest connections between nodes remain. The output is a coherent representation in which the most active nodes are related and relevant to the current discourse model, and the irrelevant nodes are zeroed out. The reactivation mechanism involved in Kintsch’s (1988) Construction phase is a passive, bottom-up process, such as spreading activation (Anderson, 1983) or resonance (Myers & O’Brien, 1998; O’Brien & Myers, 1999). According to the resonance model, the encoding of new content sends a passive signal to all of long-term memory. Any concepts in memory that are related to, or share features with, the newly encoded information resonate in response, and those concepts that resonate the most are reactivated into working memory. In addition to being passive, resonance is unrestricted, in that it can reactivate either previously read content from the text or information from general world knowledge. Resonance is also dumb: information can be reactivated regardless of whether it is relevant to the current events in the text. Resonance describes only the activation mechanism involved in the Construction phase; it operates independently of whether its product will facilitate or hinder the subsequent integration phase of comprehension. The assumption of a passive two-stage activation + integration model of comprehension has also been incorporated into other models of reading (e.g., Sanford & Garrod, 1998, 2005; van den Broek et al., 1996) and is at the core of the memory-based view of reading comprehension (McKoon & Ratcliff, 1992). Other theorists, who argue that comprehension involves more strategic or evaluative processes, espouse the explanation-based view (e.g., Graesser et al., 1994; Singer et al., 1994; Zwaan & Radvansky, 1998). The debate between the memory-based and explanation-based views dominated research in discourse comprehension from the 1980s until very recently.

DOI: 10.4324/9781003392972-18
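The integration stage can be illustrated with a toy simulation of the kind often used for construction-integration networks: activation is repeatedly passed through a matrix of connection strengths, negative values are clipped to zero, and the vector is rescaled until it stops changing. This is a minimal sketch under stated assumptions. The four nodes, their weights, the tolerance, and the rescaling rule (dividing by the largest activation) are hypothetical choices for illustration, not details taken from Kintsch (1988).

```python
def integrate(weights, activation, tol=1e-4, max_iter=1000):
    """Settle a toy construction-integration network.

    weights[i][j] is the (hypothetical) connection strength between nodes i and j:
    positive values support, negative values inhibit.
    Returns the stabilised activation vector.
    """
    n = len(activation)
    for _ in range(max_iter):
        # Each node's new activation is the weighted sum of all nodes' activation.
        new = [sum(weights[i][j] * activation[j] for j in range(n)) for i in range(n)]
        # Negative activation is clipped to zero: such nodes drop out of the representation.
        new = [max(0.0, a) for a in new]
        # Rescale so the most active node has activation 1 (guarding against an all-zero vector).
        peak = max(new) or 1.0
        new = [a / peak for a in new]
        # Stop once no node changes by more than the tolerance.
        if max(abs(x - y) for x, y in zip(new, activation)) < tol:
            return new
        activation = new
    return activation


# Hypothetical network: P1 and P2 are propositions from the text;
# K1 is an associate that fits the text; K2 is an associate that conflicts with P2.
labels = ["P1", "P2", "K1", "K2"]
W = [
    [1.0,  1.0,  0.5,  0.5],
    [1.0,  1.0,  0.5, -1.0],
    [0.5,  0.5,  1.0, -1.0],
    [0.5, -1.0, -1.0,  1.0],
]
final = integrate(W, [1.0, 1.0, 1.0, 1.0])
for label, a in zip(labels, final):
    print(f"{label}: {a:.3f}")
```

With these made-up weights, the conflicting associate K2 is zeroed out after the first pass, while K1 settles at roughly 0.73 and both text propositions stay at 1, which mirrors the claim that only the strongest, mutually supporting nodes survive integration.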

15.3

Critical issues and topics

The fundamental difference between the memory-based and explanation-based views lies in their assumptions about what drives the reactivation of information during reading. As described earlier, the memory-based view assumes a passive, dumb, and unrestricted reactivation mechanism. The explanation-based view, however, assumes that readers “search after meaning” (Singer et al., 1994), implying a more active and strategic force governing reactivation during reading. These contrasting assumptions have led researchers to ask what content may become available to the reader across a variety of phenomena.

15.3.1

Inferences

Inferences represent situations in which readers must fill in gaps in the text-based representation to make coherent connections between propositions in a text. Anaphoric inferences require the reader to connect an anaphor to a previously encountered antecedent. Early research in this area demonstrated that antecedent reactivation after encountering an anaphor was best explained by a passive reactivation mechanism. Antecedents were facilitated when they were elaborated in the text (O’Brien et al., 1990), more recently encountered (O’Brien, 1987), semantically related to the anaphor (O’Brien et al., 1997), and causally connected to the antecedent (O’Brien & Myers, 1987). In fact, readers activated an incorrect antecedent from general world knowledge when it was highly related to the text, even when the actual antecedent was explicitly stated in the text (O’Brien & Albrecht, 1991). This latter finding is inconsistent with an explanation-based view, which assumes that readers actively search memory for information that will facilitate comprehension. These initial findings on antecedent retrieval served as the foundation for the resonance model (Myers & O’Brien, 1998; O’Brien & Myers, 1999). Other inferences require readers to fill in gaps in the text with information from general world knowledge. Early discussions of inferencing in the memory-based view argued that only information that is readily available to the reader would be activated and that many inferences were not automatically activated during reading (McKoon & Ratcliff, 1992). This contrasted with the explanation-based assumption that readers actively generate a wide variety of inferences (Graesser et al., 1994). However, if a passive memory retrieval mechanism like resonance is responsible for making information “readily available”, inference activation would be subject to the same retrieval process as any other information from memory. Consistent with this perspective, researchers outlined the factors that governed (and limited) the activation of a wide range of inference types, namely elaborative, predictive, instrumental, and logical inferences (for a review, see Cook & O’Brien, 2015), and these findings are ultimately consistent with the view initially outlined by McKoon and Ratcliff (1992). In general, inferences are activated when they are strongly supported by contextual information. The specificity of what is activated also depends on contextual support: activated information may range from a general idea to a specific lexical item (Cook et al., 2001; Gillioz & Gygax, 2017; Lassonde & O’Brien, 2009).

15.3.2

Coherence

Within the assumptions of the Kintsch and van Dijk (1978) view of comprehension, coherence is the extent to which incoming information can be easily integrated with preceding information. Local coherence involves integrating incoming information with the contents of active memory, and global coherence involves integrating incoming information with content that is no longer active but is stored in long-term memory. One view is that if incoming content is locally coherent with the contents of active memory, no search for related information in long-term memory is necessary. However, if readers are sensitive to global inconsistencies even when local coherence has been maintained, this would suggest that a passive reactivation mechanism is responsible for establishing global coherence. Across several studies, researchers have demonstrated that readers are sensitive to violations of global coherence even when there is no local coherence break (e.g., Albrecht & O’Brien, 1993; O’Brien & Albrecht, 1992; see reviews in Cook & O’Brien, 2019; O’Brien & Cook, 2015). After establishing that the activation mechanism involved in maintaining coherence was indeed passive, researchers turned to testing the limits of this passive process. For example, is passive activation limited to information that is relevant to the current discourse model, or does related but irrelevant information also become reactivated? Nothing in the resonance model allows relevance to be determined; thus, the model predicts that any related information will be reactivated, regardless of relevance. This was confirmed across several studies in which passages contained globally inconsistent information that was semantically related to a target sentence but varied in its relevance (Cook et al., 1998; O’Brien et al., 1998; O’Brien et al., 2004; O’Brien et al., 2010). For example, a passage described a protagonist with a character trait that was once true (e.g., vegetarianism) but was qualified as no longer being true (e.g., not a vegetarian anymore). Later in the passage, a target sentence indicated that the same protagonist went to a restaurant and ordered a cheeseburger and fries. Thus, the qualified characteristic is related to the information in the target sentence, but it is no longer relevant. Readers still reactivated the irrelevant information, leading to a breakdown in global coherence, as reflected in processing difficulty on the target sentence. Factors that mediate the reactivation of irrelevant content include distance (Gueraud et al., 2018), elaboration (Gueraud et al., 2005), and causality (Kendeou et al., 2013). Finally, researchers turned to the assumption that reactivation from memory is unrestricted, in that either previously encountered information from the text or content from readers’ general world knowledge can be reactivated during reading. Within a passive reactivation model like resonance, it is assumed that the information with the strongest signal, or the most featural overlap with incoming content, will be reactivated fastest. However, several studies have demonstrated that information from these two sources can compete for initial influence on comprehension (Albrecht & O’Brien, 1991; Cook, 2014; Cook & Gueraud, 2005; Cook & Myers, 2004; Garrod & Terras, 2000; Rizzella & O’Brien, 2002). It is this competition between contextual information and general world knowledge for influence on comprehension that sparked the next wave of theoretical questions about comprehension.
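The idea of resonance as a passive, dumb, and unrestricted signal can be caricatured in a few lines: every memory trace is scored purely by its feature overlap with the incoming cue, and the scoring never looks at whether a trace is still true or where it came from. This is an illustrative sketch resting on several assumptions. The hand-coded feature sets, the overlap measure (proportion of cue features shared), the character name, and the reactivation threshold are all hypothetical and are not part of the resonance model’s specification.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    label: str
    features: frozenset
    still_true: bool  # stored, but never consulted by resonance ("dumb")
    source: str       # "text" or "world knowledge" ("unrestricted")


def resonance(cue, trace):
    """Signal strength: the proportion of the cue's features that the trace shares."""
    return len(cue & trace.features) / len(cue)


def reactivate(cue, memory, threshold=0.3):
    """Return (signal, trace) pairs at or above threshold, strongest (assumed fastest) first."""
    scored = [(resonance(cue, t), t) for t in memory]
    above = [(s, t) for s, t in scored if s >= threshold]
    return sorted(above, key=lambda pair: pair[0], reverse=True)


# Hypothetical traces loosely modelled on the vegetarian passage described above.
memory = [
    Trace("Mary used to be a vegetarian",
          frozenset({"mary", "meat", "food", "diet"}), still_true=False, source="text"),
    Trace("Cheeseburgers contain beef",
          frozenset({"meat", "food", "beef"}), still_true=True, source="world knowledge"),
    Trace("Mary has a sister",
          frozenset({"mary", "family"}), still_true=True, source="text"),
]
cue = frozenset({"mary", "restaurant", "meat", "food"})  # "Mary ordered a cheeseburger"

for signal, trace in reactivate(cue, memory):
    print(f"{signal:.2f}  {trace.label}  (source: {trace.source}, still true: {trace.still_true})")
```

With these invented features, the no-longer-true trait produces the strongest signal (0.75) and is returned first, followed by a world-knowledge trace (0.50), while the unrelated trace falls below threshold. That ordering illustrates, in miniature, why a relevance-blind mechanism would reactivate outdated information from the text alongside general world knowledge.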

15.4 Current contributions and research

Previous studies had established the passive, dumb, and unrestricted nature of the reactivation mechanism that underlies reading. It is also necessary, however, to describe the processes by which reactivated information is connected to and checked against the contents of active memory.

Two-stage models of comprehension, such as Kintsch’s (1988) Construction-Integration (CI) model, assume that readers first construct a network of propositions. This network is based on recently encoded information as well as content reactivated from memory. In the second, integration stage, activation in the network stabilizes, and concepts that have low activation levels or are irrelevant to the text are eliminated from the network.

Cook and O’Brien (2014; O’Brien & Cook, 2016a, 2016b) expanded and refined these assumptions in their three-stage RI-Val model and indicated how the stages play out over time. Figure 15.1 shows the three stages:

- The first (R) stage is the reactivation process (solid curve). A resonance-like mechanism reactivates information from memory that is related to incoming content. As concept activation levels increase, so does their potential to influence subsequent stages of comprehension.
- The second (I) stage (dotted curve) integrates, or links, reactivated information with the incoming content. Only information that meets a minimum activation threshold (gray dashed horizontal line) is integrated, on the basis of conceptual overlap, or goodness of fit.
- The third (Val), or validation, stage (dashed curve) then evaluates the linkages from the previous stage against the contents of active memory.

Validation is assumed to operate like a conceptual pattern-matching process (e.g., Kamas & Reder, 1995; Kamas et al., 1996; Reder & Cleeremans, 1990). Linkages that have a high degree of match with active memory will be easier to validate and process than linkages with a low degree of match. The validation process does not, however, require perfect matches for comprehension to proceed.
The RI-Val model thus preserves the assumptions of the CI model but adds a validation mechanism to explain how information may (or may not) influence comprehension. The model makes three key assumptions about these stages:

1. All stages are passive: they occur automatically and run to completion over time.
2. The stages are asynchronous: the output of one stage serves as the input for the subsequent stage. Once all three stages have been initiated, they run in parallel.
3. The third, and most critical, assumption concerns how far the three stages run before the reader moves on to subsequent text.

We have called this shift to new information in the text the coherence threshold (dashed vertical line in Figure 15.1). It is defined as the point at which validation has yielded a sufficient match for comprehension to proceed. Although the coherence threshold marks the point at which the reader’s attention shifts to new incoming content, processes that are already in progress continue. Even after the reader has passed the coherence threshold and moved on, information related to the just-processed text may still be reactivated, integrated, and validated. This explains why processing effects may be observed before and/or after the reader passes the coherence threshold. The assumption that validation continues after the reader has moved on to subsequent text separates the RI-Val model’s view of validation from other conceptualizations of validation (e.g., Richter, 2015; Singer, 2013).

Cognitive processes involved in text comprehension

Figure 15.1 The RI-Val Model of Comprehension. This figure represents the Resonance, Integration and Validation stages assumed by the model. Adapted and reprinted from “Coherence threshold and the continuity of processing: The RI-Val model of comprehension,” by E. J. O’Brien & A. E. Cook, 2016, Discourse Processes, 53, 326–338.
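The RI-Val model is stated verbally rather than as equations, so the following is only an illustrative sketch under stated assumptions. Each stage's output is modeled as a hypothetical logistic curve that starts a fixed lag after the previous stage (asynchronous onset, then parallel running). The coherence threshold is taken to be the moment validation output first reaches a criterion value. The constants and function names (`STAGE_LAG`, `RATE`, `coherence_threshold_time`, `locus_of_difficulty`) are invented for illustration and do not come from the authors. The sketch shows why the same mismatch can surface either on the target (before the threshold) or on later spillover text (after it).

```python
import math

# Hypothetical toy parameters: RI-Val is a verbal model, so these values are illustrative only.
STAGE_LAG = 2.0   # time units between the onsets of the R, I, and Val stages
RATE = 1.0        # steepness of each stage's logistic rise


def logistic(t: float, onset: float, rate: float = RATE) -> float:
    """Output level of a stage at time t; the curve rises around `onset`."""
    return 1.0 / (1.0 + math.exp(-rate * (t - onset)))


def stage_levels(t: float, r_onset: float = 0.0, lag: float = STAGE_LAG, rate: float = RATE):
    """Return the (R, I, Val) output levels at time t.

    Each stage starts `lag` units after the previous one because its input is the
    previous stage's output; once started, all three keep running in parallel.
    """
    r = logistic(t, r_onset, rate)
    i = logistic(t, r_onset + lag, rate)
    val = logistic(t, r_onset + 2 * lag, rate)
    return r, i, val


def coherence_threshold_time(criterion: float, r_onset: float = 0.0,
                             lag: float = STAGE_LAG, rate: float = RATE) -> float:
    """Time at which validation output first reaches `criterion` (0 < criterion < 1).

    Solves logistic(t, val_onset) == criterion in closed form.
    """
    if not 0.0 < criterion < 1.0:
        raise ValueError("criterion must lie strictly between 0 and 1")
    val_onset = r_onset + 2 * lag
    return val_onset + math.log(criterion / (1.0 - criterion)) / rate


def locus_of_difficulty(mismatch_time: float, criterion: float) -> str:
    """Classify where a mismatch emerging at `mismatch_time` would appear in reading times."""
    if mismatch_time <= coherence_threshold_time(criterion):
        return "target (immediate)"
    return "spillover (delayed)"


if __name__ == "__main__":
    mismatch = 5.0  # hypothetical moment at which validation would expose an anomaly
    for label, crit in [("lowered", 0.2), ("baseline", 0.5), ("raised", 0.9)]:
        t_ct = coherence_threshold_time(crit)
        print(f"{label:8s} threshold at t={t_ct:.2f}: {locus_of_difficulty(mismatch, crit)}")
```

With these made-up values, only the raised criterion places the threshold after the mismatch emerges, so difficulty lands on the target. The lowered and baseline criteria push it to the spillover text. The toy captures only timing. It does not model effects disappearing entirely, nor any real parameter of the theory.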

15.4.1 Context versus general world knowledge

The first set of tests of the RI-Val model’s assumptions focused on the competition between contextual information and general world knowledge for influence on comprehension. Balancing the two types of information is especially important in fantasy text comprehension, in which the passage context regularly provides information that conflicts with general world knowledge. Creer et al. (2018) manipulated two factors: whether a text was introduced with information about a fantasy world or with real-world narrative content, and whether the text focused on a well-known fantasy character (e.g., Superman) or on an unknown individual. A subsequent target sentence violated real-world general knowledge (e.g., bullets bounced off his chest). Creer et al. found that validation of the target sentence was easier when readers were reading about well-known fantasy characters than about unknown characters. They argued that this was because readers activated their own general world knowledge about the


term (e.g., Noah). According to the RI-Val model, in such instances validation may yield a partial match between encoded content and reactivated information that is sufficient for comprehension to proceed. Most tests of semantic anomalies have used methodologies that require conscious anomaly detection and do not allow delayed processing difficulty to be measured. However, the RI-Val model assumes that processing continues even after the reader moves on in the text (or, in this case, makes a response decision), so processing difficulty may be observed after a delay. Consistent with this view, Cook et al. (2018) used eye-tracking measures to show that processing difficulty due to high-related anomalous targets (e.g., Moses) occurred only after a delay. Slowdowns due to high-related anomalies appeared only in second-pass rereading times on the target. This was true whether or not readers explicitly (and consciously) detected the anomalies. Readers experienced immediate processing difficulty only when anomalies were low-related to the correct targets (e.g., Nixon). Cook et al.’s findings show that semantic anomalies are not always consciously detected but may still affect processing over time.

Williams et al. (2018) developed a method for measuring both immediate and delayed processing difficulty due to semantic anomalies without requiring explicit detection responses. They embedded semantic anomalies in target sentences within narrative passages that provided either high or low contextual support for the underlying relation between the anomalous and correct content (Moses and Noah, respectively). The dependent measure was self-paced line-by-line reading time rather than explicit detection. This let them measure both immediate processing difficulty due to the anomaly on the target sentence and delayed processing difficulty on a subsequent spillover sentence. Williams et al. found that when the passage contained high contextual support for the anomaly, processing difficulty due to the anomaly was delayed until the spillover sentence. High-support passages mentioned several features shared by the anomalous and correct words, such as Bible, Old Testament, and patriarch.

When contextual information was strong, it was reactivated, integrated, and validated more quickly than information about the correct target, which had to be reactivated from general world knowledge. Because that contextual information supported the match between the anomaly and the rest of the statement, there was no immediate processing difficulty. However, RI-Val processes are assumed to continue operating even after the reader moves on in the text. Additional information continued to be reactivated from general world knowledge, integrated, and validated against the contents of active memory. This additional information likely contained distinguishing content that made the mismatch between the anomaly (Moses) and the rest of the statement (took animals on the Ark) more apparent. Thus, readers did experience processing difficulty due to the anomaly, but only after a delay (see also Cook et al., 2018).

In contrast, when the passage provided low contextual support for the anomaly, readers experienced immediate difficulty on the target sentence. In this case, most of the information reactivated from memory came from general world knowledge and highlighted the mismatch between the anomalous target word and the rest of the statement. The results of Williams et al.’s (2018) study highlight how contextual information and general world knowledge compete over time for influence on comprehension.

15.4.2 Coherence threshold

The second set of tests of the RI-Val model focused on the coherence threshold assumption. As stated previously, this threshold is assumed to be the point at which validation has yielded a sufficient match for comprehension to proceed and for the reader to move on to new information. However, this threshold is not fixed. It can be raised or lowered, changing the degree of match that validation must yield before the reader moves on. When the coherence threshold is raised, there is more time, and thus more potential, for information to be reactivated, integrated, and validated before the reader moves on in the text. When it is lowered, there is less time and potential for information to pass through the three stages before the reader moves on.

Williams et al. (2018) manipulated the coherence threshold to change the extent to which validation would complete before the reader moved on in the text. Importantly, they did this without changing the nature of the reading task or giving readers specific instructions. They simply changed the number of comprehension questions readers received at the end of each passage. Recall that under normal coherence threshold conditions (i.e., one question per passage), the high-context condition produced delayed processing difficulty (i.e., on the spillover sentence). The low-context condition produced immediate processing difficulty (i.e., on the target sentence). When the coherence threshold was lowered by reducing the number of comprehension questions, processing difficulty due to the semantic anomaly was no longer observed at all in the high-context condition. In the low-context condition, it was observed only after a delay. In a subsequent experiment, Williams et al. raised the coherence threshold by increasing the number of comprehension questions per passage. Validation then had more time to complete before readers moved on from the target sentence, and immediate processing difficulty was observed on the target in both the high- and low-context conditions.

Using a more subtle type of anomaly, Dutton and Cook (2022; see also Cook, 2014; Cook & Wei, 2019) used passages in which target sentences contained an anaphor. The anaphor was either a correct reference to an antecedent mentioned earlier in the passage, incorrect but high-related, or incorrect and low-related.
They used eye tracking to measure precisely when processing difficulty due to the anomalous anaphor first occurred. Consistent with Cook and Wei (2019), they found that processing difficulty was not observed until readers had already moved past the anaphor in the text. In a subsequent experiment, raising the coherence threshold with additional comprehension questions did not change the timing of initial processing difficulty on the anaphor. It did, however, lead readers to make more regressions (i.e., leftward eye movements) back to the anaphor from the adjacent spillover region. The spillover region is the text immediately following the anaphor, used to assess effects of processing the anaphor that continue after the eyes have moved past it. These findings are consistent with Williams et al.’s (2018) contention that raising the coherence threshold gives validation more time to complete and mismatches more time to emerge. This can produce processing difficulty before the reader moves on to new information.

More important for the RI-Val model’s assumptions, however, is Dutton and Cook’s (2022) finding that raising the coherence threshold did not increase regressions from the line containing the incorrect anaphor back to the original antecedent. The RI-Val view assumes that validation is driven by information reactivated from long-term memory. Yet the studies that have tested it have relied primarily on self-paced line-by-line reading, which does not allow regressions to previously encountered lines and so forces readers to rely on reactivated content. In Dutton and Cook’s study, readers did not regress to previous lines even when the whole text was visible on the screen and the coherence threshold was high. This finding is consistent with the view that validation operates on information that is currently active in memory.
Shifting the coherence threshold may heighten readers’ attention to details in the text, but it does not change general reading behaviors (Creer et al., 2019). The studies just described show that the reader’s coherence threshold can be manipulated without changing the overall nature of the reading task or providing explicit instructions to the reader. Indeed, O’Brien and Cook (2016a, 2016b) argued that the coherence threshold is likely to be influenced by task, text, or reader variables.

Sonia and O’Brien (2021) demonstrated a text-based manipulation of the coherence threshold. They inserted an inconsistency early in the passage and argued that this early disruption in comprehension could signal readers to process the text more carefully (i.e., raise the coherence threshold). This is consistent with earlier suggestions that disruptions in reading can lead to more attentive processing (Kamas et al., 1996; McNamara & Kintsch, 1996; O’Brien et al., 1998, 2010; O’Brien & Myers, 1985). Consistent with this argument, Sonia and O’Brien found that when passage introductions contained an inconsistency, subsequent target sentences with semantic anomalies produced immediate processing difficulty. When passage introductions did not contain inconsistencies, processing difficulty on the target sentence was delayed until the spillover sentence. This pattern held regardless of whether:

1. the initial inconsistency was contextually related to the semantic anomaly,
2. the distance between the initial inconsistency and the target sentence was increased, or
3. participants had to pause reading for 2,000 ms between the initial inconsistency and the target sentence.

The coherence threshold was “reset” to baseline only when there was a full passage break between the initial inconsistency and the subsequent target sentence.

15.4.3 Strategic processes: A memory-based perspective

Memory-based theorists have sought to explain as much of comprehension as possible with passive processing but have always assumed that strategic processing would begin at some point (O’Brien & Cook, 2015). This implies that passive processing precedes strategic processing and that the output of passive processing serves as the input for subsequent strategic processes. It raises the question, however, of what initiates strategic processes. Previous memory-based researchers speculated that unresolved disruptions in comprehension may shift attention enough to “reboot” the resonance signal to long-term memory (see Gerrig & O’Brien, 2005; Myers & O’Brien, 1998). Explanation-based researchers, by contrast, assumed that comprehension is driven by readers’ “search after meaning” (e.g., Singer et al., 1994).

In the RI-Val model (Cook & O’Brien, 2014; O’Brien & Cook, 2016a, 2016b), passive processes continue to run beyond the coherence threshold, the point at which validation has yielded a sufficient match for the reader to move on in the text. Within RI-Val, one possibility is that strategic processes begin when validation runs to completion without yielding a sufficient match. In Figure 15.1, the rightmost dotted line represents the divergence threshold: the point at which passive processes have tapered off and strategic processes have just begun. This point allows for a period during which both passive and strategic processes operate, as the shift from one to the other unfolds. As Kendeou and O’Brien (2018) noted, this transition is fluid, and it is not possible to meaningfully distinguish which type of process dominates during it. The main priority of the RI-Val model is, and always has been, to provide a clear and testable explanation of the assumptions underlying the passive processes involved in comprehension.
By adding the divergence threshold, we are only marking the point at which passive and strategic processes begin to diverge within RI-Val. We are not attempting to add an explanation of strategic processing to the model itself. Adding strategic processes to a model of comprehension would require a clear understanding of the cognitive mechanisms involved in strategic processing (see Kendeou & O’Brien, 2018, for a detailed discussion of the difficulty of developing a model that could account for strategic processing). Further, we agree with Kendeou and O’Brien’s (2018) argument that it makes good sense to avoid research that tries to pin down the exact location of this line. Examining processes that operate close to that elusive line simply opens up unhelpful arguments about which side of the line a specific process falls on.

Psychologists of reading have argued for over 25 years that more systematic research is needed on the role of strategies and goals in comprehension (e.g., Kendeou & O’Brien, 2018; Long & Lea, 2005; Lorch & van den Broek, 1997; van den Broek et al., 2005). To date, however, the literature on reading strategies still focuses more on the product of comprehension than on its processes. We have gained little insight into the nature of the executive control systems involved in strategic processing. This remains a weakness of current models of reading comprehension and a goal for future models.

15.5 Main research methods

As noted in the previous section, research on reading comprehension can be divided into work focused on product and work focused on process. Most of the work in the psychology of reading, and nearly all of the studies cited in this chapter, have focused on process. The measures used to study reading processes are predominantly online measures, that is, measures that record processing while it is happening. In contrast, offline measures are taken after reading has ended and reflect the representations that result from reading. Among online measures, researchers use both activation measures and reading measures.

Activation measures indicate how active information is in memory at the time of test. The most common activation measures in reading research are probe methodologies. Immediately after reading (typically within 500 ms), readers are presented with a probe and must respond to it as quickly as possible. Depending on the task, readers may be asked to:

- say the word aloud (naming probe),
- indicate whether it appeared in the text they just read (recognition probe), or
- indicate whether the probe is a word (lexical decision probe).

Across all three probe types, response speed is taken as an index of the probe concept’s level of activation in memory. However, the binary response required for recognition and lexical decision probes adds a decision component. In most cases, a probe that reflects a highly active concept will receive a correct and fast “yes” response. Suppose instead that a probe reflects a concept that is highly related to the contents of active memory but did not actually appear in the text. This related content can come from either the reader’s mental representation of the text or their general world knowledge. Readers may then be conflicted between an incorrect “yes” response and the correct “no” response, and this conflict could slow responses relative to a control condition.
Online reading measures typically let readers advance through a text at their own pace, one isolated portion of text at a time. Researchers have employed word-by-word and phrase-by-phrase self-paced paradigms (or moving-window paradigms). However, presenting text in such small segments may encourage unnatural reading strategies (Danks, 1986). Self-paced line-by-line reading paradigms are much more common. This method provides a more naturalistic approach to reading: readers advance through a text one line at a time, and the reading time for each line is the dependent measure. Reading time for a given line can be interpreted as the ease with which that information is validated against the contents of active memory. This does not mean that readers are conscious of the ease or difficulty of this process (see Singer et al., 2023). For the RI-Val model, the line-by-line self-paced method also clearly indicates when readers have reached the coherence threshold, because they must press a key to move on in the text.

Reading time is a measure of ease of processing, not a measure of activation. Reading time may be facilitated relative to a control condition when related information is activated in memory and strongly matches incoming content. Conversely, reading time may be slowed when conflicting information is active in memory, as in the global coherence studies discussed earlier (e.g., Albrecht & O’Brien, 1993). A lack of slowdown, however, does not mean that the conflicting information is inactive. Reading time is not sensitive enough to reflect activation levels (see Kendeou et al., 2013), especially if activation is still building. In those cases, processing difficulty may show up only after a delay (e.g., Walsh et al., 2018; Williams et al., 2018). Researchers who want to measure the time course of processing, and delayed processing in particular, should design experimental materials that include spillover regions or sentences.

Researchers who want a finer-grained analysis than a line-by-line technique allows may use eye tracking to measure readers’ forward and regressive fixations and saccades while they process a text. Eye-tracking measures of reading reveal the time course of processing individual words and phrases without the unnatural word-by-word or phrase-by-phrase presentation methods mentioned earlier. Eye-tracking measures fall into three groups (for reviews, see Cook & Wei, 2017; Rayner, 1998):

- early reading processes: single fixation duration, first fixation duration, and first-pass duration;
- later reading processes: go-past duration and second-pass duration;
- general processing difficulty: number of fixations and total time.
Eye-tracking studies of reading tend to present the whole text on a screen rather than line by line. Even so, Cook and Wei (2019) found that eye-tracking results are largely consistent with those of line-by-line reading methods, because readers rarely regress to reread previous lines or portions of text.

Offline measures of comprehension focus on readers’ understanding of information in the text. For example, readers may be asked to verify statements about what they have read or to answer simple questions about the text. Other methods involve asking readers to summarize or recall what they have read. As stated earlier, these offline methods are more likely to reveal readers’ long-term memory representation of the text than the processes that occur during comprehension. One hybrid of online and offline methods is the talk-aloud protocol, which is designed to provide insight into, and convergent evidence for, online processes (Suh & Trabasso, 1993).
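The region-based eye-tracking measures listed above are all computed from the sequence of fixations in a trial. The sketch below computes them for one interest region under common conventions in the eye-movement literature, although exact definitions vary across labs and analysis software. Here, first-pass measures are left undefined if the eyes moved past the region before fixating it (a skip). Second-pass duration is approximated as total time minus first-pass duration. The `Fixation` type and `region_measures` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fixation:
    pos: int  # index of the fixated word
    dur: int  # fixation duration in ms


def region_measures(fixations: List[Fixation], start: int, end: int) -> Dict[str, Optional[int]]:
    """Eye-tracking measures for the interest region covering words start..end (inclusive)."""

    def in_region(f: Fixation) -> bool:
        return start <= f.pos <= end

    region_durs = [f.dur for f in fixations if in_region(f)]
    m: Dict[str, Optional[int]] = {
        "total_time": sum(region_durs),    # general processing difficulty
        "n_fixations": len(region_durs),
        "first_fixation": None, "single_fixation": None,
        "first_pass": None, "go_past": None, "second_pass": None,
    }

    # Find the first entry into the region. If the eyes go beyond the region
    # first, it counts as skipped and first-pass measures stay undefined.
    first = None
    for idx, f in enumerate(fixations):
        if in_region(f):
            first = idx
            break
        if f.pos > end:
            break
    if first is None:
        return m

    # First pass: the unbroken run of fixations inside the region from first entry.
    run: List[int] = []
    for f in fixations[first:]:
        if not in_region(f):
            break
        run.append(f.dur)
    m["first_fixation"] = run[0]
    m["single_fixation"] = run[0] if len(run) == 1 else None
    m["first_pass"] = sum(run)

    # Go-past (regression-path) duration: all fixations from first entry until the
    # eyes first land to the right of the region, including leftward regressions.
    go_past = 0
    for f in fixations[first:]:
        if f.pos > end:
            break
        go_past += f.dur
    m["go_past"] = go_past

    # Approximation used here: rereading (second-pass) time = total time - first pass.
    m["second_pass"] = m["total_time"] - m["first_pass"]
    return m


if __name__ == "__main__":
    # Made-up trial: the reader enters words 4-5, regresses to word 3, then later rereads word 5.
    trial = [Fixation(0, 200), Fixation(2, 210), Fixation(4, 250), Fixation(5, 230),
             Fixation(3, 190), Fixation(4, 200), Fixation(7, 220), Fixation(5, 260),
             Fixation(9, 200)]
    print(region_measures(trial, start=4, end=5))
```

In this made-up trial, the regression to word 3 counts toward go-past duration but not first-pass duration. That difference is why go-past is usually treated as a later measure than first-pass duration.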

15.6 Recommendations for practice

Our recommendations apply to researchers who are interested in studying comprehension processes. When designing experiments, it is important to isolate the individual processes being tested. Researchers should then select measures that correspond to those processes and to the scale of analysis required. We believe the best studies push the limits of the assumptions within current models. They also use measures that provide insight into processing over time, such as line-by-line reading with multiple adjacent target lines, or eye tracking.
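In line-by-line designs with a target line followed by a spillover line, immediate and delayed effects can be summarized descriptively as condition-mean differences on each region. The sketch below assumes a hypothetical trial format (dictionaries with `region`, `condition`, and `rt` keys) and uses made-up data. A real analysis would typically fit mixed-effects models with participants and items as random effects rather than compare raw means.

```python
from collections import defaultdict
from statistics import mean


def region_effects(trials, baseline="control"):
    """Mean reading-time difference (condition minus baseline) for every region.

    `trials` is an iterable of dicts with the hypothetical keys 'region',
    'condition', and 'rt' (ms). Every region must contain baseline trials.
    """
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial["region"], trial["condition"])].append(trial["rt"])
    cell_means = {key: mean(rts) for key, rts in cells.items()}

    effects = defaultdict(dict)
    for (region, condition), m in cell_means.items():
        if condition != baseline:
            effects[region][condition] = m - cell_means[(region, baseline)]
    return dict(effects)


if __name__ == "__main__":
    # Made-up reading times that show a delayed (spillover) anomaly effect.
    data = [
        {"region": "target", "condition": "control", "rt": 1500},
        {"region": "target", "condition": "control", "rt": 1540},
        {"region": "target", "condition": "anomaly", "rt": 1510},
        {"region": "target", "condition": "anomaly", "rt": 1550},
        {"region": "spillover", "condition": "control", "rt": 1400},
        {"region": "spillover", "condition": "control", "rt": 1420},
        {"region": "spillover", "condition": "anomaly", "rt": 1520},
        {"region": "spillover", "condition": "anomaly", "rt": 1560},
    ]
    print(region_effects(data))
```

With these invented numbers, the anomaly effect is small on the target line and large on the spillover line. Without a spillover line in the materials, that delayed pattern would be invisible.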

15.7 Future directions

There has been considerable work on the passive processes involved in comprehension. The most significant advances in the field are likely to come from studies that further explore the divergence between passive and strategic processes. More specifically, such studies should provide clear and testable assumptions about the executive control processes involved in comprehension strategies. As Kendeou and O’Brien (2018) argued, researchers will need to develop new methodological and theoretical approaches to investigate strategic processing. For example, they suggested a cluster approach. Researchers would first identify robust clusters of text, task, and reader variables that tend to co-occur. They would then apply traditional experimental paradigms to investigate the role of strategic processes within these clusters systematically. However, as Kendeou and O’Brien implied, narrowing down these clusters is cumbersome at best and untenable in practice.

Note 1 The sentence that follows the target sentence is frequently called a “spillover” sentence because it is assumed to measure processing that spills over, or continues, from the previous sentence.

Further readings

Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163–182.
O’Brien, E. J., & Cook, A. E. (2016b). Separating the activation, integration, and validation components of reading. Psychology of Learning and Motivation, 65, 249–276.

Related topics

New directions in statistical analysis for experimental linguistics; historical perspectives on the use of experimental methods in linguistics; testing in the lab and testing through the Web; contrasting online and offline measures: examples from experimental research on linguistic relativity; controlling social factors in experimental linguistics; analyzing reading with eye tracking

References

Albrecht, J. E., & O’Brien, E. J. (1991). Effects of centrality on retrieval of text-based concepts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 932–939.
Albrecht, J. E., & O’Brien, E. J. (1993). Updating a mental model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1061–1070.
Anderson, J. R. (1983). A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior, 22, 261–295.
Cook, A. E. (2014). Processing anomalous anaphors. Memory & Cognition, 42, 1171–1185.
Cook, A. E., & Guéraud, S. (2005). What have we been missing? The role of general world knowledge in discourse processing. Discourse Processes, 39, 365–378.
Cook, A. E., Halleran, J. G., & O’Brien, E. J. (1998). What is readily available during reading? A memory-based text processing view. Discourse Processes, 26, 109–129.
Cook, A. E., Limber, J. E., & O’Brien, E. J. (2001). Situation-based context and the availability of predictive inferences. Journal of Memory and Language, 44, 220–234.
Cook, A. E., & Myers, J. L. (2004). Processing discourse roles in scripted narratives: The influences of context and world knowledge. Journal of Memory and Language, 50, 268–288.
Cook, A. E., & O’Brien, E. J. (2014). Knowledge activation, integration, and validation during narrative text comprehension. Discourse Processes, 51, 26–49.
Cook, A. E., & O’Brien, E. J. (2015). Passive activation and instantiation of inferences during reading. In E. J. O’Brien, A. E. Cook, & R. F. Lorch, Jr. (Eds.), Inferences During Reading (pp. 19–41). Cambridge University Press.
Cook, A. E., & O’Brien, E. J. (2019). Fundamental components of reading comprehension. In K. Rawson & J. Dunlosky (Eds.), Cambridge Handbook of Cognition and Education. Cambridge University Press.
Cook, A. E., Walsh, E., Bills, M. A. A., Kircher, J. C., & O’Brien, E. J. (2018). Validation of semantic illusions independent of anomaly detection: Evidence from eye movements. Quarterly Journal of Experimental Psychology, 7, 113–121.


Cook, A. E., & Wei, W. (2017). Using eye movements to study reading processes: Methodological considerations. In C. A. Was, F. J. Sansoit, & B. J. Morris (Eds.), Eye Tracking Technology Applications in Educational Research (pp. 27–47). IGI Global.
Cook, A. E., & Wei, W. (2019). What can eye movements tell us about higher level comprehension? Vision, 3, 45–61.
Creer, S. D., Cook, A. E., & O’Brien, E. J. (2018). Competing concept activation during comprehension of fantasy texts. Scientific Studies of Reading, 22, 308–320.
Creer, S. D., Cook, A. E., & O’Brien, E. J. (2019). Taking the perspective of the narrator. Quarterly Journal of Experimental Psychology, 72, 1055–1067.
Danks, J. H. (1986). Identifying component processes in text comprehension: Comment on Haberlandt and Graesser. Journal of Experimental Psychology: General, 115, 193–197.
Dutton, S. O., & Cook, A. E. (2022). Influence of coherence threshold on rereading behaviors: Evidence from eye movements. Manuscript in preparation.
Erickson, T. D., & Mattson, M. E. (1981). From words to meaning: A semantic illusion. Journal of Verbal Learning and Verbal Behavior, 20, 540–551.
Garrod, S., & Terras, M. (2000). The contribution of lexical and situational knowledge to resolving discourse roles: Bonding and resolution. Journal of Memory and Language, 42, 526–544.
Gerrig, R. J. (2018). Experiencing Narrative Worlds: On the Psychological Activities of Reading. Routledge.
Gerrig, R. J., & O’Brien, E. J. (2005). The scope of memory-based processing. Discourse Processes, 39, 225–242.
Gerrig, R. J., & Prentice, D. A. (1991). The representation of fictional information. Psychological Science, 2, 336–340.
Gillioz, C., & Gygax, P. M. (2017). Specificity of emotion inferences as a function of emotion contextual support. Discourse Processes, 54, 1–18.
Graesser, A. C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101, 371–395.
Guéraud, S., Harmon, M. E., & Peracchi, K. A. (2005). Updating situation models: The memory-based contribution. Discourse Processes, 39, 243–263.
Gueraud, S., Walsh, E. K., Cook, A. E., & O’Brien, E. J. (2018). Validating information during reading: The effect of recency. Journal of Research in Reading, 41, 85–101.
Kamas, E. N., & Reder, L. M. (1995). The role of familiarity in cognitive processing. In R. F. Lorch & E. J. O’Brien (Eds.), Sources of Coherence in Reading (pp. 177–202). Erlbaum.
Kamas, E. N., Reder, L. M., & Ayers, M. S. (1996). Partial matching in the Moses illusion: Response bias not sensitivity. Memory & Cognition, 24, 687–699.
Kendeou, P., & O’Brien, E. J. (2018). Reading comprehension theories: A view from the top down. In M. F. Schober, D. N. Rapp, & M. A. Britt (Eds.), The Routledge Handbook of Discourse Processes (2nd ed., pp. 7–21). Routledge.
Kendeou, P., Smith, E. R., & O’Brien, E. J. (2013). Updating during reading comprehension: Why causality matters. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 854–865.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163–182.
Kintsch, W. (1998). Comprehension: A Paradigm for Cognition. Cambridge University Press.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Lassonde, K. A., & O’Brien, E. J. (2009). Contextual specificity in the activation of predictive inferences. Discourse Processes, 46, 426–438.
Long, D. L., & Lea, R. B. (2005). Have we been searching for meaning in all the wrong places? Defining the “search after meaning” principle in comprehension. Discourse Processes, 39, 279–298.
Lorch, R. F., & van den Broek, P. (1997). Understanding reading comprehension: Current and future contributions of cognitive science. Contemporary Educational Psychology, 22, 213–246.
McKoon, G., & Ratcliff, R. (1992). Inference during reading. Psychological Review, 99, 440–466.
McNamara, D. S., & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes, 22, 247–288.
Myers, J. L., & O’Brien, E. J. (1998). Accessing the discourse representation during reading. Discourse Processes, 26, 131–157.


Anne E. Cook and Edward J. O’Brien
O’Brien, E. J. (1987). Antecedent search processes and the structure of text. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 278–290.
O’Brien, E. J., & Albrecht, J. E. (1991). The role of context in accessing antecedents in text. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 94–102.
O’Brien, E. J., & Albrecht, J. E. (1992). Comprehension strategies in the development of a mental model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 777–784.
O’Brien, E. J., & Cook, A. E. (2015). Models of discourse comprehension. In A. Pollatsek & R. Treiman (Eds.), Handbook on Reading (pp. 217–231). Oxford University Press.
O’Brien, E. J., & Cook, A. E. (2016a). Coherence threshold and the continuity of processing: The RI-Val model of comprehension. Discourse Processes, 53, 326–338.
O’Brien, E. J., & Cook, A. E. (2016b). Separating the activation, integration, and validation components of reading. Psychology of Learning and Motivation, 65, 249–276.
O’Brien, E. J., Cook, A. E., & Guéraud, S. (2010). Accessibility of outdated information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 979–991.
O’Brien, E. J., Cook, A. E., & Peracchi, K. A. (2004). Updating situation models: Reply to Zwaan and Madden (2004). Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 289–291.
O’Brien, E. J., & Myers, J. L. (1985). When comprehension difficulty improves memory for text. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 12–21.
O’Brien, E. J., & Myers, J. L. (1987). The role of causal connections in the retrieval of text. Memory & Cognition, 15, 419–427.
O’Brien, E. J., & Myers, J. L. (1999). Text comprehension: A view from the bottom up. In S. R. Goldman, A. C. Graesser & P. van den Broek (Eds.), Narrative Comprehension, Causality, and Coherence: Essays in Honor of Tom Trabasso (pp. 35–53). Lawrence Erlbaum Associates.
O’Brien, E. J., Plewes, P., & Albrecht, J. E. (1990). Antecedent retrieval processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 241–249.
O’Brien, E. J., Raney, G. E., Albrecht, J. E., & Rayner, K. (1997). Processes involved in the resolution of explicit anaphors. Discourse Processes, 23, 1–24.
O’Brien, E. J., Rizzella, M. L., Albrecht, J. E., & Halleran, J. G. (1998). Updating a situation model: A memory-based text processing view. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1200–1210.
Prentice, D. A., & Gerrig, R. J. (1999). Exploring the boundary between fiction and reality. In S. Chaiken & Y. Trope (Eds.), Dual-Process Theories in Social Psychology (pp. 529–546). The Guilford Press.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.
Reder, L. M., & Cleeremans, A. (1990). The role of partial matches in comprehension: The Moses Illusion revisited. In A. C. Graesser & G. H. Bower (Eds.), The Psychology of Learning and Motivation (Vol. 25, pp. 233–258). Academic Press.
Richter, T. (2015). Validation and comprehension of text information: Two sides of the same coin. Discourse Processes, 52, 337–354.
Rizzella, M. L., & O’Brien, E. J. (2002). Retrieval of concepts in script-based texts and narratives: The influence of general world knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 780–790.
Sanford, A. J., & Garrod, S. C. (1998). The role of scenario mapping in text comprehension. Discourse Processes, 26, 159–190.
Sanford, A. J., & Garrod, S. C. (2005). Memory-based approaches and beyond. Discourse Processes, 39, 205–224.
Singer, M. (2013). Validation in reading comprehension. Current Directions in Psychological Science, 22, 361–366.
Singer, M., Graesser, A. C., & Trabasso, T. (1994). Minimal or global inference during reading. Journal of Memory and Language, 33, 421–441.
Singer, M., Spear, J., & Rodrigo-Tamarit, M. (2023). Text validation: Overlooking consistency effect discrepancies. Memory & Cognition, 51, 437–454.
Sonia, A. N., & O’Brien, E. J. (2021). Text-based manipulation of the coherence threshold. Discourse Processes, 58, 549–568.
Suh, S. Y., & Trabasso, T. (1993). Inferences during reading: Converging evidence from discourse analysis, talk-aloud protocols, and recognition priming. Journal of Memory and Language, 32, 279–300.


van den Broek, P., Rapp, D. N., & Kendeou, P. (2005). Integrating memory-based and constructionist processes in accounts of reading comprehension. Discourse Processes, 39, 299–316.
van den Broek, P., Risden, K., Fletcher, C. R., & Thurlow, R. (1996). A “landscape” view of reading: Fluctuating patterns of activation and the construction of a stable memory representation. In B. K. Britton & A. C. Graesser (Eds.), Models of Understanding Text (pp. 165–187). Erlbaum.
Walsh, E., Cook, A. E., & O’Brien, E. J. (2018). Processing real-world violations embedded within a fantasy world narrative. Quarterly Journal of Experimental Psychology, 71, 2282–2294.
Walsh, E., Cook, A. E., Wei, W., & O’Brien, E. J. (2022a). Fantasy-based violations of real-world knowledge. Manuscript in preparation.
Walsh, E., Cook, A. E., Wei, W., & O’Brien, E. J. (2022b). Validating fantasy-based violations of real-world knowledge: The role of causality. Manuscript in preparation.
Williams, C., Cook, A. E., & O’Brien, E. J. (2018). Validating semantic illusions during narrative comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44, 1414–1429.
Zwaan, R. A., & Radvansky, G. A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123, 162–185.


16 ANALYSING READING WITH EYE TRACKING

Anna Siyanova-Chanturia and Irina Elgort

16.1

Introduction

Over the past 40 years, eye tracking has proven instrumental in providing valuable insights into the nature of reading. Much of what we know today about the mechanisms of reading behaviour can be attributed to this methodology, in large part because of the many advantages it affords. Eye movement data are unique in that they are believed to reflect moment-to-moment cognitive processes in reading (Rayner, 1998). They can tell us what was fixated and for how long, what was read once or several times, what was skipped, and what caused processing difficulties. A key advantage of this method is the ability to distinguish and analyse early versus late processing stages. As we will see in the review below, different effects – lexical-semantic, syntactic, discourse – may surface at different points in time. Because early and late measures are thought to tap into different processes, it is important to analyse both. As Rayner (1998) argues, any single measure is “a pale reflection of the reality of cognitive processing” (p. 377). Further, unlike other psycholinguistic methods (e.g., response times and event-related brain potentials), eye tracking allows the reader to engage in normal reading (Rayner, 1998, 2009), proceeding entirely at their own pace. As Rayner (1998, p. 391) notes, a general problem with measures such as self-paced reading (a task in which participants move from one word to the next by pressing a button on a keyboard or button box) is that reading rate is only about half as fast as in natural reading. In addition, in self-paced reading experiments, target stimuli are necessarily presented word by word or portion by portion, whereas eye tracking allows entire sentences or larger stretches of discourse to be presented.
No other methodology offers such flexibility in stimulus presentation or provides such a rich and direct account of moment-to-moment reading behaviour.

16.2

Historical overview

While recording and analysing eye movements during reading may seem a relatively recent development, interest in this methodology and its applications in reading research dates back to the second half of the 19th century. The German scientists Hermann Helmholtz and Ewald Hering were among the first to conduct experiments on eye movements during reading. They can be credited with the invention (Helmholtz) and subsequent improvement (Hering) of the bite bar, a device that restricts head movement to achieve greater precision and accuracy in eye movement recordings (for a review, see Wade & Tatler, 2011). Hering was one of the first to ‘observe’ eye movements during reading: he placed rubber tubes on the eyelids to listen to the sounds attributed to contractions of the oculomotor muscles that accompany eye movements (Wade & Tatler, 2011). This early era of eye movement research is also associated with the French ophthalmologists Louis Émile Javal and M. Lamare. Javal and Lamare observed that during reading, the eyes make quick movements (saccades) from one word to the next, separated by brief pauses (fixations). The term saccade is believed to have been first used by Javal (but see Wade & Tatler, 2011, for a discussion of this issue), while Lamare is credited with the observation that during reading, the eyes move in a series of discrete jumps from one fixation to the next. According to Wade and Tatler (2011), Hering and Lamare should be “accorded the credit for demonstrating the discontinuity of eye movements in reading” (p. 25). The early 20th century saw continued interest in eye movement technology, with scientists developing eye-trackers that “used a lever attached to an eye cup to record movements of the eyes on the surface of a smoked drum” (Wade & Tatler, 2011, p. 25). As technology advanced, less intrusive eye-trackers were developed that did not require a direct attachment between the eye and the recording surface: for example, photographic devices that recorded light reflected from an eye cup, or devices that recorded light reflected directly from the surface of the eye (pioneered by Dodge; see Wade & Tatler, 2011). This early period, from around 1879 to 1920, is known as the first era of eye movement research (Rayner, 1998).
The second era, from around 1930 to 1960, is associated with the seminal work by Tinker (1946) on reading and Buswell (1935) on scene perception. However, as Rayner (1998) notes, this period also coincided with the behaviourist movement in experimental psychology, which had a more applied focus and saw little work on the nature of cognitive processes, including reading. This changed radically in the third era of eye movement research, which began in the 1970s and brought technological advances in eye movement recording systems, new and more powerful methods of data collection and analysis, and general theories and models of reading behaviour (Rayner, 1998).

DOI: 10.4324/9781003392972-19

16.3

Critical issues and concepts

During reading, we make extremely rapid eye movements (saccades) separated by fixations, brief pauses during which information is encoded from the visual input. Because vision is suppressed during a saccade (Matin, 1974), no new information is obtained during saccadic movements, so information about what is being read is acquired during fixations. On average, a skilled reader of English takes about a quarter of a second to read a word. Their mean saccade length (the length of the ‘jump’ from one fixation to the next) is about eight letters (Rayner, 2009). These values, however, may differ for readers of other scripts. For example, average saccade length is shorter in Hebrew (about 5.5 letter spaces) than in English (Rayner, 2009). In Chinese¹, a more visually dense script than English, average saccade length is about 2–3 characters. When length is measured in words, however, average saccade lengths in Chinese (1.7 words) and English (1.8 words) are comparable (e.g., Sun & Feng, 1999). Zang et al. (2011) argue that the major characteristics of readers’ eye movements are similar in English and Chinese. This holds even though the two are visually highly dissimilar and linguistic information is more densely packed in logographic than in alphabetic writing systems.
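The Python sketch below illustrates the measures just described. It computes mean fixation duration and mean forward saccade length, in character spaces, from a list of fixations. The input format (character offset, duration in ms) and the data are hypothetical. Real eye-tracking software exports richer records, and expressing saccade length in words, as in the Chinese–English comparison, would also require word boundaries.

```python
from statistics import mean

def saccade_lengths(fixations):
    """Signed distance, in character spaces, between consecutive fixations.

    `fixations` is a list of (character_offset, duration_ms) tuples in temporal
    order (a hypothetical format). For a left-to-right script, positive values
    are forward saccades and negative values are regressions.
    """
    positions = [pos for pos, _ in fixations]
    return [b - a for a, b in zip(positions, positions[1:])]

def summarise(fixations):
    """Mean fixation duration and mean forward saccade length."""
    forward = [s for s in saccade_lengths(fixations) if s > 0]
    return {
        "mean_fixation_ms": mean(d for _, d in fixations),
        "mean_forward_saccade_chars": mean(forward) if forward else 0.0,
    }

# Invented data for one line of text: (character offset, duration in ms).
fixations = [(2, 210), (10, 245), (17, 230), (14, 180), (25, 260)]
print(summarise(fixations))
```

Only forward saccades enter the mean here. Averaging negative regression lengths together with forward movements would understate how far the eyes typically jump.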


During reading, the eyes do not relentlessly move forward. In languages read from left to right (e.g., English and other European languages), about 10–15% of saccades move from right to left; these are known as regressions (Rayner, 2009). Short regressive movements within a word are likely to indicate processing difficulties specific to that word. Regressions out of the word (e.g., to earlier portions of the text) tend to reflect comprehension failures or difficulties integrating larger stretches of discourse. Similar regression rates (the proportion of movements in the direction opposite to the direction of reading) have been reported for Chinese and English (e.g., Rayner, 2009). The optimal viewing position for a saccade is the middle of a word. Even so, readers’ eyes commonly land halfway between the beginning and the middle of a word (Rayner, 2009), with the landing position varying with the launch site of the saccade on the preceding word and with word length. Because saccades are not always accurate, errors in landing position are not uncommon: some saccades overshoot their target, while others undershoot it (McConkie et al., 1988). Estimates suggest that at least 10% of saccades may not land on the intended word (e.g., Engbert & Nuthmann, 2008). For example, single fixations at the end of a word are indicative of undershooting, and single fixations at the beginning of a word of overshooting. Interestingly, these errors characterise forward saccades but not regressive (backward) saccades, which are typically accurate (e.g., Inhoff et al., 2005). After a non-optimal landing, readers are more likely to make additional fixations (refixations) on that word. Much reading research has been concerned with the reader’s visual field, the size of the perceptual span, and the information extracted during a fixation.
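A short Python sketch can make concrete the distinction between forward saccades, within-word regressions and regressions to earlier words. It assumes a single line of left-to-right text, fixation positions given as character offsets, and words delimited by whitespace. The sentence and fixation positions are invented for illustration. The regression rate is computed over all saccades, in the spirit of the 10–15% figure cited above.

```python
import bisect
import re

def word_starts(text):
    """Character offsets at which whitespace-delimited words begin."""
    return [m.start() for m in re.finditer(r"\S+", text)]

def word_of(char_pos, starts):
    """Index of the word containing char_pos (a space counts with the preceding word)."""
    return bisect.bisect_right(starts, char_pos) - 1

def classify_saccades(positions, starts):
    """Label each saccade, assuming a left-to-right script."""
    labels = []
    for a, b in zip(positions, positions[1:]):
        if b > a:
            labels.append("forward")
        elif b < a:
            within = word_of(a, starts) == word_of(b, starts)
            labels.append("regression_within_word" if within else "regression_to_earlier_word")
        else:
            labels.append("no_movement")
    return labels

def regression_rate(labels):
    """Proportion of saccades that move against the reading direction."""
    moves = [lab for lab in labels if lab != "no_movement"]
    if not moves:
        return 0.0
    return sum(lab.startswith("regression") for lab in moves) / len(moves)

text = "The old man read the long letter"
positions = [1, 5, 9, 14, 13, 22, 10, 27, 29]  # invented fixation offsets
labels = classify_saccades(positions, word_starts(text))
print(labels, regression_rate(labels))
```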
The perceptual span in alphabetic languages extends 3–4 characters to the left of fixation and 14–15 characters to the right (e.g., McConkie & Rayner, 1975; Rayner et al., 1980). The asymmetry is reversed for languages read from right to left (e.g., Hebrew and Arabic). The perceptual span decreases as text difficulty increases, and vice versa (Henderson & Ferreira, 1990). The reader’s visual field can be divided into three regions: foveal, parafoveal, and peripheral (Rayner, 1998). Visual acuity is best in the fovea (the central 2 degrees of the visual field), poorer in the parafovea (extending up to 5 degrees on either side of fixation), and poorest in the periphery (everything beyond the parafovea). The larger perceptual span in the direction of reading (in alphabetic languages) suggests that some information about the upcoming word may be accessed in the parafovea. This parafoveal preview is important because it allows the reader to pre-process upcoming information and to plan (programme) their next eye movements. Readers have been shown to extract orthographic, phonological, morphological, and even semantic information from the parafovea (for reviews, see Andrews & Veldre, 2019; Schotter et al., 2012).
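The asymmetric span can be illustrated with the moving-window technique associated with McConkie and Rayner (1975), in which letters outside a window around the current fixation are masked. The sketch below is a simplified, hypothetical version:

- It uses a 4-character-left / 15-character-right window, taken from the figures above, as default parameters.
- It mirrors the window for right-to-left scripts, assuming character indices run left to right on the screen.
- It keeps spaces visible.

It only computes what would be displayed; it does not drive an eye-tracker.

```python
def span_window(fixation, line_length, left=4, right=15, rtl=False):
    """Inclusive (first, last) screen indices inside an asymmetric perceptual span.

    Indices run left to right on the screen. For a right-to-left script the
    larger extent lies to the left of fixation, so the two sides are swapped.
    """
    if rtl:
        left, right = right, left
    first = max(0, fixation - left)
    last = min(line_length - 1, fixation + right)
    return first, last

def moving_window(text, fixation, mask="x", **span_kwargs):
    """Display string with letters outside the span replaced by `mask` (spaces kept)."""
    first, last = span_window(fixation, len(text), **span_kwargs)
    return "".join(
        ch if first <= i <= last or ch == " " else mask
        for i, ch in enumerate(text)
    )

line = "Readers extract useful information mostly to the right of fixation"
print(moving_window(line, fixation=10))
```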

16.4

Current contributions and research

16.4.1 Lexical-semantic influences on eye movements in reading

16.4.1.1

Frequency

Word processing in reading is affected by properties of the fixated word. Three lexical properties in particular – frequency, length, and predictability – have been found to influence the number and duration of fixations on a word, as well as the probability of skipping it. High-frequency words elicit fewer and shorter fixations than low-frequency words (Inhoff & Rayner, 1986; Rayner & Duffy, 1986). The effect is typically observed in early reading time measures, such as:

- first fixation duration: the duration of the first fixation on a word, provided it has not been skipped;
- gaze duration: the sum of all fixation durations on a word before the eyes first move off it.

High-frequency words are also more likely to be skipped than low-frequency ones (e.g., Rayner et al., 1996). Lexical frequency effects are often used as a ‘benchmark’ finding (a common point of reference and comparison) in reading research (e.g., Juhasz & Pollatsek, 2011). Frequency effects have been studied in both alphabetic and logographic languages, such as Chinese (e.g., Yan et al., 2006; see also Zang et al., 2011). Chinese is written without spaces between words, yet frequency effects in English and Chinese reading have been found to be comparable (e.g., Sun & Feng, 1999). Probing frequency effects, Yan et al. (2006) found that Chinese readers took more time to read low-frequency words than high-frequency ones. They also found that character frequency affected readers’ fixation times on two-character words (which are very common in Chinese; Packard, 2000), with the effect being stronger for the first character than for the second. Frequency affects not only the reading of the target word but also the reading of the following region. For example, the time spent on a low-frequency word n has been found to spill over onto the following word n+1, increasing reading time on n+1 (Rayner & Duffy, 1986; White, 2008). It is therefore customary to analyse several areas of interest, including the region immediately following the target.
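Early and late measures can be computed from a sequence of fixations that has already been assigned to words. The Python sketch below is one simplified way of doing so. It assumes fixations are given as (word index, duration in ms) pairs in temporal order, and that first pass ends as soon as the eyes leave the word. A word counts as skipped if a word to its right is fixated before it is. Applying the same function to word n+1 gives the spillover region. Analysis packages implement these measures with additional conventions (e.g., for blinks and track loss) that are not modelled here.

```python
def word_measures(fixations, target):
    """First fixation duration, gaze duration, total time and skipping for one word.

    `fixations`: list of (word_index, duration_ms) in temporal order (assumed format).
    First-pass measures are None if the word was skipped or never fixated.
    """
    first_fix = gaze = None
    total = 0
    skipped = False
    status = "pending"  # pending -> in_first_pass -> done
    for word, dur in fixations:
        if word == target:
            total += dur  # total time counts every fixation on the word
        if status == "pending":
            if word > target:  # eyes moved past the word without fixating it
                skipped, status = True, "done"
            elif word == target:
                first_fix = gaze = dur
                status = "in_first_pass"
        elif status == "in_first_pass":
            if word == target:
                gaze += dur  # refixation before leaving the word
            else:
                status = "done"
    return {"first_fixation": first_fix, "gaze_duration": gaze,
            "total_time": total, "skipped": skipped}

fixations = [(0, 200), (1, 230), (3, 250), (3, 120), (4, 210), (2, 180), (3, 150), (5, 220)]
target = 3
print(word_measures(fixations, target))      # target word n
print(word_measures(fixations, target + 1))  # spillover region n+1
```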

16.4.1.2

Length

While frequency is one of the key properties known to affect lexical processing, length is, according to Rayner (1998, 2009), the key determinant of word skipping (see also Brysbaert et al., 2005). Rayner and McConkie (1976) showed that two- to three-letter words are fixated around 25% of the time, while words of eight or more letters are nearly always fixated and may elicit multiple fixations. Similarly, Carpenter and Just (1983) found that content words are fixated about 85% of the time. Function words, which tend to be shorter and more frequent, are fixated only about 35% of the time. Importantly, skipping a word does not necessarily mean that it has not been read. Rather, a skipped word n is likely to have been identified and processed during the prior fixation (Rayner, 2009; Rayner & Duffy, 1988). This is indicated by an inflated fixation duration on word n-1, compared with when n is not skipped (e.g., Pollatsek et al., 1986). Generally, longer words take longer to process, in part because they elicit more refixations, which add to the reading time on the word (e.g., Clifton et al., 2016). Word length affects not only the number and duration of fixations but also saccade planning and execution, that is, where to move the eyes next. Experiments using the boundary paradigm (a technique that allows the researcher to manipulate the information available to the reader in the parafovea; e.g., Rayner, 1975) have shown that the length of the parafoveal word affects the planning of the next saccade (e.g., White et al., 2005).
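The logic of the boundary paradigm can be sketched as a small state machine. Until the reader’s gaze first crosses an invisible boundary, a preview string occupies the target position. Once the boundary is crossed, the target word is displayed and the change is not reversed. The Python sketch below makes several simplifying assumptions:

- the class name is illustrative;
- the boundary is placed at the first letter of the target word;
- the gaze coordinate is measured in characters;
- the preview is an invented nonword of the same length as the target.

Real implementations work in screen pixels and must complete the display change during the saccade.

```python
class BoundaryTrial:
    """Hypothetical gaze-contingent boundary trial (after Rayner, 1975)."""

    def __init__(self, words, target_idx, preview):
        self.words = list(words)
        self.target_idx = target_idx
        self.preview = preview
        # Assumed boundary: character offset of the first letter of the target.
        self.boundary = sum(len(w) + 1 for w in self.words[:target_idx])
        self.crossed = False

    def display(self, gaze_char):
        """Return the line to show for the current gaze position (in characters)."""
        if gaze_char >= self.boundary:
            self.crossed = True  # the change is never undone, even after regressions
        shown = list(self.words)
        if not self.crossed:
            shown[self.target_idx] = self.preview
        return " ".join(shown)

trial = BoundaryTrial(["The", "old", "captain", "slept"], target_idx=2, preview="cnqtbvn")
for gaze in (1, 5, 9, 2):
    print(gaze, trial.display(gaze))
```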

16.4.1.3

Predictability

Word predictability in a sentential context has also been found to modulate reading times on a word, as well as the probability of the word being skipped. More predictable words are more likely to be skipped and, if not skipped, elicit fewer and shorter fixations than less predictable words (e.g., Balota et al., 1985; Ehrlich & Rayner, 1981; for a review, see Staub, 2015). The benefit of a parafoveal preview is also greater when the word is predictable than when it is not supported by sentential context (Rayner, 1998). As with the frequency effects discussed above, predictability effects have been observed both in spaced alphabetic languages, such as English (e.g., Balota et al., 1985; Ehrlich & Rayner, 1981), and in non-alphabetic languages. Rayner et al. (2005) analysed eye movements on high-, medium-, and low-predictability Chinese words. Readers spent less time processing high- and medium-predictability words than low-predictability words, and were less likely to fixate them. The effect of predictability is typically evident in early reading time measures (see below for more information on early and late eye-tracking measures), such as first fixation duration and gaze duration, as well as in increased skipping rates (Staub, 2015). It is not only general sentence-level constraints that modulate reading times on a word. Final words within conventional phrases (known as multiword expressions, e.g., idioms, collocations, binomials) have also been found to elicit fewer and shorter fixations than the same words in control phrases (Jiang et al., 2020; for an overview, see Pellicer-Sánchez & Siyanova-Chanturia, 2018). For example, the words cover, groom, and pain are highly predictable in the phrases you can’t judge a book by its cover, bride and groom, and excruciating pain. Thus, both sentence- and phrase-level constraints influence the probability of a word being skipped (i.e., processed in the parafovea) or being processed with less effort than the same word in a less predictable context.
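Predictability is commonly operationalised as cloze probability. This is the proportion of participants in a separate sentence-completion (cloze) task who supply the target word when given the preceding context. The Python sketch below computes this from a list of responses and assigns a high, medium or low band. The response data and the band cut-offs are invented for illustration; they are not those used by Rayner et al. (2005) or any other study cited here.

```python
from collections import Counter

def cloze_probability(responses, target):
    """Proportion of completions matching `target` (case- and whitespace-insensitive)."""
    cleaned = [r.strip().lower() for r in responses]
    if not cleaned:
        raise ValueError("no cloze responses supplied")
    return Counter(cleaned)[target.strip().lower()] / len(cleaned)

def predictability_band(p, high=0.7, low=0.3):
    """Illustrative banding; the cut-offs are hypothetical."""
    if p >= high:
        return "high"
    if p <= low:
        return "low"
    return "medium"

# Invented completions for the frame "bride and ..."
responses = ["groom"] * 18 + ["Groom "] + ["bridesmaid"]
p = cloze_probability(responses, "groom")
print(p, predictability_band(p))
```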

16.4.1.4

Beyond the “big three”

Frequency, length, and predictability (the “big three”; Rayner & Liversedge, 2011, p. 756) are arguably the most powerful predictors of whether a word is skipped, and of the number and duration of fixations on it. However, other properties have also been found to affect eye movement patterns, albeit to a lesser extent:

- familiarity (e.g., Williams & Morris, 2004);
- age of acquisition (e.g., Juhasz & Rayner, 2003);
- plausibility (e.g., Rayner et al., 2004);
- lexical ambiguity, defined as the number of meanings a word has (e.g., Duffy et al., 1988).

Words that are less familiar, acquired later in life, implausible (e.g., violating the reader’s real-world knowledge), or ambiguous (i.e., having multiple meanings) generally elicit longer reading times than familiar, earlier-acquired, plausible, and unambiguous ones. Of note, familiarity is sometimes treated as a subjective measure of word frequency. Some researchers suggest that it may be a better measure of frequency of occurrence than (objective) corpus frequency (e.g., Juhasz & Pollatsek, 2011; see also Coltheart, 1981, for a description of the MRC Psycholinguistic Database, which can be used to obtain familiarity ratings). Eye movements have also been used to probe orthographic neighbourhood effects and the role of neighbourhood size, that is, the number of words that can be produced by changing one letter in a word (Coltheart et al., 1977). During reading, not only the target word is activated but also orthographically similar words (i.e., its orthographic neighbours). Pollatsek et al. (1999) examined English words embedded in sentences. Words with more orthographic neighbours elicited longer first-pass reading times (an early measure) and longer total reading times (a late measure) than words with fewer neighbours. The inhibitory effect of orthographic neighbours was further replicated by Paterson et al. (2009).
Interestingly, the opposite pattern, a facilitative effect of neighbourhood size, was observed in Chinese sentence reading (e.g., Tsai et al., 2006). Here, neighbourhood size was defined as the number of two-character words sharing the same initial character. Words with more neighbours elicited shorter reading times and higher skipping rates than words with fewer neighbours.
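Orthographic neighbourhood size, as defined for English above, is straightforward to compute against a word list. It is the number of words formed by changing exactly one letter of the target while keeping letter positions fixed. The Python sketch below assumes a small invented lexicon; in practice the count depends heavily on the lexicon used. It implements only the letter-substitution definition, not the character-based definition used for Chinese by Tsai et al. (2006).

```python
def neighbours(word, lexicon):
    """Words of the same length that differ from `word` in exactly one letter position."""
    word = word.lower()
    candidates = {w.lower() for w in lexicon}
    return sorted(
        w for w in candidates
        if len(w) == len(word)
        and sum(a != b for a, b in zip(w, word)) == 1
    )

def neighbourhood_size(word, lexicon):
    """Number of one-letter-substitution neighbours in the lexicon."""
    return len(neighbours(word, lexicon))

lexicon = ["cat", "bat", "cot", "cut", "can", "car", "cast", "act", "dog"]
print(neighbours("cat", lexicon), neighbourhood_size("cat", lexicon))
```

The word itself is excluded automatically because it differs from itself in zero positions. Words of a different length, such as cast, are never counted under this definition.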

16.4.2 Morphological influences on eye movements in reading

A relatively large body of research has examined the processing of morphologically complex words during reading. Eye movement studies have shown that both whole-word frequency and individual morpheme frequency affect reading time measures on the target word. In particular, studies have looked at the processing of compounds, words composed of morphemes that are themselves words (e.g., blackboard, timetable, football). These studies have shown that the frequencies of both constituents (also known as lexemes) affect fixation durations on the compound. For example, Hyönä and Pollatsek (1998) found that in Finnish, the frequency of the first lexeme affected the first fixation duration (an early measure; see below) on the whole compound. It also affected later measures. By contrast, the frequency of the second lexeme affected the reading of the whole compound in late measures only. Based on these results, Hyönä and Pollatsek (1998) put forward a dual-route model in which both the compound and its constituents are accessed in parallel, with the first lexeme being accessed before the second. Similar findings have been reported for English, with the frequency of the first lexeme affecting compound processing early on and the frequency of the second lexeme affecting later measures (e.g., Juhasz, 2007; see also Kuperman et al., 2008, 2009). A characteristic of compound words that has received some attention in the literature is semantic transparency. Compounds whose constituents are related to the overall meaning of the compound are known as transparent (e.g., blackboard, football). Compounds whose overall meaning cannot be inferred from the meanings of their constituents are known as opaque (e.g., hotdog, deadline). Findings for constituent frequency are robust and consistent across languages (see above). By contrast, studies of the role of semantic transparency in compound processing have yielded conflicting results.
For example, Pollatsek and Hyönä (2005) found no effect of compound type (transparent versus opaque). Juhasz (2007), however, found that transparent compounds were read more quickly than opaque compounds, as evidenced in gaze duration. More recently, Marelli and Luzzatti (2012) explored the role of semantic transparency and headedness in the processing of Italian compounds. Headedness refers to whether most of the meaning is carried by the first or the final constituent. A facilitatory frequency effect was found for transparent and head-final compounds, suggesting that information about both the compound and its constituent parts is accessed during compound processing. Importantly, the effect of semantic transparency was found at the very early stages of processing (first fixation duration). Overall, studies on the processing of morphologically complex words (compounds as well as prefixed and suffixed words) support a decompositional account, with both the parts and the whole activated and accessed. However, the properties of the parts and the whole (e.g., their frequency) may affect their relative degree of activation and prominence.

16.4.3 Sentence and discourse influences on eye movements in reading

As noted by Clifton and Staub (2011), the use of eye movements to study sentence processing can be traced back to the beginnings of cognitive psychology, or at least to the 1970s. In particular, many studies have used eye movements to investigate parsing and sentence interpretation (for a review, see Clifton et al., 2007). In a landmark study, Frazier and Rayner (1982) examined the comprehension of temporarily ambiguous sentences. The study led to the “garden-path” theory of sentence processing, with two parsing strategies proposed: “late closure” and “minimal attachment”.

Late closure is illustrated by the following sentence: Since Jay always jogs a mile (this) seems like a short distance to him (example from Frazier & Rayner, 1982). The reader is likely to interpret a mile as the direct object of the verb jogs. However, when this is absent, the reader must reanalyse the sentence so that a mile is interpreted as the subject of the main clause, agreeing with the verb seems. In this case, the sentence is said to violate late closure.

Minimal attachment can be illustrated by The city council argued the mayor’s position forcefully/The city council argued the mayor’s position was incorrect (example from Frazier & Rayner, 1982). According to Frazier and Rayner (1982), the ambiguous phrase the mayor’s position is initially interpreted as the direct object of the verb argue, rather than as the subject of a sentential complement (p. 180).

Frazier and Rayner (1982) observed shorter reading times for sentences that conformed to readers’ parsing strategies than for sentences that violated them. Thus, eye movements were disrupted (as evidenced by longer fixation durations and regressive eye movements) when the reader was forced to reanalyse ambiguous sentences. Importantly, this study showed that language comprehension is incremental, an idea that has since been widely examined and accepted (e.g., see Clifton et al., 2007).

Another factor that has received attention in the literature is syntactic prediction. It has been argued that not only the predictability of lexical items matters (see above) but also the predictability of syntactic structure (e.g., Clifton & Staub, 2011; Staub & Clifton, 2006; see also Favier et al., 2021, who examined syntactic prediction using the visual world paradigm). Staub and Clifton (2006) had participants read sentences in which two noun phrases or two clauses were connected by or, with either present or absent earlier in the sentence (e.g., The team took [either] the train or the subway to get to the game./[Either] John borrowed a rake or his wife bought one). When either was present, both types of structure were read faster than when it was absent. Readers also tended to misanalyse the clause structure as the noun-phrase structure, but only in the absence of either; the presence of either eliminated the misanalysis. The authors concluded that readers are able to use the available information to anticipate upcoming syntactic structure, and that when the expectation is not met, reading slows down.
Clifton and Staub (2011) make an interesting point regarding the timing of lexical versus syntactic effects during reading. Lexical effects, such as frequency and predictability, affect the time it takes to read a word in a uniform fashion, as evidenced in early measures such as first fixation duration and gaze duration. By contrast, they argue, effects of syntactic processing can surface at various points in the eye-tracking record, affecting both early and late reading measures (p. 901).
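Regression-path (or go-past) duration is one measure often used to capture syntactic reanalysis of the kind discussed above, although it is not named in this section. It is the summed duration of all fixations from the first fixation on a region until the eyes first move to the right of it, including any regressive fixations on earlier regions. The Python sketch below is a simplified version under three assumptions:

- input is given as (region index, duration in ms) pairs;
- a region skipped during first pass has no defined value;
- if the trial ends before the eyes exit the region to the right, the accumulated time is returned. Labs differ on this last convention.

```python
def regression_path_duration(fixations, region):
    """Go-past time for `region`; None if the region was skipped on first pass.

    `fixations`: list of (region_index, duration_ms) in temporal order.
    """
    total = 0
    entered = False
    for r, dur in fixations:
        if not entered:
            if r > region:
                return None  # eyes passed the region without fixating it
            if r == region:
                entered = True
                total += dur
        elif r > region:
            return total  # first exit to the right ends the regression path
        else:
            total += dur  # refixations and regressive fixations both count
    # Assumed convention: trial ended before an exit to the right.
    return total if entered else None

fixations = [(0, 200), (1, 250), (2, 300), (1, 150), (2, 100), (3, 200)]
print(regression_path_duration(fixations, 2))
```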

16.4.4

Beyond syntax

Eye movements are a critical tool for developing and testing models of how larger portions of discourse are processed (Clifton et al., 2016). Yet there have been relatively few eye-tracking studies of online comprehension (i.e., comprehension happening in real time) and discourse processing effects (see also Rayner, 2009). Two lines of research have been pursued: anaphora resolution and the processing of discourse containing focus operators.

In anaphora resolution, the distance between an anaphor (a pronoun or noun) and its antecedent (the previous mention of the referent) can vary, and the reader must link the anaphor to the matching antecedent. In one of the earliest such studies, Duffy and Rayner (1990) manipulated the distance and typicality of the antecedent, that is, how typical a member it was of the category named by the anaphoric noun. The anaphoric noun received shorter fixations when its antecedent was close and typical than when it was distant and atypical. Duffy and Rayner (1990) concluded that anaphora resolution is initiated, but not completed, while the reader processes the anaphoric noun; it is completed only after the eyes have moved off the noun (see Ehrlich & Rayner, 1983, for comparable results with pronouns).

Another line of research into higher-order linguistic processing has probed the influence of focus on eye movements during reading (e.g., see Filik et al., 2011, for a review). In particular, researchers have examined the role of focus operators in the processing of structural ambiguities (e.g., Filik et al., 2005; Liversedge et al., 2002; Paterson et al., 1999, 2007). These studies have examined whether the focus operator only at the beginning of a sentence can prevent syntactic misanalysis and ease comprehension difficulty. The ambiguous sentences used include The teenagers allowed a party invited a juggler straightaway/Only teenagers allowed a party invited a juggler straightaway (example from Paterson et al., 1999). By and large, the results suggest that the presence of only does not prevent misanalysis, but it does lead to less disruption of comprehension than when it is absent. This effect is evident in late eye-tracking measures and in the number of regressions and re-readings (e.g., Liversedge et al., 2002; Paterson et al., 1999). These studies suggest that eye movement behaviour during reading is affected not only by lexical and syntactic processes (see above). Higher-order cognitive processes, such as the online construction of a discourse representation, also play a role (Rayner & Liversedge, 2011, p. 759).

16.5

Main research methods

A key advantage of using the eye-tracking method in reading experiments is that it affords a measure of real-time language processing that does not involve additional tasks requiring explicit decision-making (e.g., lexical decisions, acceptability judgements). This eliminates the artefacts associated with task completion and allows researchers to study the lexical, semantic, and morphosyntactic processing of interest in a more direct and natural way. Because of the high accuracy and precision of modern eye-trackers, which can record eye movements at sampling rates of up to 2,000 Hz (e.g., the EyeLink 1000 Plus by SR Research), researchers are able to study the time-course of visual text processing in far more detail than overall reading times for individual words and phrases allow (such as the reading times recorded in the self-paced reading paradigm, in which participants push a button to progress from one word of the text to the next). This temporal precision is of critical importance in studies that investigate the component processes of reading, including word identification, morphological and semantic processing, parsing morphosyntactic structures, constructing a representation of the text, and relating it to the reader's general knowledge about the world. This is because different eye movement measures can be associated with different aspects of the reader's ongoing language processing (Engbert et al., 2005; Just & Carpenter, 1980). It is well documented, for example, that readers tend to make more fixations, fixate for longer, and make more regressive eye movements (eye movements in the direction opposite to the reading direction) when they experience difficulties at different stages of language processing and reading comprehension (Rayner, 2009).

Anna Siyanova-Chanturia and Irina Elgort

16.5.1

What eye movement measures are used in reading research?

Multiple eye movement measures can be extracted from the recorded reading data (for a taxonomy of eye-tracking measures, see Godfroid, 2020, p. 207, Figure 7.1). These measures are used as dependent variables in studies of lower- and higher-order cognitive processes in reading. Eye movement measures can be grouped into (1) continuous or duration measures (e.g., fixation durations and dwell time, that is, the total time spent on the current interest area in the current interest period) and (2) event or count measures (e.g., fixation and regression counts, likelihood of word skipping, and number and length of saccades). Both types of measures may be described as early or late (sometimes a distinction is made between early, intermediate, and late measures; e.g., Conklin et al., 2018, pp. 66–67). Using multiple early and late eye movement measures (Chaffin et al., 2001) in reading research creates a more comprehensive picture of the time-course of processing in reading (Rayner, 2009).

Below we consider different types of measures and their proposed associations with cognitive processes in reading. Before we proceed, however, it is important to point out that, as with many other measures of language processing, the connection between eye movements and cognitive processes cannot be observed directly. Therefore, in eye movement studies, reading researchers must base their a priori predictions and their interpretations of the findings on relevant theoretical frameworks of reading (and of language and information processing more generally) and on computational models of reading (e.g., E-Z Reader, Reichle et al., 1998; SWIFT, Engbert et al., 2005), which model empirical evidence from existing reading studies. Godfroid and Hui (2020, pp. 283–289), for example, emphasise the importance of selecting and handling eye movement measures in a principled manner, and Boers (2022, p. 13) warns against making direct connections between what is being attended to during reading and “what precisely went on in the reader's mind”.

Duration measures (for single fixations or sums of fixations) are by far the most common measures in reading research. For single words, early processing measures include first-fixation duration, single-fixation duration, and gaze duration (Staub & Rayner, 2007). These measures are commonly associated with automatic lexical access. First-fixation duration is the duration of the first fixation on a word during first-pass reading, provided the word has not been skipped. First-fixation duration reflects early stages of word recognition and lexical access (the programming of which starts in the parafoveal preview with an orthographic familiarity check; Reichle et al., 1998). Single-fixation duration is the duration of that fixation on trials in which the word receives exactly one first-pass fixation. Gaze duration is the sum of all fixation durations on an item or area/region of interest (ROI) before the eyes move outside of that ROI (see 16.6.1.1 for further information on ROI).
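To make these early duration measures concrete, the Python sketch below computes them from a single trial. It is illustrative only and rests on several assumptions. Each fixation has already been mapped to a word index and is stored with its duration in temporal order. A word counts as skipped if a word further to the right is fixated before it. The function names are hypothetical and do not reflect the API of any eye-tracking software.

```python
from typing import Dict, List, Optional, Tuple

# A fixation is (word index, duration in ms), listed in temporal order.
# This representation is an assumption made for illustration.
Fixation = Tuple[int, int]


def first_pass_durations(fixations: List[Fixation], word: int) -> List[int]:
    """Durations of first-pass fixations on `word` ([] if skipped or never fixated)."""
    durations: List[int] = []
    for idx, dur in fixations:
        if not durations:
            if idx > word:       # eyes moved past the word before landing on it: skipped
                return []
            if idx == word:      # first entry into the word starts the first pass
                durations.append(dur)
        elif idx == word:        # refixation before leaving the word
            durations.append(dur)
        else:                    # eyes left the word: the first pass is over
            break
    return durations


def early_duration_measures(fixations: List[Fixation], word: int) -> Dict[str, Optional[int]]:
    fp = first_pass_durations(fixations, word)
    return {
        "first_fixation": fp[0] if fp else None,
        "single_fixation": fp[0] if len(fp) == 1 else None,  # only with exactly one first-pass fixation
        "gaze_duration": sum(fp) if fp else None,
    }


# Hypothetical trial: word 2 is skipped on the first pass and fixated only after a regression.
trial = [(0, 210), (1, 180), (1, 120), (3, 250), (2, 300)]
print(early_duration_measures(trial, 1))
# {'first_fixation': 180, 'single_fixation': None, 'gaze_duration': 300}
print(early_duration_measures(trial, 2))
# {'first_fixation': None, 'single_fixation': None, 'gaze_duration': None}
```

Returning `None` for skipped words, rather than 0, keeps skipped trials out of duration averages. This mirrors the "provided it has not been skipped" condition in the definitions above.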
Gaze duration is considered to be a measure of full lexical access, which is influenced by such item characteristics as word frequency, length, and predictability (Rayner et al., 2004; Reichle et al., 1998). Both first-fixation duration and gaze duration are longer on novel words than on highly familiar words, but not longer than on less familiar words; this suggests that measures of lexical access are affected by the degree of word familiarity (Chaffin, 1997). When a word is the main unit of analysis, first-fixation duration and gaze duration are the measures of choice in reading research (Rayner, 1998). Late duration measures (Staub & Rayner, 2007) have been used in studies of meaning processing and word-to-text integration, that is, how word meanings are integrated into context during reading (Rayner & Pollatsek, 2006; Reichle et al., 1998). Late measures are also used to detect post-lexical processing difficulties, such as those associated with semantic or syntactic ambiguity and anomalies in a ROI (Clifton et al., 2007; Reichle et al., 2009). Total reading time (or dwell time), a widely used late measure, is the sum of all fixation durations on a ROI. It comprises both lower- and higher-order processing times and is also considered to be indicative of the amount of attention the reader pays to the ROI. Another measure that is sometimes considered a late measure is go-past time (also referred to as regression path duration). It is the time from the first fixation on the word until a fixation is made beyond the word in the direction of reading, including the time spent on regressions back to the preceding text. This measure is interpreted as a late processing measure because it may reflect an effort to overcome difficulties in word-to-text integration (for example, when readers encounter a semantically anomalous word in a sentence).
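The difference between total reading time and go-past time is easiest to see on a trial that contains a regression. The sketch below is a minimal illustration that uses the same assumed (word index, duration) fixation format as any other example here. Following the definition above, go-past time is left undefined when the word was skipped in the first pass. Analysis packages and individual studies may handle such edge cases differently.

```python
from typing import List, Optional, Tuple

Fixation = Tuple[int, int]  # (word index, duration in ms), temporal order; an assumed format


def total_reading_time(fixations: List[Fixation], word: int) -> int:
    """Sum of all fixation durations on the word, in any pass (dwell time)."""
    return sum(dur for idx, dur in fixations if idx == word)


def go_past_time(fixations: List[Fixation], word: int) -> Optional[int]:
    """Regression path duration: from the first fixation on the word until the eyes
    first land to its right, including regressions to earlier words.
    Returns None if the word was skipped or never fixated."""
    total = 0
    entered = False
    for idx, dur in fixations:
        if not entered:
            if idx > word:      # passed over without fixating: go-past time undefined
                return None
            if idx == word:
                entered = True
                total += dur
        elif idx > word:        # first fixation beyond the word ends the measure
            break
        else:                   # on the word itself or on earlier text (a regression)
            total += dur
    return total if entered else None


# Hypothetical trial: after first fixating word 2, the reader regresses to words 1 and 0,
# rereads word 2, and only then moves on to word 3.
trial = [(0, 210), (1, 180), (2, 240), (1, 150), (0, 190), (2, 220), (3, 260)]
print(total_reading_time(trial, 2))  # 460 (240 + 220)
print(go_past_time(trial, 2))        # 800 (240 + 150 + 190 + 220)
```

Go-past time for word 2 includes the 340 ms spent rereading words 1 and 0. Total reading time counts only fixations that land on word 2 itself.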
However, go-past time has also been associated with early processing because it reflects difficulties of integrating a word when it is initially fixated (Rayner & Pollatsek, 2006). Event measures can, similarly, be associated with early and late processes of reading comprehension (although some researchers argue that the early–late distinction applies only to continuous measures; e.g., Godfroid, 2020). Skipping (i.e., the proportion of words that are not directly fixated during the first pass) or likelihood of skipping (i.e., skipped versus non-skipped) is considered an early measure because skipped words are pre-processed in the parafovea. The skipping behaviour is programmed while the reader's focal attention is on the preceding word. The number of regressions back to a word (regressions-in) and the total number of fixations on a word (fixation count) are considered late processing measures. They can be associated with difficulties in accessing and processing word meanings, integrating word meanings into the meaning of the text, or the time-course of ambiguity resolution by the reader (Rayner, 2009). To put it simply, words are re-fixated to be more fully understood. Fixation count and total reading time on a ROI (two measures that are normally highly correlated) are also indicative of the amount of attention the reader pays to that ROI (see 16.6.1.1 for more on ROI). In measuring ROIs larger than a single word (usually 3–4 words), researchers distinguish between first-pass reading time (all forward fixations within the region) and second-pass reading time (or rereading, i.e., the sum of all ROI fixations following the first pass). For multiword ROIs, other measures commonly used in reading studies are go-past time, total reading time, and regressions-out (i.e., regressions out of a ROI during first-pass reading). However, whether fixations in the ROI made after regressions back to an earlier part of the text (but before the eyes move beyond the ROI in the direction of reading) should count as part of the first pass or the second pass is still a matter of debate (Rayner, 1998; Rayner & Pollatsek, 2006).
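The event and pass-based measures can be computed together for a ROI of one or more words. The sketch below is a hedged illustration, not a reference implementation. It assumes the same (word index, duration) fixation format and treats a ROI as an inclusive span of word indices. It also takes one side of the debate just mentioned: the first pass ends as soon as the eyes leave the ROI in either direction, so all later ROI fixations count as second-pass reading.

```python
from typing import Dict, List, Optional, Tuple, Union

Fixation = Tuple[int, int]  # (word index, duration in ms), temporal order; an assumed format


def roi_measures(fixations: List[Fixation], start: int, end: int) -> Dict[str, Union[bool, int, None]]:
    """Event and pass-based measures for a ROI spanning words start..end (inclusive).

    Assumed convention: the first pass ends when the eyes leave the ROI in either
    direction; every later fixation inside the ROI counts towards second-pass time.
    """
    def inside(i: int) -> bool:
        return start <= i <= end

    state = "before"            # "before" -> "first_pass" -> "after"
    entered = False             # was the ROI fixated during the first pass?
    skipped = False
    regression_out = False
    first_pass_ms = 0
    second_pass_ms = 0
    fixation_count = 0
    regressions_in = 0
    prev: Optional[int] = None

    for idx, dur in fixations:
        if state == "before":
            if idx > end:
                skipped = True              # ROI passed over during the first pass
                state = "after"
            elif inside(idx):
                entered = True
                state = "first_pass"
        elif state == "first_pass" and not inside(idx):
            regression_out = idx < start    # left leftwards: first-pass regression-out
            state = "after"

        if inside(idx):
            fixation_count += 1
            if state == "first_pass":
                first_pass_ms += dur
            else:
                second_pass_ms += dur
            if prev is not None and prev > end:
                regressions_in += 1         # re-entered the ROI from text to its right
        prev = idx

    return {
        "skipped": skipped,
        "fixation_count": fixation_count,
        "regressions_in": regressions_in,
        "regression_out": regression_out,
        "first_pass_ms": first_pass_ms if entered else None,
        "second_pass_ms": second_pass_ms,
    }


# Hypothetical trial with a two-word ROI covering words 2 and 3.
trial = [(0, 210), (1, 180), (2, 240), (1, 150), (0, 190), (2, 220), (3, 260), (4, 200), (2, 230)]
print(roi_measures(trial, 2, 3))
# {'skipped': False, 'fixation_count': 4, 'regressions_in': 1, 'regression_out': True,
#  'first_pass_ms': 240, 'second_pass_ms': 710}
```

A single-word ROI is simply `start == end`. Averaging the `skipped` flag over words or trials then gives a skipping rate.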

16.6

Recommendations for practice

16.6.1

Designing an eye-tracking study of reading

In designing an eye-tracking study of reading, most of the considerations from behavioural language processing experiments are applicable. In word reading studies, for example, one or two characteristics of the critical items are deliberately manipulated, while other characteristics that may affect reading are controlled for. For example, to investigate whether word frequency and length (which are highly correlated) affect word reading independently, researchers either control frequency and record eye movement measures for longer and shorter words (Rayner & Fischer, 1996), or keep word length constant and manipulate frequency (e.g., Rayner et al., 1996). In addition, in studies that predict an effect of the experimental manipulation not only on the critical word but also on the spill-over region (one or two words after the critical word), the spill-over words also need to be matched on these characteristics. Eye movement studies can be designed without any additional (secondary) task, because the primary task, reading, provides all the fine-grained evidence needed for hypothesis testing or observational data analysis. It is nevertheless often desirable to include a secondary task that keeps participants engaged with the text and ensures that they are reading attentively. The most common approach is to use reading comprehension questions after all or some trials (of passage or sentence reading), or at the end of chapters in longer reading studies (e.g., Chaffin et al., 2001; Cop et al., 2015; Elgort et al., 2018). Other possible secondary tasks include grammaticality and acceptability judgements. The data from secondary tasks are needed primarily to verify readers' engagement in the primary task (rather than, for example, to serve as another outcome variable). This verification may result in the exclusion of participants who were not doing the task correctly.
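The engagement check described above often reduces to a simple accuracy cut-off on the comprehension questions. The sketch below shows one way to apply it. The 80% threshold is purely hypothetical, because the chapter does not prescribe a value; studies set their own cut-off a priori. The response data are made up for illustration.

```python
from typing import Dict, List

# Hypothetical cut-off: not prescribed by the chapter; set a priori in a real study.
MIN_ACCURACY = 0.8


def accuracy(answers: List[bool]) -> float:
    """Proportion of comprehension questions answered correctly (0.0 if none answered)."""
    return sum(answers) / len(answers) if answers else 0.0


def retained_participants(responses: Dict[str, List[bool]], min_accuracy: float = MIN_ACCURACY) -> List[str]:
    """IDs of participants whose comprehension accuracy meets the cut-off."""
    return sorted(pid for pid, answers in responses.items() if accuracy(answers) >= min_accuracy)


# Made-up responses to ten comprehension questions per participant.
responses = {
    "p01": [True] * 9 + [False],       # 90% correct
    "p02": [True] * 6 + [False] * 4,   # 60% correct
    "p03": [True] * 8 + [False] * 2,   # 80% correct
}
print(retained_participants(responses))  # ['p01', 'p03']
```

Because the questions serve only to verify engagement, the accuracy scores are used here as a filter and not as an outcome variable.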

16.6.1.1

Region of interest

Defining ROIs is an important component of eye-tracking experiment design, and ROIs need to be determined prior to data collection. Studies that investigate the reading of single words often define the critical words (and sometimes the spill-over words) as ROIs. For studies with multiword expressions (for a review, see Pellicer-Sanchez & Siyanova-Chanturia, 2018), multiple overlapping ROIs may be defined. For example, there may be one ROI for the whole expression ("spread the wings") and additional ROIs for its component words (for example, one ROI for "spread the" and another for "wings"). In experiments manipulating semantic and syntactic ambiguity, one ROI may be defined as the area where the ambiguity occurs, with additional ROIs defined at the disambiguation regions in the sentence (such as grammatically or semantically related words or structures needed to resolve the ambiguity; e.g., Rayner et al., 2004; Traxler & Pickering, 1996). Having multiple ROIs in syntactic and semantic ambiguity studies (Karimi & Ferreira, 2016) is particularly important for understanding how ambiguity is resolved in reading. Although processing of an ambiguous region is studied using late eye movement measures, how the reader resolves this ambiguity is reflected in early measures (e.g., first-fixation durations or first-pass reading times) on the disambiguation ROIs (Rayner & Pollatsek, 2006). In studies that investigate meaning processing across a sentence boundary, ROIs may be positioned in two consecutive sentences. In studies that pose questions about higher-order processes in reading (e.g., studies investigating reading behaviour after different types of instruction or training), a whole section, a paragraph, or the whole text may be defined as a single ROI. When running eye-tracking experiments, researchers need to consider the position of the ROI on the screen and in the text. A key piece of advice here is to avoid the outer edges of the screen, where eye movement recordings tend to be less accurate. Another recommendation is to avoid the start and end of a line of text.
Readers' first and last fixations on a line are commonly located five to seven letter spaces from the beginning and end of the line (Rayner, 2009), resulting in atypical reading patterns on the first and last words. In multiline texts, when readers move from the end of one line of text to the beginning of the next, the start of a line is also prone to fixation location error due to the return sweep. Readers often fixate a non-optimal position in the first word on the line and may even miss the first word altogether (i.e., undershoot; Rayner & Pollatsek, 2006). Another undesirable ROI position in reading studies is the terminal word of a sentence. This is because readers may dwell longer on the final word (i.e., the sentence wrap-up effect) for several reasons beyond processing the word itself. These reasons may be related to the need to update and integrate intra- and inter-sentence information (Just & Carpenter, 1980), to different degrees of clause, sentence, or text complexity, and to implicit prosody (Hirotani et al., 2006; Warren et al., 2009). Because a high degree of precision is needed when recording eye movements in reading studies, multiline texts presented to participants in eye-tracking studies often use greater line spacing than texts used in general reading (Kliegl & Laubrock, 2018). A bigger gap is helpful when dealing with vertical drift, that is, when participants' fixations no longer fall in the middle of the line of text they are reading but appear either above or below it. Drift may occur halfway through reading a page (screen) of text, when stopping and re-calibrating the equipment would significantly disrupt comprehension. Double-spaced (or even triple-spaced) lines make it easier to create larger ROIs (to capture relevant fixations) and to spot and correct drift at the data cleaning and processing stages.
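The ROI recommendations above can be checked mechanically before data collection. The sketch below illustrates this under several assumptions. The stimulus is stored as one list of words per screen line. ROIs are inclusive spans of word indices that may overlap, so a fixation on one word counts towards every ROI that contains it. The layout, ROI names, and function names are all hypothetical. The layout deliberately places "spread the wings" at the start of a line so that the check has something to flag.

```python
from typing import Dict, List, Tuple

# Stimulus as displayed: one list of words per screen line (an assumed representation).
lines: List[List[str]] = [
    ["The", "young", "bird", "was", "finally", "ready", "to"],
    ["spread", "the", "wings", "and", "leave", "the", "nest."],
]

# Overlapping ROIs as inclusive spans of word indices in reading order (hypothetical).
rois: Dict[str, Tuple[int, int]] = {
    "spread the wings": (7, 9),
    "spread the": (7, 8),
    "wings": (9, 9),
}


def word_layout(lines: List[List[str]]) -> List[Tuple[str, int, int, int]]:
    """For each word in reading order: (word, line number, position in line, line length)."""
    return [(w, ln, k, len(line)) for ln, line in enumerate(lines) for k, w in enumerate(line)]


def rois_containing(word_index: int, rois: Dict[str, Tuple[int, int]]) -> List[str]:
    """All ROIs that a fixation on this word counts towards (ROIs may overlap)."""
    return [name for name, (s, e) in rois.items() if s <= word_index <= e]


def placement_warnings(lines: List[List[str]], rois: Dict[str, Tuple[int, int]]) -> Dict[str, List[str]]:
    """Flag ROI words that are line-initial, line-final, or sentence-final."""
    layout = word_layout(lines)
    warnings: Dict[str, List[str]] = {}
    for name, (s, e) in rois.items():
        issues: List[str] = []
        for i in range(s, e + 1):
            word, ln, k, n = layout[i]
            if k == 0:
                issues.append(f"'{word}' is line-initial (line {ln})")
            if k == n - 1:
                issues.append(f"'{word}' is line-final (line {ln})")
            if word.endswith((".", "?", "!")):
                issues.append(f"'{word}' is sentence-final")
        warnings[name] = issues
    return warnings


print(rois_containing(8, rois))  # ['spread the wings', 'spread the']
print(placement_warnings(lines, rois)["spread the wings"])
# ["'spread' is line-initial (line 1)"]
```

In a real stimulus set, a warning like this would prompt reflowing the text so that the expression falls mid-line and away from the sentence-final word.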
Prior to the data analysis, researchers need to complete a data cleaning stage that involves close inspection of the recordings for individual participants and making any necessary adjustments (e.g., drift correction, or exclusion of trials with significant data loss or poor data quality, for example, because participants did not follow instructions). Next, the eye movement data for the measures identified at the design stage as outcome variables are extracted for the specified ROIs for data analysis.
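Two of the cleaning steps mentioned here, assigning drifted fixations to text lines and excluding trials with heavy data loss, can be sketched as follows. The approach is deliberately simple and is an assumption on our part. Each fixation's vertical position (in pixels) is snapped to the nearest line centre, but only if it lies within half a line spacing of that centre. The sketch also excludes trials whose proportion of lost samples exceeds a hypothetical 25% threshold. It does not reproduce the drift-correction procedure of any particular eye-tracking package. The example also shows why wider line spacing, as recommended above, makes drift easier to handle.

```python
from typing import List, Optional, Sequence


def assign_line(y: float, line_centres: Sequence[float], spacing: float) -> Optional[int]:
    """Index of the nearest text line, or None if the fixation lies more than
    half a line spacing from every line centre."""
    nearest = min(range(len(line_centres)), key=lambda k: abs(y - line_centres[k]))
    return nearest if abs(y - line_centres[nearest]) <= spacing / 2 else None


def snap_to_lines(ys: Sequence[float], line_centres: Sequence[float], spacing: float) -> List[Optional[float]]:
    """Replace each fixation's y coordinate with its line centre (None = unassignable)."""
    snapped: List[Optional[float]] = []
    for y in ys:
        k = assign_line(y, line_centres, spacing)
        snapped.append(line_centres[k] if k is not None else None)
    return snapped


def keep_trial(n_samples: int, n_lost: int, max_loss: float = 0.25) -> bool:
    """Keep a trial only if the proportion of lost samples is within max_loss (hypothetical)."""
    return n_samples > 0 and n_lost / n_samples <= max_loss


# A fixation that has drifted 25 px below the first line (centre y = 100):
print(assign_line(125, [100, 160], spacing=60))  # 0: with wide spacing it is still assigned correctly
print(assign_line(125, [100, 130], spacing=30))  # 1: with tight spacing it lands on the wrong line
print(snap_to_lines([95, 125, 170, 300], [100, 160], spacing=60))  # [100, 100, 160, None]
print(keep_trial(1000, 400))  # False
```

Snapping fixations to lines only corrects their vertical position. Fixations that cannot be assigned (`None`) are best inspected manually rather than silently dropped.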

Table 16.1 Three experimental paradigms (based on Rayner, 1998, Figure 2, p. 379)

Normal text
  During a saccade because the eyes are moving so

Moving window: two successive fixations with a window of 17 letter spaces
  XXXXXX X XXXcade because the XXXX XXX XXXXXX XX
  XXXXXX X XXXXXXX XXXXXse the eyes are mXXXXX XX

Foveal mask: two successive fixations with a 7-letter mask
  During a saccade XXXXXXX the eyes are moving so
  During a saccade becausXXXXXXXyes are moving so

Boundary: when the reader's eye movement crosses an invisible boundary location (the letter e in "the"), an initially displayed word (dogs) is replaced by the target word (eyes)
  During a saccade because the dogs are moving so
  During a saccade because the eyes are moving so

16.6.2

Common experimental paradigms in eye movement reading research

Several gaze-contingent experimental paradigms that manipulate the visual display have been developed as eye-tracking technology has progressed (see Hyönä & Kaakinen, 2019, for an overview). They include the moving window (McConkie & Rayner, 1975), foveal mask, and boundary paradigms (Rayner, 1998; Table 16.1). These paradigms were originally developed to investigate the perceptual span in reading. They are also used to better understand the types of information available to readers beyond the focal fixation, in the parafovea, and how the (un)availability of this information affects the processing of the foveal (fixated) word. The two key paradigms for studying how readers access parafoveal information are the moving window paradigm and the boundary paradigm (Warren et al., 2009).

16.6.2.1

Gaze contingent moving window paradigm

In a moving window paradigm (McConkie & Rayner, 1975), the text is masked in some way, except within an experimenter-defined window region around the point of fixation. Therefore, wherever the reader looks, the text is intact, but elsewhere it is masked. The window size is manipulated to establish the region from which the reader can obtain information (Rayner, 1998). The moving mask paradigm (Rayner & Bertera, 1979) is the mirror image of the moving window: the text around each of the reader's fixations is masked, while the rest of the text beyond the mask region remains normal (for a review of the gaze-contingent moving window in reading, see Rayner, 2014).
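The display manipulations in Table 16.1 can be reproduced with two small string transformations. The sketch below is an offline illustration only; a real experiment recomputes the display on every new fixation reported by the tracker. It assumes odd window and mask sizes, character-indexed fixation positions, and masking with X. The moving window preserves spaces outside the window, and the foveal mask covers spaces too, which matches the table's examples. The fixation indices 20, 30, and 26 are inferred so that the output matches the table rows.

```python
TEXT = "During a saccade because the eyes are moving so"


def moving_window(text: str, fixation: int, size: int = 17) -> str:
    """Keep `size` characters centred on the fixated character; replace other
    letters with X but preserve spaces (as in the Table 16.1 example)."""
    half = size // 2
    return "".join(
        ch if abs(i - fixation) <= half or ch == " " else "X"
        for i, ch in enumerate(text)
    )


def foveal_mask(text: str, fixation: int, size: int = 7) -> str:
    """Mirror image of the moving window: mask `size` characters (spaces included)
    centred on the fixated character and leave the rest of the text intact."""
    half = size // 2
    return "".join("X" if abs(i - fixation) <= half else ch for i, ch in enumerate(text))


# 0-based character indices of successive fixations, chosen to match Table 16.1.
print(moving_window(TEXT, 20))  # XXXXXX X XXXcade because the XXXX XXX XXXXXX XX
print(moving_window(TEXT, 30))  # XXXXXX X XXXXXXX XXXXXse the eyes are mXXXXX XX
print(foveal_mask(TEXT, 20))    # During a saccade XXXXXXX the eyes are moving so
print(foveal_mask(TEXT, 26))    # During a saccade becausXXXXXXXyes are moving so
```

Varying `size` across conditions is how a window-size manipulation would be set up. The smallest window that leaves reading unimpaired then estimates the perceptual span.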

16.6.2.2

Gaze contingent boundary paradigms

Readers' perceptual span has also been investigated in experiments with the boundary paradigm (Rayner, 1975), also known as the gaze contingent boundary paradigm (GCBP). In this paradigm, a critical word is initially replaced by another word, a nonword, or a mask. When the reader's eyes cross a prespecified location in the text (an invisible boundary), the initially displayed stimulus is replaced by the critical word (Rayner, 1998). The timing of this display change is contingent on the point in time when the reader's saccade crosses the invisible boundary. Because vision is suppressed during the saccade, readers typically do not notice the change. The type of information available from the preview stimulus can be deliberately manipulated to test what information readers can access from upcoming parafoveal words in fluent reading, and how it contributes to reading (Andrews & Veldre, 2019). When the preview is an orthographically illegal nonword, for example, readers have been reported to fixate longer on the critical word than in normal reading (i.e., without the parafoveal preview disruption). The processing advantage conferred by a valid preview is known as the preview benefit effect (Rayner, 1998, 2009; Schotter et al., 2012). Using GCBP, Veldre and Andrews (2015) showed that higher-skilled readers experience greater preview benefits than less skilled readers, concluding that skilled readers are better able to extract lexical information from a parafoveal word.
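The logic of the boundary paradigm and of the preview benefit can be expressed in a few lines. This is a simplified offline simulation, not experiment-control code. A real experiment triggers the display change from the tracker's online sample stream during the saccade. The gaze positions and fixation times below are made-up values, and which duration measure enters the preview benefit (e.g., first-fixation or gaze duration) is a choice that varies across studies.

```python
from statistics import mean
from typing import Iterable, List, Sequence


def displayed_words(gaze_x: Iterable[float], boundary_x: float, preview: str, target: str) -> List[str]:
    """Word shown in the target position at each gaze sample. The display change is
    triggered the first time gaze crosses the invisible boundary and is never reversed."""
    crossed = False
    shown: List[str] = []
    for x in gaze_x:
        if x > boundary_x:
            crossed = True
        shown.append(target if crossed else preview)
    return shown


def preview_benefit(invalid_preview_ms: Sequence[float], valid_preview_ms: Sequence[float]) -> float:
    """Mean fixation time on the target after an invalid preview minus after a valid one."""
    return mean(invalid_preview_ms) - mean(valid_preview_ms)


# Hypothetical horizontal gaze samples (pixels); the boundary sits at x = 200.
print(displayed_words([100, 150, 210, 190], 200, preview="dogs", target="eyes"))
# ['dogs', 'dogs', 'eyes', 'eyes']
print(preview_benefit([260, 280, 300], [230, 240, 250]))  # 40
```

The target word stays on screen after a later regression to the left of the boundary (the sample at x = 190). This one-way change is what keeps the manipulation invisible once the eyes have crossed.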

16.7

Future directions

The above overview of the use of eye movements in reading research, albeit brief, points to several directions for future research. For example, while lexical, semantic, and syntactic influences on eye movements in reading have received a lot of attention in the psycholinguistic literature of the past thirty years, far less research has been done into higher-order cognitive processes, such as discourse processing, and how they may affect eye movement behaviour. Further, most studies have looked at the reading of alphabetic languages (those that use the Roman script, in particular). As noted by Zang et al. (2011), relatively little research has examined eye movements during the reading of non-alphabetic languages, despite such languages being widely spoken (e.g., Chinese; see also Radach & Kennedy, 2013). Future research should thus focus on languages that use a variety of scripts and writing conventions. Notably, some languages, for example Japanese and Chinese, allow for different writing directions, such as right to left and top to bottom. Still other languages use multiple scripts; for example, Bosnian, Serbian, and Uzbek use both Latin and Cyrillic. It would be valuable to investigate whether the presence of different writing conventions and scripts affects the reading behaviour of speakers of these languages. Another research direction that needs attention is understanding the eye movements of bilingual readers in their dominant and non-dominant (second) languages, and the relationship between reading behaviour and reading comprehension. In a pioneering multi-site study, Kuperman et al. (2022) recorded reading comprehension and eye movements of over 500 speakers of 12 different languages reading the same texts in their second language, English (see Siegelman et al., 2022, for a parallel study of the eye movements of these participants reading in their first language).
However, within-participant comparisons of bilinguals' eye movements when reading in each of their languages are still rare. Cop et al. (2015) recorded the eye movements of unbalanced bilinguals reading a novel by Agatha Christie (56,000 words) in their first language (Dutch) and second language (English). They found longer sentence reading times, more fixations, shorter saccades, and less word skipping in second-language than in first-language reading. Future eye movement studies could collect and share eye movement data from first and second language readers of less commonly researched languages with different scripts and writing systems. Eye movement corpora (i.e., eye-tracking data from multiple readers and/or texts, such as MECO L2, Kuperman et al., 2022; MECO L1, Siegelman et al., 2022; the Dundee corpus of English and French, Pynte & Kennedy, 2006; and the GECO corpus, Cop et al., 2017) will play a critical role in future studies of reading behaviour within and across languages. The role of individual differences in eye movement behaviour during reading has received considerable attention in the past two decades. Studies have probed the developmental mechanisms behind language processing in children (e.g., Blyth & Joseph, 2011; see also the special issue by Schroeder et al., 2015) and reading behaviour in individuals with language disabilities, such as dyslexia (e.g., Jones et al., 2008). Despite the proliferation of research into the role of individual differences, some important and widely researched issues in studies of healthy adult readers have to date received very little attention in studies with children and readers with dyslexia (among other disorders). One such phenomenon is anticipatory mechanisms in language processing. As noted above, predictability is one of the “big three” (Rayner & Liversedge, 2011, p. 756) factors affecting reading times and word skipping (e.g., Staub, 2015). Yet it has received almost no attention in studies involving children and individuals with reading disabilities. Evidence on how such readers exploit prediction is extremely scarce, with one eye-tracking study showing impaired prediction of meaning among adults with dyslexia (Huettig & Brouwer, 2015; but see Egan et al., 2022).

Note

1 Most reading research using eye movements has been conducted on European alphabetic languages, while relatively few studies have been carried out on non-alphabetic languages. A relatively well-studied non-alphabetic language is Chinese, which we draw on in this chapter alongside English, where relevant.

Further reading

Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-Tracking: A Guide for Applied Linguistics Research. Cambridge University Press.
Dussias, P. (2010). Uses of eye-tracking data in second language sentence processing research. Annual Review of Applied Linguistics, 30, 149–166.
Godfroid, A. (2020). Eye Tracking in Second Language Acquisition and Bilingualism: A Research Synthesis and Methodological Guide. Routledge.
Liversedge, S. P., Gilchrist, I., & Everling, S. (Eds.). (2011). The Oxford Handbook of Eye Movements. Oxford University Press.
Pellicer-Sanchez, A., & Siyanova-Chanturia, A. (2018). Eye movements in vocabulary research. ITL - International Journal of Applied Linguistics, 169(1), 5–29.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.
Rayner, K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention during reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62, 1457–1506.
Roberts, L., & Siyanova-Chanturia, A. (2013). Using eye-tracking to investigate topics in L2 acquisition and L2 sentence and discourse processing. Studies in Second Language Acquisition, 35(2), 213–235.
Schroeder, S., Hyönä, J., & Liversedge, S. P. (Eds.). (2015). Developmental eye-tracking research in reading [Special issue]. Journal of Cognitive Psychology, 27(5).

Related topics

New directions in statistical analysis for experimental linguistics; historical perspectives on the use of experimental methods in linguistics; testing in the lab and testing through the Web; contrasting online and offline measures: examples from experimental research on linguistic relativity; controlling social factors in experimental linguistics


References

Andrews, S., & Veldre, A. (2019). What is the most plausible account of the role of parafoveal processing in reading? Language and Linguistics Compass, 13(7), e12344.
Balota, D., Pollatsek, A., & Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17(3), 364–390.
Blyth, H., & Joseph, H. S. S. L. (2011). Children's eye movements during reading. In S. P. Liversedge, I. Gilchrist & S. Everling (Eds.), The Oxford Handbook of Eye Movements (pp. 644–662). Oxford University Press.
Boers, F. (2022). Glossing and vocabulary learning. Language Teaching, 55(1), 1–23.
Brysbaert, M., Drieghe, D., & Vitu, F. (2005). Word skipping: Implications for theories of eye movement control in reading. In G. Underwood (Ed.), Cognitive Processes in Eye Guidance (pp. 53–77). Oxford University Press.
Buswell, G. T. (1935). How people look at pictures. Chicago: University of Chicago Press.
Carpenter, P., & Just, M. (1983). What your eyes do while your mind is reading. In K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes (pp. 275–307). Academic Press.
Chaffin, R. (1997). Associations to unfamiliar words: Learning the meaning of new words. Memory and Cognition, 25, 203–226.
Chaffin, R., Morris, R. K., & Seely, R. E. (2001). Learning new word meanings from context: A study of eye movements. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(1), 225.
Clifton, C. Jr., & Staub, A. (2011). Syntactic influences on eye movements during reading. In S. P. Liversedge, I. Gilchrist & S. Everling (Eds.), The Oxford Handbook of Eye Movements (pp. 896–909). Oxford University Press.
Clifton, C. Jr., Staub, A., & Rayner, K. (2007). Eye movements in reading words and sentences. In R. van Gompel, M. Fischer, W. Murray & R. L. Hill (Eds.), Eye Movements: A Window on Mind and Brain (pp. 341–372). Elsevier.
Clifton, C. Jr., Ferreira, F., Henderson, J., Inhoff, A. W., Liversedge, S., Reichle, E., & Schotter, E. R. (2016). Eye movements in reading and information processing: Keith Rayner's 40 year legacy. Journal of Memory and Language, 86, 1–19.
Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology, 33A, 497–505.
Coltheart, M., Davelaar, E., Jonasson, J. T., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), Attention and Performance VI (pp. 535–555). Lawrence Erlbaum.
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-Tracking: A Guide for Applied Linguistics Research. Cambridge University Press.
Cop, U., Dirix, N., Drieghe, D., & Duyck, W. (2017). Presenting GECO: An eyetracking corpus of monolingual and bilingual sentence reading. Behavior Research Methods, 49(2), 602–615.
Cop, U., Drieghe, D., & Duyck, W. (2015). Eye movement patterns in natural reading: A comparison of monolingual and bilingual reading of a novel. PLoS ONE, 10(8), e0134008.
Duffy, S. A., & Rayner, K. (1990). Eye movements and anaphor resolution: Effects of antecedent typicality and distance. Language and Speech, 33(2), 103–119.
Duffy, S. A., Morris, R. K., & Rayner, K. (1988). Lexical ambiguity and fixation times in reading. Journal of Memory and Language, 27(4), 429–446.
Egan, C., Siyanova-Chanturia, A., Warren, P., & Jones, M. W. (2022). As clear as glass: How figurativeness and familiarity impact idiom processing in readers with and without dyslexia. Quarterly Journal of Experimental Psychology, 76(2), 231–247.
Ehrlich, S. F., & Rayner, K. (1981). Contextual effects on word recognition and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20(6), 641–655.
Ehrlich, K., & Rayner, K. (1983). Pronoun assignment and semantic integration during reading: Eye movements and immediacy of processing. Journal of Verbal Learning and Verbal Behavior, 22(1), 75–87.
Elgort, I., Brysbaert, M., Stevens, M., & Van Assche, E. (2018). Contextual word learning during reading in a second language: An eye-movement study. Studies in Second Language Acquisition, 40(2), 341–366.
Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R. (2005). SWIFT: A dynamical model of saccade generation during reading. Psychological Review, 112(4), 777–813.
Engbert, R., & Nuthmann, A. (2008). Self-consistent estimation of mislocated fixations during reading. PLoS One, 3(2), e1534.

264

Analysing reading with eye tracking
Anna Siyanova-Chanturia and Irina Elgort
Favier, S., Meyer, A. S., & Huettig, F. (2021). Literacy can enhance syntactic prediction in spoken language processing. Journal of Experimental Psychology: General, 150(10), 2167.
Filik, R., Paterson, K. B., & Liversedge, S. P. (2005). Parsing with focus particles in context: Evidence from eye movements in reading. Journal of Memory and Language, 53(4), 473–495.
Filik, R., Paterson, K. B., & Sauermann, A. (2011). The influence of focus on eye movements during reading. In S. P. Liversedge, I. Gilchrist & S. Everling (Eds.), The Oxford Handbook of Eye Movements (pp. 926–941). Oxford University Press.
Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14(2), 178–210.
Godfroid, A. (2020). Eye Tracking in Second Language Acquisition and Bilingualism: A Research Synthesis and Methodological Guide. Routledge.
Godfroid, A., & Hui, B. (2020). Five common pitfalls in eye-tracking research. Second Language Research, 36(3), 277–305.
Henderson, J. M., & Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(3), 417–429.
Hirotani, M., Frazier, L., & Rayner, K. (2006). Punctuation and intonation effects on clause and sentence wrap-up: Evidence from eye movements. Journal of Memory and Language, 54(3), 425–443.
Huettig, F., & Brouwer, S. (2015). Delayed anticipatory spoken language processing in adults with dyslexia – evidence from eye-tracking. Dyslexia, 21(2), 97–122.
Hyönä, J., & Kaakinen, J. K. (2019). Eye movements during reading. In C. Klein & U. Ettinger (Eds.), Eye Movement Research. Studies in Neuroscience, Psychology and Behavioral Economics. Springer.
Hyönä, J., & Pollatsek, A. (1998). Reading Finnish compound words: Eye fixations are affected by component morphemes. Journal of Experimental Psychology: Human Perception and Performance, 24(6), 1612–1627.
Inhoff, A. W., & Rayner, K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception and Psychophysics, 40(6), 431–439.
Inhoff, A. W., Weger, U. W., & Radach, R. (2005). Sources of information for the programming of short- and long-range regressions during reading. In G. Underwood (Ed.), Cognitive Processes in Eye Guidance (pp. 33–52). Oxford University Press.
Jiang, S., Jiang, X., & Siyanova-Chanturia, A. (2020). Multi-word expression processing in children: An eye-tracking study. Applied Psycholinguistics, 41(4), 901–931.
Jones, M., Obregón, M., Kelly, M. L., & Branigan, H. P. (2008). Elucidating the component processes involved in dyslexic and non-dyslexic reading fluency: An eye-tracking study. Cognition, 109(3), 389–407.
Juhasz, B. J., & Pollatsek, A. (2011). Lexical influences on eye movements in reading. In S. P. Liversedge, I. Gilchrist & S. Everling (Eds.), The Oxford Handbook of Eye Movements (pp. 874–893). Oxford University Press.
Juhasz, B. J., & Rayner, K. (2003). Investigating the effects of a set of intercorrelated variables on eye fixation durations in reading. Journal of Experimental Psychology: Learning, Memory and Cognition, 29, 1312–1318.
Juhasz, B. J. (2007). The influence of semantic transparency on eye movements during English compound word recognition. In R. van Gompel, M. Fischer, W. Murray & R. Hill (Eds.), Eye Movements: A Window on Mind and Brain (pp. 373–389). Elsevier.
Just, M., & Carpenter, P. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.
Karimi, H., & Ferreira, F. (2016). Informativity renders a referent more accessible: Evidence from eyetracking. Psychonomic Bulletin & Review, 23(2), 507–525.
Kliegl, R., & Laubrock, J. (2018). Eye-movement tracking during reading. In A. M. B. de Groot & P. Hagoort (Eds.), Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide (pp. 68–88). Wiley Blackwell.
Kuperman, V., Bertram, R., & Baayen, R. H. (2008). Morphological dynamics in compound processing. Language and Cognitive Processes, 23(7–8), 1089–1132.
Kuperman, V., Schreuder, R., Bertram, R., & Baayen, R. H. (2009). Reading polymorphemic Dutch compounds: Toward a multiple route model of lexical processing. Journal of Experimental Psychology, 35(3), 876–895.
Kuperman, V., Siegelman, N., Schroeder, S., Acartürk, C., Alexeeva, S., Amenta, S., Bertram, R., Bonandrini, R., Brysbaert, M., Chernova, D., Da Fonseca, S. M., Dirix, N., Duyck, W., Fella, A., Frost, R., Gattei, C., Kalaitzi, A., Lõo, K., Marelli, M., … Usal, K. A. (2022). Text reading in English as a second language: Evidence from the multilingual eye-movements corpus. Studies in Second Language Acquisition, 45(1), 3–37.
Liversedge, S. P., Paterson, K. B., & Clayes, E. L. (2002). The influence of only on syntactic processing of ‘long’ relative clause sentences. Quarterly Journal of Experimental Psychology, 55(1), 225–240.
Marelli, M., & Luzzatti, C. (2012). Frequency effects in the processing of Italian nominal compounds: Modulation of headedness and semantic transparency. Journal of Memory and Language, 66(4), 644–664.
Matin, E. (1974). Saccadic suppression: A review and analysis. Psychological Bulletin, 81(12), 899–917.
McConkie, G., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception and Psychophysics, 17(6), 578–586.
McConkie, G. W., Kerr, P. W., Reddix, M. D., & Zola, D. (1988). Eye movement control during reading: I. The location of initial eye fixations in words. Vision Research, 28(10), 1107–1118.
Packard, J. L. (2000). The Morphology of Chinese: A Linguistic and Cognitive Approach. Cambridge University Press.
Paterson, K. B., Liversedge, S. P., & Davis, C. (2009). Inhibitory neighbor priming effects in eye movements during reading. Psychonomic Bulletin & Review, 16(1), 43–50.
Paterson, K. B., Liversedge, S. P., & Underwood, G. (1999). The influence of focus operators on syntactic processing of short relative clause sentences. The Quarterly Journal of Experimental Psychology, 52(3), 717–737.
Paterson, K. B., Liversedge, S. P., Filik, R., Juhasz, B. J., White, S. J., & Rayner, K. (2007). Focus identification during sentence comprehension: Evidence from eye movements. Quarterly Journal of Experimental Psychology, 60(10), 1423–1445.
Pellicer-Sanchez, A., & Siyanova-Chanturia, A. (2018). Eye movements in vocabulary research. ITL – International Journal of Applied Linguistics, 169(1), 5–29.
Pollatsek, A., & Hyönä, J. (2005). The role of semantic transparency in the processing of Finnish compound words. Language and Cognitive Processes, 20(1–2), 261–290.
Pollatsek, A., Perea, M., & Binder, K. (1999). The effects of “neighborhood size” in reading and lexical decision. Journal of Experimental Psychology: Human Perception and Performance, 25(4), 1142–1158.
Pollatsek, A., Rayner, K., & Balota, D. A. (1986). Inferences about eye movement control from the perceptual span in reading. Perception and Psychophysics, 40(2), 123–130.
Pynte, J., & Kennedy, A. (2006). An influence over eye movements in reading exerted from beyond the level of the word: Evidence from reading English and French. Vision Research, 46(22), 3786–3801.
Radach, R., & Kennedy, A. (2013). Eye movements in reading: Some theoretical context. The Quarterly Journal of Experimental Psychology, 66(3), 429–452.
Rayner, K. (1975). Parafoveal identification during a fixation in reading. Acta Psychologica, 39(4), 271–282.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Rayner, K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention during reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506.
Rayner, K. (2014). The gaze-contingent moving window in reading: Development and review. Visual Cognition, 22(3–4), 242–258.
Rayner, K., Ashby, J., Pollatsek, A., & Reichle, E. D. (2004). The effects of frequency and predictability on eye fixations in reading: Implications for the E-Z Reader model. Journal of Experimental Psychology: Human Perception and Performance, 30(4), 720–732.
Rayner, K., & Bertera, J. H. (1979). Reading without a fovea. Science, 206(4417), 468–469.
Rayner, K., & Duffy, S. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory and Cognition, 14(3), 191–201.
Rayner, K., & Duffy, S. A. (1988). On-line comprehension processes and eye movements in reading. In M. Daneman, G. E. MacKinnon & T. G. Waller (Eds.), Reading Research: Advances in Theory and Practice (pp. 13–66). Academic Press.
Rayner, K., & Fischer, M. H. (1996). Mindless reading revisited: Eye movements during reading and scanning are different. Perception & Psychophysics, 58, 734–747.
Rayner, K., Li, X., Juhasz, B. J., & Yan, G. (2005). The effect of word predictability on the eye movements of Chinese readers. Psychonomic Bulletin and Review, 12(6), 1089–1093.
Rayner, K., & Liversedge, S. P. (2011). Linguistic and cognitive influences on eye movements during reading. In S. P. Liversedge, I. Gilchrist & S. Everling (Eds.), The Oxford Handbook of Eye Movements (pp. 752–766). Oxford University Press.
Rayner, K., & McConkie, G. W. (1976). What guides a reader’s eye movements. Vision Research, 16(8), 829–837.
Rayner, K., & Pollatsek, A. (2006). Eye-movement control in reading. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of Psycholinguistics (2nd ed., pp. 613–658). Academic Press.
Rayner, K., Sereno, S. C., & Raney, G. E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22(5), 1188.
Rayner, K., Warren, T., Juhasz, B. J., & Liversedge, S. P. (2004). The effects of plausibility on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(6), 1290–1301.
Rayner, K., Well, A., & Pollatsek, A. (1980). Asymmetry of the effective visual field in reading. Perception and Psychophysics, 27, 537–544.
Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105(1), 125–157.
Reichle, E. D., Warren, T., & McConnell, K. (2009). Using EZ Reader to model the effects of higher level language processing on eye movements during reading. Psychonomic Bulletin & Review, 16, 1–21.
Schotter, E. R., Angele, B., & Rayner, K. (2012). Parafoveal processing in reading. Attention, Perception, & Psychophysics, 74(1), 5–35.
Schroeder, S., Hyönä, J., & Liversedge, S. P. (Eds.). (2015). Developmental eye-tracking research in reading [Special issue]. Journal of Cognitive Psychology, 27(5), 500–683.
Siegelman, N., Schroeder, S., Acartürk, C., Ahn, H. D., Alexeeva, S., Amenta, S., … Kuperman, V. (2022). Expanding horizons of cross-linguistic research on reading: The multilingual eye-movement corpus (MECO). Behavior Research Methods, 54, 2843–2863.
Staub, A., & Clifton, C. Jr. (2006). Syntactic prediction in language comprehension: Evidence from either… or. Journal of Experimental Psychology: Learning, Memory and Cognition, 32(2), 425–436.
Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Language and Linguistics Compass, 9(8), 311–327.
Staub, A., & Rayner, K. (2007). Eye movements and on-line comprehension processes. In G. Gaskell (Ed.), The Oxford Handbook of Psycholinguistics (pp. 327–342). Oxford University Press.
Sun, F., & Feng, D. (1999). Eye movements in reading Chinese and English text. In J. Wang, A. W. Inhoff & H.-C. Chen (Eds.), Reading Chinese Script: A Cognitive Analysis (pp. 189–206). Lawrence Erlbaum.
Tinker, M. A. (1946). The study of eye movements in reading. Psychological Bulletin, 43(3), 93–120.
Traxler, M., & Pickering, M. (1996). Plausibility and the processing of unbounded dependencies: An eyetracking study. Journal of Memory and Language, 35(3), 542–562.
Tsai, J. L., Lee, C. Y., Lin, Y. C., Tzeng, O. J. L., & Hung, D. L. (2006). Neighborhood size effects of Chinese words in lexical decision and reading. Language and Linguistics, 7(3), 659–675.
Veldre, A., & Andrews, S. (2015). Parafoveal lexical activation depends on skilled reading proficiency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(2), 586–595.
Wade, N. J., & Tatler, B. W. (2011). Origins and applications of eye movement research. In S. P. Liversedge, I. Gilchrist & S. Everling (Eds.), The Oxford Handbook of Eye Movements (pp. 18–43). Oxford University Press.
Warren, T., White, S. J., & Reichle, E. D. (2009). Investigating the causes of wrap-up effects: Evidence from eye movements and E-Z Reader. Cognition, 111(1), 132–137.
White, S. (2008). Eye movement control during reading: Effects of word frequency and orthographic familiarity. Journal of Experimental Psychology: Human Perception and Performance, 34(1), 205–223.
White, S. J., Rayner, K., & Liversedge, S. P. (2005). The influence of parafoveal word length and contextual constraint on fixation durations and word skipping in reading. Psychonomic Bulletin & Review, 12(3), 466–471.
Williams, R. S., & Morris, R. K. (2004). Eye movements, word familiarity, and vocabulary acquisition. European Journal of Cognitive Psychology, 16(1–2), 312–339.
Yan, G., Tian, H., Bai, X., & Rayner, K. (2006). The effect of word and character frequency on the eye movements of Chinese readers. British Journal of Psychology, 97(2), 259–268.
Zang, C., Liversedge, S., Bai, X., & Yan, G. (2011). Eye movements during Chinese reading. In S. P. Liversedge, I. Gilchrist & S. Everling (Eds.), The Oxford Handbook of Eye Movements (pp. 962–978). Oxford University Press.


17 ANALYSING SPOKEN LANGUAGE COMPREHENSION WITH EYE TRACKING
Yipu Wei and Michael K. Tanenhaus

17.1 Introduction and definitions

Spoken language unfolds as a series of rapid, transient acoustic events, which listeners map onto internal representations as they arrive. While there are ongoing theoretical debates and unresolved empirical questions, the following characterization of the comprehension process represents a consensus in the field: at multiple levels or types of linguistic representation, comprehension is closely time-locked to the input. As the speech signal arrives, listeners generate and update provisional hypotheses about words, syntactic structures, referring expressions and even the intentions that a speaker is seeking to convey with an utterance in a specific context (i.e., speaker meaning), while making predictions about upcoming input. Many of these processes are best studied using experimental methods that allow researchers to analyse comprehension as it takes place in real time. Monitoring participants’ eye movements to real or depicted entities (often clip art displays of people and objects) as they listen to spoken language, in what is commonly referred to as the ‘Visual World Paradigm’ (VWP), has emerged as one of the most useful methods in experimental linguistics. The visual workspace defines the referential domain that the spoken language is about, hence the term ‘visual world’, coined by Tanenhaus and colleagues. An important feature of the VWP is its versatility. It can be used in a range of natural (goal-based) tasks, with minimal restrictions and with most populations, including infants, elderly adults, bilinguals, and special populations (e.g., individuals with aphasia or autism). It is perhaps the paradigm of choice for examining sentence processing in pre-literate children. Most generally, it can be used to study the full range of topics in spoken language comprehension at multiple levels, from phonetic to pragmatic processing. In Section 17.4, we briefly outline some of the most common applications.
Eye movements can be used to analyse spoken language because shifts in visual attention are typically accompanied by rapid ballistic eye movements called saccades, which bring the attended region into foveal vision. When the language is about the display or workspace, shifts in attention to relevant regions of the display are often closely time-locked to the unfolding speech. With appropriate experimental designs, this allows experimenters to gain insights into, and test hypotheses about, ongoing comprehension processes.
DOI: 10.4324/9781003392972-20


While the most general form of the linking hypothesis for language-mediated eye movements is that they reflect shifts in attention, there are ongoing questions and alternative proposals about more specific linking hypotheses (Altmann & Mirković, 2009; Knoeferle & Crocker, 2006; Salverda et al., 2011). The typical equipment is a video-based eye tracker that samples the position of the eye while taking the position of the head into account. In contrast to reading studies, researchers rarely examine when a saccade is initiated and where the eye moves to. Rather, they examine where the participant is looking (fixating) during specific segments of the language as it unfolds. The experimenter defines regions of interest in a display and anchors looks to objects in the display to one or more acoustic landmarks in the speech stream. With simple displays, for example a display with four pictures, changes in looking patterns are observed as early as 200 ms after the earliest acoustic information that could, if used by listeners, provide probabilistic evidence about the target (Salverda & Tanenhaus, 2018). A potential referent that is temporarily consistent with the unfolding utterance is often termed a competitor. For example, if the spoken sentence was The boy will look at the candle, and the display contained a candle and a candy, the candle would be the target and the candy would be the competitor. In visual world experiments, the data typically reveal when in the speech stream looks to the target and to different types of competitors diverge. Depending upon the topic of interest, the focus might be on the effects of a potential acoustic cue, the information carried by a type of word (e.g., a verb, preposition, or type of article), the knowledge states and intentions of the speaker, or the action-relevant affordances of the potential referents.
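As a rough illustration of how looks are anchored to an acoustic landmark, the Python sketch below assigns gaze samples to rectangular regions of interest. It then computes, for each time bin after a landmark such as target-word onset, the proportion of samples that fall on each region. The data layout is hypothetical: samples as (time, x, y) tuples, a `target_onset` field, and pixel rectangles, as are all the names used. Real pipelines usually start from the tracker's own fixation reports and must also handle track loss, blinks and the roughly 200 ms needed to program a saccade. The `divergence_bin` helper is a simple descriptive heuristic, not a statistical test. It returns the first run of bins in which target looks exceed competitor looks by a fixed threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Rectangular region of interest (ROI) in screen pixels; the layout is hypothetical."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def label_sample(x, y, regions):
    """Return the name of the ROI containing a gaze sample, or None if it is outside all ROIs."""
    for region in regions:
        if region.contains(x, y):
            return region.name
    return None


def proportion_of_looks(trials, regions, landmark="target_onset",
                        bin_ms=50, start_ms=0, end_ms=1000):
    """Per-bin proportion of gaze samples on each ROI, time-locked to an acoustic landmark.

    Each trial is a dict with 'samples' = [(t_ms, x, y), ...] measured from trial start
    and the landmark time (ms from trial start) stored under the key `landmark`.
    The denominator is all samples in the bin, including looks outside every ROI.
    """
    n_bins = (end_ms - start_ms) // bin_ms
    counts = {region.name: [0] * n_bins for region in regions}
    totals = [0] * n_bins
    for trial in trials:
        onset = trial[landmark]
        for t, x, y in trial["samples"]:
            offset = t - onset - start_ms  # time since the start of the analysis window
            if not 0 <= offset < n_bins * bin_ms:
                continue  # sample falls outside the analysis window
            b = int(offset // bin_ms)
            totals[b] += 1
            name = label_sample(x, y, regions)
            if name is not None:
                counts[name][b] += 1
    return {name: [c / tot if tot else 0.0 for c, tot in zip(cs, totals)]
            for name, cs in counts.items()}


def divergence_bin(target, competitor, threshold=0.1, run=3):
    """Index of the first bin that starts `run` consecutive bins in which the target
    proportion exceeds the competitor proportion by more than `threshold`; None if never."""
    streak = 0
    for i, (t, c) in enumerate(zip(target, competitor)):
        streak = streak + 1 if t - c > threshold else 0
        if streak == run:
            return i - run + 1
    return None
```

In a four-picture display, `regions` would hold one `Region` per picture. The target and competitor curves returned by `proportion_of_looks` can then be passed to `divergence_bin`. A negative `start_ms` includes a pre-onset baseline.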
The term anticipatory eye movements refers to looks to the target that occur before it is mentioned; it is used especially in studies that focus on the timing of predictions in comprehension. In some types of studies, for example, studies using pronouns as referring expressions, experimenters are interested not only in the time course of fixations but also in what the participant chooses as the referent. Looks can then be analysed separately for different choices (response-contingent analyses). The remainder of this chapter provides a roadmap to the history (Section 17.2), critical topics (Section 17.3), and contributions (Section 17.4) of visual world eye tracking. We also provide an overview of methodological issues (Section 17.5), recommendations for practice (Section 17.6), and thoughts on future directions (Section 17.7). The paradigm can also be used to analyse language production, but that is beyond the scope of this chapter.
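A response-contingent analysis of the kind described above can be sketched as follows. Trials are grouped by the referent the participant eventually chose, and within each group the proportion of looks to each candidate referent during an analysis window is averaged across trials. The field names (`response`, `looks`), the NP1/NP2 labels and the choice of a per-trial proportion averaged over trials are illustrative assumptions, not a prescribed procedure.

```python
from collections import Counter, defaultdict


def response_contingent_looks(trials, window):
    """Mean proportion of looks to each candidate referent, split by the referent chosen.

    Each trial is a dict with 'response' (the referent the participant selected) and
    'looks' = [(t_ms, roi_or_None), ...], with t measured from the onset of the
    referring expression (e.g. a pronoun). `window` is a half-open (start_ms, end_ms).
    """
    start, end = window
    by_response = defaultdict(list)
    for trial in trials:
        in_window = [roi for t, roi in trial["looks"] if start <= t < end]
        if not in_window:
            continue  # a trial with no samples in the window contributes nothing
        counts = Counter(roi for roi in in_window if roi is not None)
        by_response[trial["response"]].append(
            {roi: n / len(in_window) for roi, n in counts.items()})
    summary = {}
    for response, props in by_response.items():
        rois = set().union(*props)  # every ROI looked at in any trial of this group
        summary[response] = {roi: sum(p.get(roi, 0.0) for p in props) / len(props)
                             for roi in sorted(rois)}
    return summary
```

Averaging per-trial proportions gives each trial equal weight regardless of how many samples it contributes. Pooling raw sample counts instead would weight longer trials more heavily.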

17.2 Historical perspectives

The application of eye tracking in language research is closely tied to technological development. Before eye-tracking devices were interfaced with computers, eye movements could only be measured with photographic records and scored manually (Mackworth, 1968; Mackworth & Morandi, 1967). The mid-1970s witnessed tremendous changes brought about by the application of computers, one of which was the online recording and scoring of eye-movement data (Kundel & Nodine, 1973; Loftus et al., 1975; McConkie, 1976; McConkie & Rayner, 1973). Online eye tracking with a computer interface enabled precise control of stimulus presentation and improved the accuracy of calibration and the efficiency of data recording (Just & Carpenter, 1976; McConkie & Rayner, 1975; Young & Sheena, 1975). Identifying what is being fixated requires either stabilizing the head or combining accurate measures of head position with the position of the eye in its orbit. The development of Dual-Purkinje Image (DPI) eye trackers with millisecond sampling and spatial resolution that could



identify which letter in a word was being fixated was an important catalyst for eye-movement studies of reading. Reading studies using DPI trackers typically stabilized the head with a bite bar moulded for each participant. By combining ingenious manipulations of the display contingent on when a saccade was initiated, Rayner and his group conducted a stream of influential studies that provided the foundations of eye-tracking research in reading (see a review by Rayner, 1998). In a pioneering study using a DPI tracker, Cooper (1974) presented participants with visual objects arranged in a 3×3 array as they heard a spoken story containing words with different grammatical functions. Listeners fixated on visual objects related to the meaning of the words, either directly referred to by a word (e.g., the word lion and the image of a lion) or indirectly inferred from it (e.g., the word Africa and the image of a lion or a zebra), often launching saccades to targets before the word was completed. Few investigators built upon Cooper’s work. Monitoring eye movements with a DPI tracker was cumbersome and expensive, and stabilizing the head was more problematic for listening studies than for reading. Moreover, except for reading, eye movements were not widely used in research on cognition and perception. After initial excitement about classic research by Yarbus (1967) on how goals affected scan patterns in scene perception, it became evident that without clear task constraints and explicit hypotheses, eye movements were difficult to interpret. In the early 1990s, Land, and then Hayhoe and Ballard, began to measure eye movements in natural tasks, finding remarkable coupling of eye movements to task goals (for review see Hayhoe & Ballard, 2005). This research was facilitated by the development of video-based trackers that were inexpensive and light enough to be head-mounted, eliminating the need for head stabilization. Influenced by this work, Tanenhaus et al.
(1995) used a head-mounted video-based eye tracker to monitor participants’ eye movements as they followed experimenter-generated spoken instructions to pick up and move objects arranged on a table (e.g., Put the apple that is on the towel in the box). Tanenhaus et al. found evidence for rapid integration of visual and linguistic information in word recognition, reference resolution, and syntactic processing (parsing). The latter was the focus of their report, which set the stage for widespread use of the VWP. Several studies provided the foundation for the application of the VWP to important issues in language comprehension. In the first screen-based study, Allopenna et al. (1998) examined the time course of spoken-word recognition in continuous speech, providing the foundation for most subsequent research that uses the VWP to study speech and lexical processing. Trueswell et al. (1999) demonstrated that the VWP could be used to study sentence comprehension in young children, using a variant of the setup in Tanenhaus et al. (1995). This has led to a flourishing literature on children’s sentence processing. Many current VWP studies follow the methods and rationale introduced by Cooper (1974), who did not use an explicit task. Altmann and Kamide (1999) is the foundational ‘look-and-listen’ study. They presented displays (see Figure 17.1) with clip art of a person (e.g., a boy) and a set of four objects (e.g., a cake, a toy car, a ball, and a toy train) together with a spoken utterance, e.g., The boy will eat the cake. As the verb unfolded, participants were more likely to launch a saccade to the target object when the semantics of the verb were consistent with only one of the objects. This study set the stage for the use of the VWP to study predictive processing.
Arnold et al. (2000) was the first study to use the VWP to investigate reference resolution. Sedivy et al. (1999) is the foundational application of this paradigm to pragmatics: upon hearing a prenominal scalar adjective such as tall (e.g., Touch the tall glass), listeners assume that the speaker is referring to an object in a contrast set of two glasses, one of which is taller than the other. Keysar et al. (2000) created the framework for examining differences in perspective, for example, objects that the listener but not the speaker can see, and that would therefore not be plausible referents.

Figure 17.1

Reprinted from Cognition, 73/3, Gerry T.M. Altmann & Yuki Kamide, Incremental interpretation at verbs: restricting the domain of subsequent reference, 247–264, Copyright (1999), with permission from Elsevier.

17.3 Critical issues and topics

Visual world eye tracking has made multiple contributions to our understanding of spoken language by providing a dependent measure that can be used with continuous speech in co-present visual contexts, including natural tasks, and in a wide range of populations, including children. Crucially, it has high temporal resolution and is sensitive to fine-grained acoustic information. In what Clark (1992) labelled the ‘language-as-action’ tradition, language is grounded in the time, place, and situation in which an utterance occurs. The visual world paradigm has allowed studies of spoken language processing to be grounded in this way, revealing a system that rapidly integrates multiple constraints, including information in the visual context and task goals (Spivey & Huette, 2016), which is crucial for investigating issues in experimental pragmatics. The VWP has also facilitated studies of fine-grained acoustic variation in continuous speech. This has proved invaluable in studies of spoken language processing, informing questions about the temporal integration of fine-grained asynchronous cues across multiple linguistic levels and enabling a wider range of studies on prosody and intonation.


Many contributions of the VWP stem from its being a referential paradigm, with the display making the full set of potential referents simultaneously available. This has proved important to studies of pronouns and other referring expressions, to studies of the information that determines how listeners circumscribe referential domains, and in allowing experimenters to test hypotheses about the predictions that listeners make about upcoming input, e.g., using information in the utterance to anticipate what is likely to be referred to, and in some cases revising those predictions. More generally, the VWP encourages experimenters to study different aspects of language comprehension as part of an integrated system in which linguistic processing at multiple levels is tightly coupled with perception and action in the context of goal-directed behaviour. In the next section, we briefly discuss several subfields of language comprehension where the VWP has become a central methodological tool and has helped resolve long-standing empirical and theoretical issues and/or served as a catalyst for new directions of research.

17.4 Current contributions and research

The literature using the VWP in language comprehension is so voluminous and covers so many subfields that it is difficult to do even a cursory review. Moreover, it would be misleading to focus selectively on eye-movement studies, as the paradigm has become so well established that it is embedded in the broader literature. Consequently, we have selected six domains in which monitoring eye movements has had a major impact, highlighting the contributions of these studies and focusing on questions in each domain where the paradigm has provided valuable insights. This should provide a useful guide for readers, who can then consult recent handbook reviews in each domain (e.g., Rueschemeyer & Gaskell, 2018).

17.4.1 Spoken word recognition and speech perception

In spoken language comprehension, the mapping of the input onto lexical representations takes place continuously as the input unfolds over time. Moreover, recognition of a word is strongly influenced by the family of words to which it is related, often called a neighbourhood. For example, recognition of the word beaker is affected by the number of words that contain similar sounds, e.g., beeper, beagle, peeper, and peek. The VWP has made it possible to observe the temporal dynamics of potential lexical candidates as a word unfolds, typically by examining looks to targets and different types of competitors. It has also allowed researchers to examine spoken word recognition in continuous speech, which is important because words are articulated differently at different positions in an utterance. Most generally, visual world studies have helped to eliminate artificial boundaries between speech perception and spoken word recognition, revealing a much more dynamic system that makes and revises provisional hypotheses about possible phonemes, syllables, words, and even phrases. The VWP has been especially useful in studying the real-time use of fine-grained acoustic cues, anticipatory coarticulatory information, and the temporal integration of cues, which matters because relevant cues often arrive at different points in the signal. For example, the same sequence may be heard as saw, saw a or saw uh depending on the rate of the preceding and following speech. Initial concerns that the paradigm would be limited because processing might be restricted to the small number of words on the screen were allayed by studies demonstrating that the VWP was sensitive to word frequency and to competitors that were not displayed, e.g., effects of neighbourhood density (Magnuson et al., 2007).
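Neighbourhood density is often operationalised as the number of words that differ from a target by a single phoneme substitution, addition or deletion. This criterion is narrower than the looser notion of ‘similar sounds’ used above: under it, peeper and peek are not neighbours of beaker. The sketch below assumes a toy lexicon with rough, hypothetical ASCII phoneme transcriptions. Real studies use a full phonemically transcribed lexicon and often weight neighbours by frequency.

```python
def is_one_phoneme_neighbour(a, b):
    """True if phoneme sequences a and b differ by exactly one substitution, addition or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1  # exactly one substitution
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one phoneme from the longer sequence must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighbours(word, lexicon):
    """Sorted list of the one-phoneme neighbours of `word` in a transcribed lexicon."""
    target = lexicon[word]
    return sorted(w for w, phones in lexicon.items()
                  if w != word and is_one_phoneme_neighbour(target, phones))


# Toy lexicon with rough, hypothetical transcriptions (tuples of phoneme symbols).
LEXICON = {
    "beaker": ("b", "i", "k", "er"),
    "beeper": ("b", "i", "p", "er"),
    "seeker": ("s", "i", "k", "er"),
    "beak": ("b", "i", "k"),
    "peeper": ("p", "i", "p", "er"),
    "peek": ("p", "i", "k"),
    "beagle": ("b", "i", "g", "@", "l"),
}

print(neighbours("beaker", LEXICON))       # ['beak', 'beeper', 'seeker']
print(len(neighbours("beaker", LEXICON)))  # neighbourhood density: 3
```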



Other eye-tracking studies have demonstrated language users’ sensitivity to lexical stress, the duration of acoustic sequences, pitch accents, and lexical tones in languages such as Chinese. The VWP is also widely used to study individual differences in both adults and children, special populations such as individuals with specific language impairment, and issues in bilingualism.

17.4.2 Syntactic parsing

Throughout most of the 1980s and 1990s, a large body of research in syntactic processing (parsing) focused on how readers resolve the ubiquitous temporary ambiguities that arise as a sentence is read. There were ongoing debates about whether initial syntactic processing was encapsulated from various types of information that could, in principle, resolve a syntactic ambiguity (for example, lexical information, plausibility, and the preceding discourse context), or whether the system makes probabilistic use of multiple constraints. The VWP made it possible to examine a wider range of contexts, including visual information, which, unlike prior context in text, was co-present with the input. The results showed effects of referential context (e.g., displays with multiple potential referents, where post-nominal modification is necessary to disambiguate the referent), of object affordances, and of task goals, and strongly supported the constraint-based approach for both adults and children (Tanenhaus & Brown-Schmidt, 2008). In subsequent parsing research, syntactic ambiguity resolution has taken a back seat to studies of predictive processing, where the VWP is widely used, and to studies of processing difficulty, for which it is not well suited.

17.4.3 Semantic integration

The VWP, with its fine temporal grain, is widely used in studies of how comprehenders integrate the meanings of various linguistic elements with other kinds of information from the visual environment and world knowledge. Semantic processing is not delayed until all information is available but is immediate or even predictive, as shown by anticipatory looks to highly probable referents in the visual display. For example, the future-tense will drink restricts the domain of referents to something drinkable, such as a glass full of beer, while has drunk predicts an empty glass (Altmann & Kamide, 2007). The subject the man and the verb will ride jointly predict a highly plausible object, a motorbike, before the object is linguistically available, in contrast to the girl will ride (Kamide et al., 2003). Predictive processing extends beyond verb-argument integration. Both L1 and L2 listeners use Chinese classifiers to anticipate a forthcoming noun (Grüter et al., 2020). Comprehenders also integrate knowledge-driven expectations with the semantic information of classifiers in real time (Chow & Chen, 2020). The VWP has also been used to examine how thematic role assignment is influenced by visual context. For example, a German verb-second sentence (e.g., Die Prinzessin malt … ‘The princess paints…’) can be temporarily ambiguous at the verb with respect to thematic roles, because the princess can be either the agent performing the painting action or the patient who is being painted. But when such sentences were processed in the context of a scene depicting an agent-action-patient event (e.g., fencer-painting-princess), this visual information could immediately facilitate the resolution of the role ambiguity before the rest of the sentence was heard (Knoeferle et al., 2005).
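The logic of an anticipatory-look analysis in the spirit of Altmann and Kamide (1999) can be sketched as follows. For each condition, it computes the proportion of trials on which a fixation on the target object begins after verb onset but before the onset of the target noun, so that the look cannot have been driven by the noun itself. Several details are assumptions for illustration: the trial format, the condition labels (‘restrictive’ for a verb compatible with only one object, ‘non-restrictive’ otherwise) and the exact window. Published analyses differ in how they define the window, and they often shift it to allow for saccade-programming latency.

```python
from collections import defaultdict


def anticipatory_rate(trials):
    """Per condition, the proportion of trials with a target fixation that begins in
    the window [verb_onset, noun_onset), i.e. before the target noun is heard.

    Each trial is a dict with 'condition', 'verb_onset' and 'noun_onset' (ms from
    trial start) and 'fixations' = [(fixation_start_ms, roi_name), ...].
    """
    hits = defaultdict(int)
    n_trials = defaultdict(int)
    for trial in trials:
        cond = trial["condition"]
        n_trials[cond] += 1
        if any(roi == "target" and trial["verb_onset"] <= start < trial["noun_onset"]
               for start, roi in trial["fixations"]):
            hits[cond] += 1
    return {cond: hits[cond] / n for cond, n in n_trials.items()}
```

A fixation that began on the target before verb onset is deliberately not counted. Under this scheme only looks launched during the verb-to-noun window are treated as anticipatory.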


Yipu Wei and Michael K. Tanenhaus

17.4.4

Discourse processing

Visual world eye tracking has benefited research on discourse processing in at least two respects. First, it offers a tool for examining how coherence is established, through referential processing, between an anaphor and its antecedents in prior discourse. Looks to a visual entity triggered by an anaphor indicate that this entity is being considered as a potential referent (Arnold et al., 2000). Eye movement patterns reveal differences between types of referential devices, such as pronouns and demonstratives (e.g., it and that), which are difficult to examine without co-present alternatives. Brown-Schmidt et al. (2005) showed that it tends to refer to the most salient referent in discourse, the Theme (e.g., a cup), as indicated by increased fixations to the cup when participants heard it, whereas that preferentially refers to a composite, i.e., a conceptually complex entity (e.g., a cup on a saucer; cf. Kaiser & Trueswell, 2008; Kaiser et al., 2009). VWP studies also provide insights into how and when comprehenders interpret ambiguous pronouns given contextual cues from implicit causality verbs, which may bias pronoun interpretation towards a particular entity in prior discourse (e.g., John annoyed [NP1-bias verb] Peter because… versus John feared [NP2-bias verb] Peter because…). A VWP study by Pyykkönen and Järvikivi (2010) provides direct evidence for an incremental account of referential processing in discourse: a growing proportion of looks to the verb-biased entity emerged immediately after the implicit causality verb, indicating that implicit causality information is activated early and used to resolve pronoun ambiguity (see also Cozijn et al., 2011). Second, visual displays depicting scenes can be used to evaluate how comprehenders reason about the relations between events in discourse representations.
For example, van Veen (2011) compared children's and adults' comprehension of objective relations (consequence-cause) and subjective relations (claim-argument). In the objective relation condition, a consequence was described in a spoken sentence (e.g., The pig gets dirty) and its cause was depicted in a visual scene (e.g., a picture of a pig standing in the mud); in the subjective relation condition, a verbally expressed claim (e.g., The man is tired) was accompanied by a scene (e.g., a picture of a man lying in bed) that served as an argument for the claim. Crucially, the VWP can be used with both adults and children, revealing important developmental trends. Adults and children aged 3;6 looked more at the target scene than at the irrelevant scene in both the objective and the subjective conditions, suggesting that they understood both types of relations. For 2-year-olds, however, preferential looks to the target were observed only in the objective relation condition, indicating difficulty in comprehending subjective relations.

17.4.5

Pragmatics and communication

The VWP has played a central role in experimental pragmatics. Much of the information conveyed by an utterance is implicit, requiring listeners to infer speaker meaning. There are long-standing questions about when and how listeners make pragmatic inferences, and about whether there is an initial stage of sentence processing in which literal meaning is computed before pragmatic meaning. For example, speakers typically use pre-nominal scalar adjectives when there are multiple objects of the same semantic type that differ along the dimension the adjective refers to. Pioneering work by Sedivy et al. (1999) used the visual world paradigm to show that listeners assume there is a contextually salient contrast set when they hear a pre-nominal adjective. For example, if a display contains two glasses that differ in height, listeners hearing tall will initially look at the taller glass, even if there is another tall object in the display.

Analysing spoken language comprehension with eye tracking

No topic has been more central to experimental pragmatics than scalar implicature (Degen & Tanenhaus, 2019). In many contexts, the phrase some balloons implies not all balloons, because an informative speaker would otherwise have used the stronger quantifier, all. According to classic work in pragmatics, such implicatures require inference, and thus take extra time to generate and are more difficult for children to master. There is an ongoing debate about whether listeners compute the logical meaning of some (some and possibly all) before the pragmatic meaning (some but not all), a debate that increasingly draws on visual world studies because referents associated with the logical and the pragmatic interpretations can both be available as listeners hear the quantifier (Grodner & Sedivy, 2011; Huang & Snedeker, 2009). Eye movement data also indicate that over-informativeness carries a cost in real-time processing, even though over-modified sentences are not rated as less acceptable than more concise utterances in acceptability judgments (e.g., Engelhardt et al., 2006). Face-to-face, interactive conversation is the primary site of language processing and language learning. Interlocutors typically have some knowledge that is shared (common ground) and some that is privileged. But do speakers and listeners take these differences into account, as predicted by models in which speaker meaning is primary, or is there an initial egocentric stage of processing in which resources are conserved by ignoring these differences (see Keysar et al., 2000, on an egocentric strategy of perspective-taking)?
While the issue remains unresolved, and the answer is likely to be nuanced, the VWP has become a paradigm of choice because the display and task can be designed to create situations in which different information is available to the speaker and the listener (e.g., the listener can see some objects that the speaker cannot; see Brown-Schmidt & Heller, 2018, for a recent review). Many of these studies, as well as studies of prosody and intonation, use what Brown-Schmidt and Tanenhaus (2008) termed 'targeted language games', in which the task and workspace are structured so that the critical trials emerge within a partially scripted or even unscripted interaction.

17.4.6

Prosody and intonation

Many of the questions in research on prosody and intonation are addressed within the areas mentioned above (effects of prosody on syntactic ambiguity resolution, prosodic effects in word recognition, etc.). It is worth noting, however, that the VWP has served as a methodological catalyst for a broad swath of research on topics in laboratory phonology and experimental pragmatics, for example, the processing of pitch accents and boundary tones, and of acoustic cues that signal prominence and thereby affect information structure. For a recent review of prosody and intonation that includes visual world studies, see Dahan (2015).

17.5

Main research methods

17.5.1

Task

VWP experiments can be divided into two basic types according to the task. Action-based, or active-task, studies require participants to interact with visual objects (e.g., by moving, dragging, or clicking them), either in screen-based displays or in a real-world environment, in response to linguistic instructions (e.g., put the blue triangle on the red one in Hanna & Tanenhaus, 2004). With an action-based task, response times and correction rates can also be measured and used as a secondary index of comprehension. Look-and-listen, or passive-task, studies, in contrast, monitor participants' eye movements while they listen to speech, without requiring any motor action (Altmann & Kamide, 1999; Knoeferle et al., 2005). Because the look-and-listen task by itself offers no control or evaluation of participants' performance, comprehension questions or picture-verification tasks are often added to maintain participants' attention throughout the experiment and to assess comprehension. Pyykkönen-Klauck and Crocker (2016) review the two types of VWP studies and suggest that the type of task may affect how sensitive eye movements are to certain effects (e.g., frequency; see empirical results in Weber et al., 2010; Weber & Crocker, 2012).

There is a range of other variations of the VWP. A printed-word visual world paradigm (Huettig & McQueen, 2007; McQueen & Viebahn, 2007), for example, presents participants with printed words instead of pictures of objects. This printed-word version of the VWP has proved advantageous for examining the processing of orthographic information in speech perception (Ito, 2019; Salverda & Tanenhaus, 2010). Researchers also use another variant, the blank screen paradigm (Altmann, 2004; Altmann & Kamide, 2004), in which the visual display is presented and then removed before the auditory target sentence is played, to explore whether anticipatory eye movements in language processing depend on a concurrent visual scene.

17.5.2

Data and measurement

Three components are needed to process typical eye-tracking data: (i) XY coordinates of gaze positions marked with time stamps (in milliseconds or relative to landmarks in the linguistic input), usually generated automatically by the eye-tracking system; (ii) an image of the visual display or a video recording of the naturalistic visual context; and (iii) coding information specifying the interest areas. These components can be mapped onto one another using data clean-up software such as DataViewer (SR Research), which creates interpretable indices for analysis, such as the percentage of fixations on a given interest area within a time period. Dependent variables vary somewhat across VWP eye-tracking studies. The most common measure is the proportion of fixations (e.g., Allopenna et al., 1998): the percentage of all fixations within a time window that fall on an interest area, aggregated over trials and participants. A variant of this measure, the target ratio (fixations on the target divided by the sum of fixations on the target and the competitor; Heller et al., 2008), directly quantifies the advantage of the target over the competitor. In some studies, the number of saccades to particular interest areas (Altmann, 2004) or saccade latencies (Salverda et al., 2014) are also used to evaluate participants' attention to visual objects.
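The three data components and the two dependent measures just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of DataViewer or any other package: interest areas are assumed to be axis-aligned rectangles in screen pixels, gaze is assumed to arrive as hypothetical `(time_ms, x, y)` samples, and proportions are computed over gaze samples rather than over parsed fixations, a common simplification. All names (`InterestArea`, `label_samples`, `fixation_proportions`, `target_ratio`) are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class InterestArea:
    """A rectangular interest area in screen pixels (hypothetical coding scheme)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def label_samples(samples: Sequence[Tuple[float, float, float]],
                  areas: Sequence[InterestArea]) -> List[Tuple[float, Optional[str]]]:
    """Map each (time_ms, x, y) gaze sample to an interest-area name, or to None
    if the sample falls outside every interest area."""
    labelled = []
    for t, x, y in samples:
        hit = next((a.name for a in areas if a.contains(x, y)), None)
        labelled.append((t, hit))
    return labelled


def fixation_proportions(labelled, start_ms: float, end_ms: float) -> Dict[Optional[str], float]:
    """Share of samples in [start_ms, end_ms) on each interest area.
    Looks outside all areas (None) stay in the denominator."""
    in_window = [ia for t, ia in labelled if start_ms <= t < end_ms]
    if not in_window:
        return {}
    counts = Counter(in_window)
    return {ia: n / len(in_window) for ia, n in counts.items()}


def target_ratio(props, target: str = "target", competitor: str = "competitor") -> float:
    """Target / (target + competitor), following the definition in the text."""
    t = props.get(target, 0.0)
    c = props.get(competitor, 0.0)
    return t / (t + c) if t + c > 0 else float("nan")


# Hypothetical example: two interest areas and four samples recorded at 50 Hz.
areas = [InterestArea("target", 0, 0, 100, 100),
         InterestArea("competitor", 200, 0, 300, 100)]
samples = [(0, 50, 50), (20, 60, 40), (40, 250, 50), (60, 500, 500)]
props = fixation_proportions(label_samples(samples, areas), 0, 80)
print(props)                # proportion per interest area (None = elsewhere)
print(target_ratio(props))
```

Keeping off-area looks in the denominator matches the definition of fixation proportion given above; the target ratio, by contrast, ignores everything except the target and the competitor.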

17.5.3

Analysis

17.5.3.1

Window-based analysis

Window-based analysis typically evaluates mean fixation proportions or saccade timing within a given temporal window as a function of independent variables such as interest area, condition, and group. Analysis of variance, t-tests, or linear regression can be applied. Note, however, that proportions and latencies as dependent variables often violate the normality assumptions of these tests, so appropriate transformations, or models designed for categorical outcomes such as logit mixed models, are needed to obtain robust results (Jaeger, 2008). Although fixation proportions or saccades can be compared across different time windows, window-based analysis falls short in describing how eye movements change over time.
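As one concrete way of handling the non-normality of proportions in a window-based analysis, the sketch below applies the empirical logit transformation to by-participant sample counts in a single time window and then compares two conditions with a paired t statistic. The empirical logit and the paired t-test are choices made for illustration, not methods prescribed in this chapter; Jaeger (2008) argues more generally for logit mixed models fitted to trial-level data. All counts are invented, and the window size (25 samples, e.g., 500 ms at 50 Hz) is an assumption.

```python
import math
import statistics


def empirical_logit(hits: int, total: int) -> float:
    """Log-odds of looking at the target, with 0.5 added to both counts so that
    windows with 0 or `total` target samples still give a finite value."""
    return math.log((hits + 0.5) / (total - hits + 0.5))


def paired_t(a, b) -> float:
    """Paired t statistic (df = n - 1) for per-participant scores in two conditions."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Invented data: target samples per participant, out of 25 samples in the window.
TOTAL = 25
condition_a = [18, 20, 15, 22, 19]
condition_b = [12, 14, 10, 15, 13]

elog_a = [empirical_logit(h, TOTAL) for h in condition_a]
elog_b = [empirical_logit(h, TOTAL) for h in condition_b]
t = paired_t(elog_a, elog_b)
print(f"t({len(condition_a) - 1}) = {t:.2f}")
```

Because the transformation is applied to aggregated counts, the sketch still discards trial-level information; that loss is one reason mixed-effects alternatives are often preferred.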


17.5.3.2

Growth-curve analysis

Growth-curve analysis offers a way to examine the fine-grained time course of language processing (Mirman, 2014; Mirman et al., 2008). In a growth-curve model, time is included as an independent variable that predicts the proportion (or probability) of fixations. In addition to the linear effect of time, quadratic and cubic time terms can be added to the model to better capture the shape of the curves representing changes in gaze. One concern about growth-curve analysis is that eye movement responses, as a time series, are autocorrelated (Huang & Snedeker, 2020; Oleson et al., 2017): the location a participant is looking at during one time bin is highly likely to be the location of gaze in the following time bin. To address such autocorrelation, generalized additive mixed models (GAMMs; Baayen et al., 2018; Wood, 2006) have been introduced as an alternative approach to analysing VWP time-course data (Nixon et al., 2016; Porretta et al., 2018).
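The construction of linear, quadratic, and cubic time terms for a growth-curve model can be sketched with NumPy. Growth-curve analyses often use orthogonal polynomials (as produced, for example, by R's `poly()` function) so that the time terms are uncorrelated; this sketch builds comparable orthonormal terms from a QR decomposition of a centred polynomial matrix, although the signs of the columns may differ from R's. For brevity it fits a single aggregated curve by ordinary least squares. A full growth-curve analysis would fit these terms in a mixed-effects model with participant- and item-level random effects (Mirman, 2014), and this sketch does nothing about the autocorrelation issue raised above. The function names and the curve in the example are hypothetical.

```python
import numpy as np


def orthogonal_time_terms(time_bins, degree: int = 3) -> np.ndarray:
    """Orthonormal polynomial time terms (columns: linear, quadratic, cubic, ...)."""
    t = np.asarray(time_bins, dtype=float)
    t = t - t.mean()                                     # centre time for numerical stability
    vander = np.vander(t, degree + 1, increasing=True)   # columns: 1, t, t^2, t^3
    q, _ = np.linalg.qr(vander)                          # orthonormal basis of the same span
    return q[:, 1:]                                      # drop the constant column


def fit_growth_curve(time_bins, proportions, degree: int = 3):
    """OLS fit of proportion ~ intercept + ot1 + ... + ot<degree>.
    Returns (coefficients, fitted values)."""
    ot = orthogonal_time_terms(time_bins, degree)
    design = np.column_stack([np.ones(len(ot)), ot])
    y = np.asarray(proportions, dtype=float)
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs, design @ coefs


# Synthetic S-shaped rise in target fixations over 0-950 ms (50 ms bins).
time_bins = np.arange(0, 1000, 50)
proportions = 0.25 + 0.6 / (1 + np.exp(-(time_bins - 500) / 100))
coefs, fitted = fit_growth_curve(time_bins, proportions)
print("intercept, linear, quadratic, cubic:", np.round(coefs, 3))
```

Because the time terms are orthogonal to the constant column, the intercept equals the mean proportion over the analysis window, and each polynomial coefficient can be interpreted independently of the others.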

17.5.3.3

Divergence-point analysis

A comparatively less used yet informative approach to analysing VWP data is divergence-point analysis, which estimates the earliest time point at which two fixation proportion curves (e.g., those representing looks to the target and to the competitor) diverge significantly. Stone et al. (2021) extended this type of analysis by applying non-parametric bootstrapping and estimating a divergence point for each resampled dataset. This approach yields a bootstrapped mean divergence time for each group or condition and a confidence interval that quantifies the temporal uncertainty of between-group or between-condition differences. Divergence-point analysis can be particularly useful for comparing the time course of processing in different groups (e.g., L1 and L2 language users; Stone et al., 2021) or for detecting distinct processing stages (Corps et al., 2022).

Finally, as mentioned earlier, the primary data in eye-tracking studies are the initiations of saccades to locations and the time spent fixating between saccades. In visual world studies, these events are contingent on a range of factors. Several research groups are developing event-based analyses, typically within the framework of Markov models. Although most visual world studies may not have sufficient data to use these analyses as they emerge, we encourage researchers to monitor the literature for developments.
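The bootstrapping logic of divergence-point analysis can be sketched in simplified form. Several assumptions here are our own rather than Stone et al.'s (2021): the input is a participants × time-bins matrix of target-minus-competitor fixation proportions; a bin counts as showing a difference when a one-sample t statistic exceeds a fixed threshold (`t_crit`, set to 2.0 as a rough stand-in for a proper critical value); the divergence point is the first bin that starts a run of `min_run` such bins; and participants, rather than trials, are resampled with replacement. The function names are hypothetical, and the percentile interval is only one way to summarize the bootstrap distribution.

```python
import numpy as np


def divergence_point(diff: np.ndarray, time_bins, t_crit: float = 2.0,
                     min_run: int = 5) -> float:
    """First time bin that starts a run of `min_run` consecutive bins whose mean
    difference (rows = participants, columns = bins) has t > t_crit.
    Returns nan if no such run exists."""
    n = diff.shape[0]
    mean = diff.mean(axis=0)
    se = diff.std(axis=0, ddof=1) / np.sqrt(n)
    with np.errstate(divide="ignore", invalid="ignore"):
        significant = (mean / se) > t_crit   # nan (0/0) compares as False
    run = 0
    for i, sig in enumerate(significant):
        run = run + 1 if sig else 0
        if run == min_run:
            return float(time_bins[i - min_run + 1])
    return float("nan")


def bootstrap_divergence(diff: np.ndarray, time_bins, n_boot: int = 1000,
                         seed: int = 1, **kwargs):
    """Bootstrap mean divergence point and 95% percentile interval,
    resampling participants (rows) with replacement."""
    rng = np.random.default_rng(seed)
    n = diff.shape[0]
    points = np.array([
        divergence_point(diff[rng.integers(0, n, size=n)], time_bins, **kwargs)
        for _ in range(n_boot)
    ])
    points = points[~np.isnan(points)]       # drop resamples with no divergence
    if points.size == 0:
        return float("nan"), (float("nan"), float("nan"))
    lo, hi = np.percentile(points, [2.5, 97.5])
    return float(points.mean()), (float(lo), float(hi))


# Synthetic example: 20 participants, 50 ms bins from 0 to 950 ms; a target
# advantage of 0.3 is added from 500 ms onwards.
rng = np.random.default_rng(0)
time_bins = np.arange(0, 1000, 50)
diff = rng.normal(0.0, 0.1, size=(20, time_bins.size))
diff[:, time_bins >= 500] += 0.3
print(divergence_point(diff, time_bins))
print(bootstrap_divergence(diff, time_bins, n_boot=500))
```

To compare two groups (e.g., L1 and L2 listeners), the same procedure can be run on each group's matrix and the two bootstrap distributions of divergence points compared.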

17.6

Recommendations for practice

17.6.1

Preparing visual stimuli

A typical VWP display includes a target, one or more competitors, and one or more distractors. To create visual stimuli that are well controlled for familiarity and visual complexity, researchers can draw on standardized image databases for psycholinguistic research, such as Snodgrass and Vanderwart's picture set (Snodgrass & Vanderwart, 1980) and its colour alternative (Moreno-Martínez & Montoro, 2012), the IPNP (Szekely et al., 2004), and BOSS (Brodeur et al., 2010). The locations of visual objects need to be counterbalanced or randomized to avoid systematic fixation patterns (Salverda & Tanenhaus, 2018). Objects should also be spaced far enough apart that fixations can later be assigned unambiguously to interest areas.
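One simple way to counterbalance object locations is a cyclic Latin square: across every block of four trials, each role (target, competitor, two distractors) appears once in each screen quadrant. The sketch below assumes a four-picture display with quadrant positions. The role and position labels and the function names are hypothetical, and the Latin square is only one of several valid schemes (constrained randomization is another). Trial order would still need to be randomized separately.

```python
from collections import Counter
from typing import Dict, List, Sequence

ROLES = ["target", "competitor", "distractor1", "distractor2"]
QUADRANTS = ["top-left", "top-right", "bottom-left", "bottom-right"]


def latin_square_layouts(roles: Sequence[str] = ROLES,
                         positions: Sequence[str] = QUADRANTS) -> List[Dict[str, str]]:
    """Cyclic Latin square: within each layout every role gets a different position,
    and across layouts every role occupies every position exactly once."""
    if len(roles) != len(positions):
        raise ValueError("need as many positions as roles")
    n = len(roles)
    return [{roles[i]: positions[(i + shift) % n] for i in range(n)}
            for shift in range(n)]


def assign_layouts(n_trials: int, roles=ROLES, positions=QUADRANTS) -> List[Dict[str, str]]:
    """Cycle through the Latin-square layouts; positions are fully balanced
    whenever n_trials is a multiple of the number of roles."""
    layouts = latin_square_layouts(roles, positions)
    return [layouts[i % len(layouts)] for i in range(n_trials)]


trials = assign_layouts(8)
print(Counter(layout["target"] for layout in trials))  # each quadrant: 2 trials
```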

17.6.2

Preparing auditory stimuli

Researchers need to be aware of potential coarticulation effects in spoken language processing: listeners use anticipatory coarticulatory cues that precede the onset of a word to predict its sound (Salverda et al., 2014). To eliminate such effects, one can insert an adequate silent interval between words, or splice the same neutral recording of the preceding material into all conditions. In either case, the naturalness of the resulting utterances should be checked with native speakers.
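The two remedies for coarticulation effects, inserting silence and reusing one neutral recording across conditions, can be sketched on raw audio samples held in NumPy arrays. This is an illustration under assumptions: waveforms are mono floating-point arrays, the word boundary (in seconds) is known from a prior annotation, and the function names are hypothetical. In practice, splice points are usually placed at zero crossings to avoid audible clicks, and the results should still be checked by native listeners.

```python
import numpy as np


def insert_silence(waveform: np.ndarray, sample_rate: int,
                   boundary_s: float, silence_ms: float) -> np.ndarray:
    """Insert `silence_ms` of digital silence at `boundary_s` seconds."""
    idx = int(round(boundary_s * sample_rate))
    gap = np.zeros(int(round(silence_ms / 1000 * sample_rate)), dtype=waveform.dtype)
    return np.concatenate([waveform[:idx], gap, waveform[idx:]])


def splice_onto_carrier(carrier: np.ndarray, critical_word: np.ndarray) -> np.ndarray:
    """Join one critical word to the *same* neutral carrier recording, so that the
    material preceding the word is acoustically identical across conditions."""
    return np.concatenate([carrier, critical_word])


# Hypothetical 1 kHz "recordings" (real speech would typically use 44.1 kHz).
sr = 1000
carrier = np.full(500, 0.1)   # e.g., the instruction up to the critical word
word_a = np.full(300, 0.2)    # critical word in condition A
word_b = np.full(300, 0.3)    # critical word in condition B
stimuli = [splice_onto_carrier(carrier, w) for w in (word_a, word_b)]
padded = [insert_silence(s, sr, boundary_s=0.5, silence_ms=100) for s in stimuli]
```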

17.6.3

Interpretation of results

An important assumption in eye-tracking studies of language processing is that it takes about 200 ms to launch a saccade to a visual object (Salverda et al., 2014). Fixations beginning about 200 ms or more after the onset of a speech signal are therefore typically interpreted as responses to that signal. A common way of visualizing the data, first introduced by Allopenna et al. (1998), is to plot the proportion of fixations (looks) to the different objects in a display as a function of time from the onset of the utterance or of particular words within it. For example, in the sentence The boy will look at the candle, important landmarks might be the onsets of the subject noun boy, the verb look, and the object noun (the target) candle. Proportion-of-fixation plots are an important tool for understanding how the pattern of fixations changes over time. They are also particularly useful for spotting potential problems, for example, unexpected baseline differences among conditions.
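The re-alignment behind a proportion-of-fixations plot, together with the roughly 200 ms saccade offset, can be sketched as follows. Sample times in each trial are re-expressed relative to the onset of the critical word (e.g., candle), binned, and converted to the proportion of samples on the target. Bins that start less than 200 ms after word onset are then set aside as unlikely to reflect processing of that word. The trial format (dicts with `word_onset_ms` and `(time_ms, interest_area)` samples), the 50 ms bin width, and the function names are assumptions made for illustration. The resulting dictionary could be passed to any plotting library.

```python
from collections import defaultdict
from typing import Dict, List

SACCADE_LAG_MS = 200  # approximate time needed to launch a language-driven saccade


def time_course(trials: List[dict], bin_ms: int = 50, start_ms: int = -200,
                end_ms: int = 1000, area: str = "target") -> Dict[int, float]:
    """Proportion of samples on `area` in each bin, with time measured from the
    onset of the critical word in each trial (bins are labelled by their start)."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for trial in trials:
        onset = trial["word_onset_ms"]
        for t, interest_area in trial["samples"]:
            rel = t - onset
            if start_ms <= rel < end_ms:
                b = start_ms + ((rel - start_ms) // bin_ms) * bin_ms
                totals[b] += 1
                hits[b] += interest_area == area
    return {b: hits[b] / totals[b] for b in sorted(totals)}


def word_driven_bins(course: Dict[int, float], lag_ms: int = SACCADE_LAG_MS) -> Dict[int, float]:
    """Keep bins that start at least `lag_ms` after word onset; earlier looks were
    most likely programmed before the word could have been processed."""
    return {b: p for b, p in course.items() if b >= lag_ms}


trials = [
    {"word_onset_ms": 1000,
     "samples": [(900, "distractor"), (1000, "competitor"), (1250, "target"), (1260, "target")]},
    {"word_onset_ms": 500,
     "samples": [(750, "competitor"), (760, "target")]},
]
course = time_course(trials)
print(course)
print(word_driven_bins(course))
```

Aligning each trial to its own word onset, rather than to trial start, keeps variation in utterance timing across recordings from smearing the curves.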

17.7

Future directions

Three lines of research are representative of future directions for visual world eye tracking. First, co-registering brain imaging signals (e.g., from EEG and fMRI) with eye movements combines high temporal resolution with neural sensitivity in language-mediated response data (Henderson et al., 2013) and has yielded fruitful insights into the cognitive processes underlying real-time language processing (e.g., Bonhage et al., 2015; Hollenstein et al., 2018). Second, visual world eye tracking in three-dimensional virtual reality (VR) is promising because it creates a naturalistic environment while maintaining precise control over experimental settings. Immersing participants in virtual reality, Eichert et al. (2018) replicated anticipatory eye movements resulting from predictive language processing, and a joint attentional space was found to be critical to such predictive processing in VR (Heyselaar et al., 2020). Third, with the growing need to collect data remotely and to reach larger populations, webcam-based eye-tracking experiments conducted online have become a recent trend (Semmelmann & Weigelt, 2018). With adequate control and design, data collected online have replicated classic findings from lab-based VWP experiments (Ovans et al., 2021).

Further reading

Henderson, J. M., & Ferreira, F. (2004). The Interface of Language, Vision, and Action: Eye Movements and the Visual World. Psychology Press.
Huettig, F., Rommers, J., & Meyer, A. S. (2011). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137(2), 151–171.
Knoeferle, P., Pyykkönen-Klauck, P., & Crocker, M. W. (2016). Visually Situated Language Comprehension. Amsterdam: John Benjamins Publishing.
Salverda, A. P., & Tanenhaus, M. K. (2018). The visual world paradigm. In A. M. B. de Groot & P. Hagoort (Eds.), Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide (pp. 89–110). Wiley-Blackwell.

Related topics

Contrasting online and offline measures in experimental linguistics; analysing reading with eye-tracking; analysing non-verbal interactions with eye-tracking


References

Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38(4), 419–439. doi: 10.1006/jmla.1997.2558.
Altmann, G. T. M. (2004). Language-mediated eye movements in the absence of a visual world: The "blank screen paradigm". Cognition, 93(2), 79–87. doi: 10.1016/j.cognition.2004.02.005.
Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264. doi: 10.1016/S0010-0277(99)00059-1.
Altmann, G. T. M., & Kamide, Y. (2004). Now you see it, now you don't: Mediating the mapping between language and the visual world. In J. M. Henderson & F. Ferreira (Eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual World (pp. 347–386). Psychology Press. doi: 10.4324/9780203488430.
Altmann, G. T. M., & Kamide, Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language, 57(4), 502–518. doi: 10.1016/j.jml.2006.12.004.
Altmann, G. T. M., & Mirković, J. (2009). Incrementality and prediction in human sentence processing. Cognitive Science, 33(4). doi: 10.1111/j.1551-6709.2009.01022.x.
Arnold, J. E., Eisenband, J. G., Brown-Schmidt, S., & Trueswell, J. C. (2000). The rapid use of gender information: Evidence of the time course of pronoun resolution from eyetracking. Cognition, 76(1), B13–B26. doi: 10.1016/S0010-0277(00)00073-1.
Baayen, R. H., van Rij, J., De Cat, C., & Wood, S. (2018). Autocorrelated errors in experimental data in the language sciences: Some solutions offered by generalized additive mixed models. In D. Speelman, K. Heylen, & D. Geeraerts (Eds.), Mixed-Effects Regression Models in Linguistics, Quantitative Methods in the Humanities and Social Sciences (pp. 49–69). Cham: Springer. doi: 10.1007/978-3-319-69830-4_4.
Bonhage, C. E., Mueller, J. L., Friederici, A. D., & Fiebach, C. J. (2015). Combined eye tracking and fMRI reveals neural basis of linguistic predictions during sentence comprehension. Cortex, 68, 33–47. doi: 10.1016/j.cortex.2015.04.011.
Brodeur, M. B., Dionne-Dostie, E., Montreuil, T., & Lepage, M. (2010). The bank of standardized stimuli (BOSS), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. PLoS ONE, 5(5). doi: 10.1371/journal.pone.0010773.
Brown-Schmidt, S., Byron, D. K., & Tanenhaus, M. K. (2005). Beyond salience: Interpretation of personal and demonstrative pronouns. Journal of Memory and Language, 53(2), 292–313. doi: 10.1016/j.jml.2005.03.003.
Brown-Schmidt, S., & Heller, D. (2018). Perspective-taking during conversation. The Oxford Handbook of Psycholinguistics, 548–572. doi: 10.1093/oxfordhb/9780198786825.013.23.
Brown-Schmidt, S., & Tanenhaus, M. K. (2008). Real-time investigation of referential domains in unscripted conversation: A targeted language game approach. Cognitive Science, 32(4), 643–684. doi: 10.1080/03640210802066816.
Chow, W. Y., & Chen, D. (2020). Predicting (in)correctly: Listeners rapidly use unexpected information to revise their predictions. Language, Cognition and Neuroscience, 35(9), 1149–1161. doi: 10.1080/23273798.2020.1733627.
Clark, H. H. (1992). Arenas of Language Use. The University of Chicago Press.
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language. Cognitive Psychology, 6(1), 84–107. doi: 10.1016/0010-0285(74)90005-X.
Corps, R. E., Brooke, C., & Pickering, M. J. (2022). Prediction involves two stages: Evidence from visual-world eye-tracking. Journal of Memory and Language, 122, 104298.
Cozijn, R., Commandeur, E., Vonk, W., & Noordman, L. G. (2011). The time course of the use of implicit causality information in the processing of pronouns: A visual world paradigm study. Journal of Memory and Language, 64(4), 381–403. doi: 10.1016/j.jml.2011.01.001.
Dahan, D. (2015). Prosody and language comprehension. Wiley Interdisciplinary Reviews: Cognitive Science, 6(5), 441–452. doi: 10.1002/wcs.1355.
Degen, J., & Tanenhaus, M. K. (2019). Constraint-based pragmatic processing. In C. Cummins & N. Katsos (Eds.), The Oxford Handbook of Experimental Semantics and Pragmatics (pp. 21–38). Oxford University Press.
Eichert, N., Peeters, D., & Hagoort, P. (2018). Language-driven anticipatory eye movements in virtual reality. Behavior Research Methods, 50(3), 1102–1115. doi: 10.3758/s13428-017-0929-z.
Engelhardt, P. E., Bailey, K. G. D., & Ferreira, F. (2006). Do speakers and listeners observe the Gricean Maxim of Quantity? Journal of Memory and Language, 54(4), 554–573. doi: 10.1016/j.jml.2005.12.009.


Grodner, D. J., & Sedivy, J. (2011). The effect of speaker-specific information on pragmatic inferences. In E. A. Gibson & N. J. Pearlmutter (Eds.), The Processing and Acquisition of Reference (pp. 239–271). The MIT Press.
Grüter, T., Lau, E., & Ling, W. (2020). How classifiers facilitate predictive processing in L1 and L2 Chinese: The role of semantic and grammatical cues. Language, Cognition and Neuroscience, 35(2), 221–234. doi: 10.1080/23273798.2019.1648840.
Hanna, J. E., & Tanenhaus, M. K. (2004). Pragmatic effects on reference resolution in a collaborative task: Evidence from eye movements. Cognitive Science, 28(1), 105–115. doi: 10.1016/j.cogsci.2003.10.002.
Hayhoe, M., & Ballard, D. (2005). Eye movements in natural behavior. Trends in Cognitive Sciences, 9(4), 188–194. doi: 10.1016/j.tics.2005.02.009.
Heller, D., Grodner, D., & Tanenhaus, M. K. (2008). The role of perspective in identifying domains of reference. Cognition, 108(3), 831–836. doi: 10.1016/j.cognition.2008.04.008.
Henderson, J. M., Luke, S. G., Schmidt, J., & Richards, J. E. (2013). Co-registration of eye movements and event-related potentials in connected-text paragraph reading. Frontiers in Systems Neuroscience, 7, 1–13. doi: 10.3389/fnsys.2013.00028.
Heyselaar, E., Peeters, D., & Hagoort, P. (2020). Do we predict upcoming speech content in naturalistic environments? Language, Cognition and Neuroscience, 36(4), 440–461. doi: 10.1080/23273798.2020.1859568.
Hollenstein, N., Rotsztejn, J., Troendle, M., Pedroni, A., Zhang, C., & Langer, N. (2018). Data descriptor: ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data, 5, 1–13. doi: 10.1038/sdata.2018.291.
Huang, Y., & Snedeker, J. (2020). Evidence from the visual world paradigm raises questions about unaccusativity and growth curve analyses. Cognition, 200, 104251. doi: 10.1016/j.cognition.2020.104251.
Huang, Y. T., & Snedeker, J. (2009). Online interpretation of scalar quantifiers: Insight into the semantics-pragmatics interface. Cognitive Psychology, 58(3), 376–415. doi: 10.1016/j.cogpsych.2008.09.001.
Huettig, F., & McQueen, J. M. (2007). The tug of war between phonological, semantic and shape information in language-mediated visual search. Journal of Memory and Language, 57(4), 460–482. doi: 10.1016/j.jml.2007.02.001.
Ito, A. (2019). Prediction of orthographic information during listening comprehension: A printed-word visual world study. Quarterly Journal of Experimental Psychology, 72(11), 2584–2596. doi: 10.1177/1747021819851394.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. doi: 10.1016/j.jml.2007.11.007.
Just, M. A., & Carpenter, P. A. (1976). Eye fixations and cognitive processes. Cognitive Psychology, 8(4), 441–480. doi: 10.1016/0010-0285(76)90015-3.
Kaiser, E., Runner, J. T., Sussman, R. S., & Tanenhaus, M. K. (2009). Structural and semantic constraints on the resolution of pronouns and reflexives. Cognition, 112(1), 55–80. doi: 10.1016/j.cognition.2009.03.010.
Kaiser, E., & Trueswell, J. C. (2008). Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference resolution. Language and Cognitive Processes, 23(5), 709–748. doi: 10.1080/01690960701771220.
Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49(1), 133–156. doi: 10.1016/S0749-596X(03)00023-8.
Keysar, B., Barr, D. J., Balin, J. A., & Brauner, J. S. (2000). Taking perspective in conversation: The role of mutual knowledge in comprehension. Psychological Science, 11(1), 32–38. doi: 10.1111/1467-9280.00211.
Knoeferle, P., Crocker, M. W., Scheepers, C., & Pickering, M. J. (2005). The influence of the immediate visual context on incremental thematic role-assignment: Evidence from eye-movements in depicted events. Cognition, 95(1), 95–127. doi: 10.1016/j.cognition.2004.03.002.
Knoeferle, P., & Crocker, M. W. (2006). The coordinated interplay of scene, utterance, and world knowledge: Evidence from eye tracking. Cognitive Science, 30(3), 481–529. doi: 10.1207/s15516709cog0000_65.
Kundel, H. L., & Nodine, C. F. (1973). A computer system for processing eye-movement records. Behavior Research Methods & Instrumentation, 5(2), 147–152.
Loftus, G., Mathews, P., Bell, S., & Poltrock, S. (1975). General software for an on-line eye-movement recording system. Behavior Research Methods & Instrumentation, 7(2), 201–204. doi: 10.3758/BF03201326.
Mackworth, N. H. (1968). The wide-angle reflection eye camera for visual choice and pupil size. Perception & Psychophysics, 3(1), 32–34. doi: 10.3758/BF03212708.


Mackworth, N. H., & Morandi, A. J. (1967). The gaze selects informative details within pictures. Perception & Psychophysics, 2(11), 547–552.
Magnuson, J. S., Dixon, J. A., Tanenhaus, M. K., & Aslin, R. N. (2007). The dynamics of lexical competition during spoken word recognition. Cognitive Science, 31(1), 133–156. doi: 10.1080/03640210709336987.
McConkie, G. W. (1976). The use of eye-movement data in determining the perceptual span in reading. In R. A. Monty & J. W. Senders (Eds.), Eye Movements and Psychological Processes. Erlbaum.
McConkie, G. W., & Rayner, K. (1973). An on-line computer technique for studying reading: Identifying the perceptual span. In P. O. Nacke (Ed.), Diversity in Mature Reading: Theory and Research. Twenty-Second Yearbook of the National Reading Conference. National Reading Conference.
McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17(6), 578–586. doi: 10.3758/BF03203972.
McQueen, J. M., & Viebahn, M. C. (2007). Tracking recognition of spoken words by tracking looks to printed words. Quarterly Journal of Experimental Psychology, 60(5), 661–671. doi: 10.1080/17470210601183890.
Mirman, D. (2014). Growth Curve Analysis and Visualization Using R. CRC Press.
Mirman, D., Dixon, J. A., & Magnuson, J. S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475–494. doi: 10.1016/j.jml.2007.11.006.
Moreno-Martínez, F. J., & Montoro, P. R. (2012). An ecological alternative to Snodgrass & Vanderwart: 360 high quality colour images with norms for seven psycholinguistic variables. PLoS ONE, 7(5), e37527. doi: 10.1371/journal.pone.0037527.
Nixon, J. S., van Rij, J., Mok, P., Baayen, R. H., & Chen, Y. (2016). The temporal dynamics of perceptual uncertainty: Eye movement evidence from Cantonese segment and tone perception. Journal of Memory and Language, 90, 103–125. doi: 10.1016/j.jml.2016.03.005.
Oleson, J. J., Cavanaugh, J. E., McMurray, B., & Brown, G. (2017). Detecting time-specific differences between temporal nonlinear curves: Analyzing data from the visual world paradigm. Statistical Methods in Medical Research, 26(6), 2708–2725. doi: 10.1177/0962280215607411.
Ovans, Z., Huang, Y. T., & Novick, J. (2021). Virtual-World eye-tracking: Replicating sentence processing effects remotely. In 34th Annual CUNY Conference on Human Sentence Processing. Philadelphia, PA.
Porretta, V., Kyröläinen, A. J., van Rij, J., & Järvikivi, J. (2018). Visual world paradigm data: From preprocessing to nonlinear time-course analysis. In I. Czarnowski, R. Howlett, & L. Jain (Eds.), Intelligent Decision Technologies 2017. Smart Innovation, Systems and Technologies (pp. 268–277). Springer. doi: 10.1007/978-3-319-59424-8_25.
Pyykkönen-Klauck, P., & Crocker, M. W. (2016). Attention and eye movement metrics in visual world eye tracking. In P. Knoeferle, P. Pyykkönen-Klauck, & M. W. Crocker (Eds.), Visually Situated Language Comprehension. John Benjamins Publishing.
Pyykkönen, P., & Järvikivi, J. (2010). Activation and persistence of implicit causality information in spoken language comprehension. Experimental Psychology, 57(1), 5–16. doi: 10.1027/1618-3169/a000002.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422. doi: 10.1037/0033-2909.124.3.372.
Rueschemeyer, S.-A., & Gaskell, M. G. (2018). The Oxford Handbook of Psycholinguistics (2nd ed.). Oxford University Press.
Salverda, A. P., Brown, M., & Tanenhaus, M. K. (2011). A goal-based perspective on eye movements in visual world studies. Acta Psychologica, 137(2), 172–180. doi: 10.1016/j.actpsy.2010.09.010.
Salverda, A. P., Kleinschmidt, D., & Tanenhaus, M. K. (2014). Immediate effects of anticipatory coarticulation in spoken-word recognition. Journal of Memory and Language, 71(1), 145–163. doi: 10.1016/j.jml.2013.11.002.
Salverda, A. P., & Tanenhaus, M. K. (2010). Tracking the time course of orthographic information in spoken-word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(5), 1108–1117. doi: 10.1037/a0019901.
Salverda, A. P., & Tanenhaus, M. K. (2018). The visual world paradigm. In A. M. B. de Groot & P. Hagoort (Eds.), Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide (pp. 89–110). Wiley-Blackwell.
Sedivy, J. C., Tanenhaus, M. K., Chambers, C. G., & Carlson, G. N. (1999). Achieving incremental semantic interpretation through contextual representation. Cognition, 71(2), 109–147. doi: 10.1016/S0010-0277(99)00025-6.

Yipu Wei and Michael K. Tanenhaus

Semmelmann, K., & Weigelt, S. (2018). Online webcam-based eye tracking in cognitive science: A first look. Behavior Research Methods, 50(2), 451–465. doi: 10.3758/s13428-017-0913-7.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 174–215. doi: 10.1109/icip.2001.958943.
Spivey, M. J., & Huette, S. (2016). Toward a situated view of language. In P. Knoeferle, P. Pyykkönen-Klauck & M. W. Crocker (Eds.), Visually Situated Language Comprehension. John Benjamins Publishing.
Stone, K., Lago, S., & Schad, D. J. (2021). Divergence point analyses of visual world data: Applications to bilingual research. Bilingualism: Language and Cognition, 24(5), 833–841. doi: 10.1017/s1366728920000607.
Szekely, A., Jacobsen, T., D’Amico, S., Devescovi, A., Andonova, E., Herron, D., … & Bates, E. (2004). A new on-line resource for psycholinguistic studies. Journal of Memory and Language, 51(2), 247–250.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634.
Tanenhaus, M. K., & Brown-Schmidt, S. (2008). Language processing in the natural world. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1105–1122. doi: 10.1098/rstb.2007.2162.
Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition, 73(2), 89–134. doi: 10.1016/S0010-0277(99)00032-3.
van Veen, R. (2011). Children’s comprehension of subjective and objective causality: Evidence from eye-tracking. In The Acquisition of Causal Connectives: The Role of Parental Input and Cognitive Complexity (pp. 139–162) (Doctoral dissertation, Utrecht University).
Weber, A., & Crocker, M. W. (2012). On the nature of semantic constraints on lexical access. Journal of Psycholinguistic Research, 41(3), 195–214. doi: 10.1007/s10936-011-9184-0.
Weber, A., Crocker, M. W., & Knoeferle, P. (2010). Conflicting constraints in resource-adaptive language comprehension. In M. W. Crocker & J. Siekmann (Eds.), Resource-Adaptive Cognitive Processes (pp. 119–141). Springer-Verlag. doi: 10.1007/978-3-540-89408-7_7.
Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.
Yarbus, A. L. (1967). Eye Movements and Vision. Plenum Press.
Young, L. R., & Sheena, D. (1975). Eye-movement measurement techniques. The American Psychologist, 30(3), 315–330. doi: 10.1037/0003-066X.30.3.315.

18 MOBILE EYE-TRACKING FOR MULTIMODAL INTERACTION ANALYSIS

Geert Brône and Bert Oben

18.1 Introduction

Imagine the following everyday communicative situation: three friends are casually walking across town when one of them brings up lunch:

(1)
1. A: (shifts his gaze to B) hey, are you hungry?
2. B: (shifts gaze from path in front of him to A) yeah, I’m starving man. (shifts gaze from A to C) how about you?
3. C: (looks at B and then shifts to A) yeah, I could do with a bite.
4. A: (shifts gaze from C to a restaurant to their left and points towards the restaurant; then shifts gaze back to B) how about this place?
5. B: (nods while looking at A and then at C) looks good.
6. C: (looking at A) let’s do it.

This simple piece of constructed dialogue illustrates that eye gaze serves multiple functions in face-to-face interaction. These functions range from the perception and processing of interactionally relevant information to active communicative signalling. In (1), participants use gaze to actively seek each other’s attention and mobilize a response (A in lines 1 and 4, B in line 2). They also use it for spatial reference (A looking and pointing at a restaurant in line 4) and to signal agreement (C in lines 3 and 6, B in line 5). There is also basic gaze orientation that simultaneously reflects processing and signals attention or availability: C’s gaze directed at B in line 3 helps C process B’s question, while at the same time signalling availability to respond to it. The example further illustrates the tight (temporal) relation between eye gaze and other verbal and nonverbal resources, such as the sequence-opening device hey in line 1, the pointing gesture in line 4 and the nod in line 5. Even if these examples appear simple and largely routinized, they require a good deal of coordination between the participants involved.
Analysing the role of eye gaze in such coordinated action is a challenge that requires appropriate methodological and technological tools. The notion of coordinated action has been put forward, among others, in Herbert Clark’s (1996) view of language as joint action.

DOI: 10.4324/9781003392972-21

Ever since the second half of the 20th century, when social psychologists first showed an interest in the role of nonverbal behaviour in human face-to-face communication (see e.g., Argyle, 1975), studies have focused on the role of eye gaze in interaction, alongside other semiotic resources such as facial expressions, gesture and posture. Early pioneering studies, such as Kendon (1967), Duncan (1975) and Argyle and Cook (1976), provided empirical evidence for the pivotal role of eye gaze in communication. They approached it from an evolutionary perspective (e.g., the differential role of eye contact in humans versus other species), from an intercultural perspective (e.g., different cultural norms for gaze aversion and mutual gaze) and, more generally, from a discourse-organizational perspective (e.g., gaze as an instrument in conversation management). Given the multifunctionality of eye gaze in establishing successful communication, it is not surprising that a variety of disciplines have developed research lines on this topic. These range from conversation analysis and interactional linguistics, through social and experimental psychology, to studies on human-computer and human-robot interaction. One of the key challenges that all these disciplines have faced, and still face, is the design of appropriate tools and methods to study the ‘gaze machinery in interaction’. Such tools must capture how gaze is coordinated with other resources in the production and processing of language and how it is coordinated across the participants engaged in interaction. They must also offer the temporal precision needed to relate gaze to its various functions. The present chapter zooms in on one methodological paradigm that allows researchers to gather fine-grained information on participants’ gaze behaviour while they are engaged in (face-to-face) interaction. This paradigm involves the use of mobile eye-tracking systems, which typically take the shape of a pair of glasses with an integrated scene camera and eye-tracking cameras (see Figure 18.3).
The scene camera captures approximately the visual field of the user, much like a head-mounted action camera such as a GoPro. The output of the infrared eye-tracking cameras is represented as a moving dot (the so-called gaze cursor) superimposed on the scene camera images. The resulting video output thus affords an inside(r) perspective. When such output is generated for all participants engaged in an interaction, it creates a multifocal eye-tracking paradigm (Brône & Oben, 2015), providing fine-grained gaze information on all participants at each point in time. This chapter focuses on how this paradigm emerged and how it has been implemented in different theoretical and methodological frameworks to answer some of the key questions concerning the role of eye gaze in interaction. The chapter is structured as follows. In the next section, we sketch the main research strands and traditions that have studied the role of eye gaze in (face-to-face) interaction, from both a socio-interactional and a cognitive perspective. In Section 18.3, we zoom in on the empirical methods that are used to get a grasp on the complexity of the phenomenon. In Section 18.4, we present an overview of recent studies that have employed wearable eye-tracking devices for the study of eye gaze in interaction, identifying several clusters of studies based on the interactional setting as well as the methodological approach. Based on this overview, we formulate some recommendations for scholars interested in adopting an eye-tracking paradigm (Section 18.5) as well as some reflections on directions for future research (Section 18.6).
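In practice, a multifocal set-up yields one recording per participant, each timed by its own device clock. The streams therefore have to be placed on a common timeline before gaze can be compared across participants. The sketch below is a minimal illustration under stated assumptions, not the procedure used in the studies discussed in this chapter. It assumes that each recording has already been coded as a time-sorted list of (timestamp in milliseconds, gaze target) samples. It also assumes that a shared synchronization event (for instance, a hand clap visible to every scene camera) has been located in each recording. All function names and the data format are hypothetical.

```python
from bisect import bisect_right

# Hypothetical data format (an assumption, not a vendor export format):
# each recording is a time-sorted list of (timestamp_ms, gaze_target) samples,
# where gaze_target is a coded area of interest such as "B" or "restaurant".

def align(samples, sync_ms):
    """Shift a recording so that the shared sync event falls at t = 0."""
    return [(t - sync_ms, target) for t, target in samples]

def target_at(samples, t):
    """Return the gaze target of the last sample at or before time t, or None."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i > 0 else None

def multifocal_snapshot(recordings, sync_points, t):
    """Gaze target of every participant at common time t (ms after the sync event)."""
    return {
        participant: target_at(align(samples, sync_points[participant]), t)
        for participant, samples in recordings.items()
    }

# Toy data loosely inspired by example (1); device clocks start at different times.
recordings = {
    "A": [(10000, "B"), (10500, "restaurant"), (11200, "B")],
    "B": [(3000, "path"), (3400, "A"), (4100, "C")],
}
sync_points = {"A": 9800, "B": 3100}  # device time of the clap in each recording

print(multifocal_snapshot(recordings, sync_points, 500))   # {'A': 'B', 'B': 'A'}
print(multifocal_snapshot(recordings, sync_points, 1200))  # {'A': 'restaurant', 'B': 'C'}
```

A single constant offset per recording assumes that the device clocks do not drift relative to each other. For long recordings, a second synchronization event near the end would allow such drift to be checked.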

18.2 Eye gaze in interaction: Research strands and traditions

Before we provide an overview of recent studies that have made use of mobile eye-tracking systems, let us first return to the basic observation addressed in the introduction, viz. that eye gaze is a multifunctional communicative resource employed by both speakers and their addressees, and that different disciplines have shown a steady interest in its role in interaction. In the significant body of research published in the past 50 years, two major strands can be identified, partly reflecting the disciplines involved (see Brône & Oben, 2018 for a more elaborate overview). On the one hand, the above-mentioned early work in social psychology paved the way for studies that focus on eye gaze as a semiotic resource that participants use to display engagement, affiliation and participation but also to manage the flow of the interaction. This line of research has most prominently been developed in conversation analysis and interactional linguistics but has recently also been addressed in natural user-interface design in human-computer interaction. On the other hand, there is a cluster of studies with a stronger cognitive orientation that aim at uncovering the role of eye gaze in language production and processing, including the management of joint attention, referent tracking (including deixis) and spatial organization. These studies are largely rooted in a tradition of cognitive psychology and to a lesser extent in pragmatics. Although not all studies can be situated in either of these lines of research, and many draw on aspects of both, we treat them separately here for reasons of presentation. The interactionally oriented research strand has provided evidence for the constitutive role of gaze orientation in relation to social actions, including the distribution of participation roles in interaction (speakers, hearers, bystanders, etc.) and the management of a smooth exchange of turns (see Rossano, 2012a and Degutyte & Astell, 2021 for detailed overviews). Starting from the early studies mentioned above, it has been shown repeatedly that the relationship between participants engaged in (face-to-face) interaction is reflected in their gaze behaviour. This relationship, also referred to as the participation framework (Goffman, 1981), manifests itself in the fact that speakers, their addressees and non-addressed participants display different gaze patterns.
Kendon (1967) already made this observation in two-party interactions, using a video recording set-up with a camera and mirror to capture both participants’ gaze orientation simultaneously. He found that speakers tend to shift their gaze between their co-participant and the background more often than hearers do, and that hearers display more sustained periods of gaze towards the current speaker. This observation has been replicated repeatedly over the past decades in different interactional constellations and using different recording equipment (see e.g., Brône et al., 2017; Goodwin, 1981; Hirvenkari et al., 2013; Ho et al., 2015; Vertegaal, 1999). Apart from speakers and their primary addressees, studying the gaze behaviour of non-addressed participants in multiparty interaction may also yield interesting insights, for example, concerning their ability to anticipate turn shifts between primary participants. Studies on both spoken language (Holler & Kendrick, 2015) and sign language interaction (Beukeleers et al., 2020) have shown that in question-answer sequences, unaddressed participants tend to shift their gaze from the current speaker/signer to the expected next speaker/signer before turn completion, reflecting a cognitive process of projection. Importantly, however, these gaze patterns do not operate in a social vacuum but are at least partly tied to the ongoing social activity the participants are involved in (Rossano, 2012b). In other words, the above-mentioned patterns may differ depending on whether the participants are engaged in a storytelling activity, a question-answer sequence, an instructional task, etc. Apart from its relationship with participation roles, gaze orientation has also been shown to be instrumental in the sequential organization of the interaction.
For instance, speakers may use gaze orientation as a resource in addressee and next-speaker selection. Averting one’s gaze while speaking may signal that one wants to keep the floor. Looking towards a co-participant, typically towards the end of a turn-constructional unit, is a powerful instrument in selecting the next speaker (Auer, 2018, 2021; Duncan, 1975; Jehoul et al., 2017; Rossano, 2010; Streeck, 2014). Particular gaze patterns have been uncovered not only in the distribution of turns (i.e., the system of floor apportionment) but also in the monitoring of the flow of the interaction. For instance, speakers typically use gaze as a resource in monitoring and eliciting responses from their addressees (Bavelas et al., 2002; Goodwin & Goodwin, 1986; Zima, 2020). By briefly looking at their addressees, speakers/signers can monitor recipiency and potentially elicit response tokens (such as head nods or vocal minimal responses like mh). What all the studies in the interactionally oriented strand show is that the management of social interaction requires participants to be strongly geared to each other for communication to be successful.

The second, more cognitively inspired line of research has explored this idea of interaction as a tightly synchronized joint project by focussing on, among other things, joint attention and reference resolution in situated interaction. In studying joint attention, two often-used proxies are mutual gaze and shared gaze. Mutual gaze refers to eye contact between two participants and can be linked to engagement, attention and affiliation. Shared gaze holds when two or more participants direct their gaze towards the same object of attention, allowing them to jointly navigate through the topic or task at hand (Brennan et al., 2008; Neider et al., 2010; Richardson et al., 2007, 2009). Studies using different interactional set-ups and different methods have shown that being able to establish a shared focus of visual attention is of particular importance in achieving communicative efficiency and success (see Stephenson et al., 2021 for an overview). In fact, eye gaze is not only relevant once shared attention has been established but may also play a constitutive role in setting it up. A well-known phenomenon in this respect is the gaze cueing effect, whereby the gaze of an addressee can be guided towards a target through a cue, for instance, by someone else looking at that target (Emery, 2000; Lachat et al., 2012).
The target being cued in an interactional setting can be any referent that is relevant to the ongoing interactional project, including objects in the real world, items presented on a screen, a co-participant’s body (part) or the speaker’s own hands (see McKay et al., 2021 for an overview). The last of these scenarios, in which a speaker cues an addressee’s gaze towards his/her own gesture while speaking, has been shown to affect information uptake (Gullberg & Kita, 2009) as well as subsequent behaviour (Oben, 2015; Oben & Brône, 2015).
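One way to quantify the anticipatory gaze shifts of unaddressed participants discussed in this section is to express the onset of the shift relative to the end of the current turn. The minimal sketch below assumes that gaze has already been coded as time-sorted (timestamp in milliseconds, gaze target) samples and that turn boundaries have been annotated. The function names and the coding scheme are illustrative assumptions, not the measures used by Holler and Kendrick (2015) or Beukeleers et al. (2020). A negative value means that the gaze shift towards the expected next speaker started before the current turn was complete.

```python
def shift_onset(samples, target, window_start_ms):
    """Time of the first shift onto `target` at or after window_start_ms.

    A shift is a sample whose target equals `target` while the preceding
    sample's target did not (or there is no preceding sample).
    Returns None if no such shift occurs.
    """
    previous = None
    for t, gaze_target in samples:
        if t >= window_start_ms and gaze_target == target and previous != target:
            return t
        previous = gaze_target
    return None

def shift_latency(samples, next_speaker, turn_start_ms, turn_end_ms):
    """Onset of the gaze shift to the next speaker, relative to the turn end (ms)."""
    onset = shift_onset(samples, next_speaker, turn_start_ms)
    return None if onset is None else onset - turn_end_ms

# Hypothetical unaddressed participant C during a question from A to B (0-1000 ms).
c_gaze = [(0, "A"), (400, "A"), (800, "B"), (1200, "B")]
print(shift_latency(c_gaze, "B", 0, 1000))  # -200, i.e. an anticipatory shift
```

If the participant is already looking at the next speaker before the window starts and keeps looking, no shift is registered. A real analysis would need an explicit rule for such cases.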

18.3 Main research methods

We now turn to the data types and methods that are used in exploring this complex phenomenon. Rather than providing an exhaustive review of the literature from this point of view, we restrict ourselves to a sketch of the main clusters that can be identified. More detailed overviews in the form of systematic or scoping reviews, on which this chapter draws, can be found in Jongerius et al. (2020), Valtakari et al. (2021) and Degutyte and Astell (2021), among others. Studies can be distinguished along several relevant dimensions: the study design, the interaction types, the measures used for gaze registration and, where eye-tracking is used, the recording configuration. The differences in study design that we observe in the literature in many ways reflect the broad distinction between the two strands of research presented in the previous section. On the one hand, most studies in the interactionally oriented strand resort to an observational approach. This means that researchers observe participants’ eye gaze in interaction without manipulation or intervention (and thus without specific treatment conditions). Observational studies typically involve the analysis of one or more sequences of (video-recorded) interaction, with varying degrees of freedom on the part of the participants (in terms of the topics they may address during the interaction, the freedom to move, etc.). The methods used for observational studies include quantitative, qualitative and mixed-methods approaches. In their systematic literature review of studies on the role of eye gaze in the turn-taking system, Degutyte and Astell (2021) show that of the 29 studies they covered, 17 presented quantitative data, 4 used a mixed-methods design and 8 were purely qualitative. On the other side of the spectrum, the more cognitively oriented studies are typically based on experimental designs. For instance, Jongerius et al. (2020) conducted a scoping review of eye contact in human interaction, a topic that can be related to the central topic of joint attention in cognitive research. They report that of the 109 studies they covered, 80 were experimental and 29 were based on observational data, which illustrates the predilection for controlled designs involving the manipulation of one or more relevant variables. The study design also has implications for the types of interaction that are studied. Take, for instance, the number of participants involved in the interactions being studied. The large majority of studies in cognitively oriented research involve two-party interactions, in which pairs of participants perform experimentally designed tasks. These take place either in a face-to-face setting or in a mediated configuration in which the pair performs collaborative tasks on a computer screen while their gaze is being measured (see e.g., Barisic et al., 2013 for an overview of screen-based approaches). Adding more participants to the interactions being studied, for example, to study group dynamics, would exponentially increase the complexity of the gaze patterns under investigation and significantly reduce the level of control on the part of the experimenter. In the interactionally oriented studies, on the other hand, different group sizes and configurations are being studied. This is apparent from Degutyte and Astell’s (2021) review on turn-taking dynamics: in about half of the studies reviewed, the analysis was based on dyadic interactions, whereas the other half involved triadic or multiparty conversations.
The same review shows that in 2/3 of the studies reported, the conversations could be categorized as free-range and unrestricted, whereas in 1/3 they involved a specific task to be performed or a topic to be discussed. In the cognitive strand, the predominance of experimental work naturally entails more carefully controlled task-based interactions rather than free-flowing conversation. The different measures of gaze registration that have been used in the existing literature are addressed in various reviews and overviews as well, including Degutyte and Astell (2021), Jongerius et al. (2021a) and Brône and Oben (2018). When looking at the more than 50 years of research on eye gaze in interaction since the early pioneering studies by Kendon (1967), Argyle and Cook (1976) and others, it is apparent that the use of specific measuring techniques partially reflects technological innovations. Whereas early studies in the 1960s and 1970s tended to draw on notes and data from event recorders collected by researchers directly observing people’s gaze behaviour while engaged in interaction (referred to as ‘direct assessment’ in Jongerius et al., 2020), the focus has shifted towards more video-based research from the 1980s and 1990s onwards, allowing researchers to retrospectively create gaze estimations based on recorded data. Within the broad category of video-based analysis, technological advances have had a profound impact on the type of data collected as well, e.g., through the use of multi-camera set-ups allowing researchers to (re)view the interactional landscape from different angles, or the use of mobile camera systems (e.g., head-mounted cameras) that provide an insider perspective on the interaction. 
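The earlier remark that adding participants would exponentially increase the complexity of the gaze patterns under investigation can be made concrete with a simple count. Assume, for illustration only, that each participant’s gaze is coded solely as directed at one of the n − 1 co-participants or ‘elsewhere’. A group of n people then has n(n − 1) directed ‘who looks at whom’ relations and n(n − 1)/2 pairs that can engage in mutual gaze. It also has n^n possible joint gaze configurations at any given moment. The function name is hypothetical.

```python
def gaze_complexity(n):
    """Counts for n participants, each looking at one co-participant or 'elsewhere'."""
    directed_relations = n * (n - 1)      # ordered pairs: X looks at Y
    mutual_gaze_pairs = n * (n - 1) // 2  # unordered pairs that can make eye contact
    joint_configurations = n ** n         # each person has n options (n - 1 others + elsewhere)
    return directed_relations, mutual_gaze_pairs, joint_configurations

for n in range(2, 6):
    print(n, gaze_complexity(n))
# 2 (2, 1, 4)
# 3 (6, 3, 27)
# 4 (12, 6, 256)
# 5 (20, 10, 3125)
```

Even under this coarse coding, moving from a dyad to a triad multiplies the number of possible joint configurations by almost seven (from 4 to 27).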
Among the most recent developments in the field is the use of unobtrusive eye-tracking technology, which produces more fine-grained (and reliable) information on participants’ gaze behaviour than video camera data (see Brône & Oben, 2015; Cognolato et al., 2018; Hessels et al., 2020 for overviews of technological advances as well as critical discussions). The term eye-tracking technology subsumes a variety of systems for recording people’s gaze behaviour while they are engaged in a specific activity. These systems generally use a combination of video cameras and infrared light sources to estimate a person’s point of regard or gaze direction (see Conklin et al., 2018; Duchowski, 2007; Holmqvist et al., 2015; Tatler et al., 2014 for overviews). Not all systems, however, are equally suited for the analysis of multimodal interaction. For instance, the most widely used type of eye-tracking system has its roots in reading research in psycholinguistics. It involves participants performing a relatively passive (reading or scene perception) task on a screen, with their head in a fixed position (e.g., using a chin rest, see Figure 18.1). Such systems allow for high-resolution measurement of gaze patterns, including time-sensitive measures such as saccades and pupil dilation. Given their relatively obtrusive set-up, however, they are not particularly suited for research on interaction. Less obtrusive systems include remote eye-trackers, which allow for some degree of free (head) movement with or without a screen (Figure 18.2), and eye-tracking glasses. The latter provide maximal flexibility by integrating a scene camera and eye-tracking cameras into a single wearable device (Figure 18.3). Valtakari et al. (2021) provide an overview of how different eye-tracking systems are used in interaction research. They review studies involving various set-ups, including the use of one, two or multiple eye-trackers to simultaneously measure the gaze behaviour of all participants engaged in the interaction. In the following section, we zoom in on studies that have used mobile eye-tracking devices (as in Figure 18.3) to study eye gaze in an interactional setting. With this overview we want to highlight the diversity of topics that are currently being addressed with this methodology.
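Time-sensitive measures such as saccades are not recorded directly. They are derived from the raw gaze signal by an event-detection algorithm. One common family of such algorithms is velocity-threshold identification (I-VT), shown below in a deliberately simplified form. The sketch assumes gaze samples given as (time in seconds, horizontal angle, vertical angle in degrees) and a 30°/s threshold. Both the data format and the threshold value are assumptions for illustration, and the function names are hypothetical. The sketch does not reproduce the event detection of any particular study or vendor software. With mobile eye-trackers, head movement would additionally have to be taken into account.

```python
import math

def ivt_labels(samples, threshold_deg_per_s=30.0):
    """Label each interval between consecutive samples as 'saccade' or 'fixation'.

    samples: time-sorted (t_seconds, x_deg, y_deg) gaze positions. Angular
    distance is approximated by the Euclidean distance between the angles.
    """
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        labels.append("saccade" if velocity > threshold_deg_per_s else "fixation")
    return labels

def count_saccades(labels):
    """Count runs of consecutive 'saccade' intervals (one run = one saccade)."""
    return sum(
        1
        for i, label in enumerate(labels)
        if label == "saccade" and (i == 0 or labels[i - 1] != "saccade")
    )

# Toy 100 Hz signal: a fixation, one rapid jump of about 6 degrees, another fixation.
xs = [0.0, 0.05, 0.1, 3.0, 6.0, 6.05, 6.1]
gaze = [(i / 100, x, 0.0) for i, x in enumerate(xs)]
labels = ivt_labels(gaze)
print(labels)
print(count_saccades(labels))  # 1
```

Because the classification depends on a single threshold, the choice of that value (and of any smoothing of the signal) directly affects derived measures such as saccade frequency.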

Figure 18.1 Head-fixed eye-tracking system.

Figure 18.2 Remote eye-tracking system with unrestrained head.

Figure 18.3 Head-mounted eye-tracking glasses.

18.4 Current contributions on mobile eye-tracking in interaction

Conducting research with mobile eye-trackers roughly yields two types of data. On the one hand, researchers can use the video files (with a gaze cursor or cross as an overlay) generated by the scene camera and infrared cameras of the eye-tracking glasses. On the other hand, each recording produces a large amount of tabulated data (time series with information on gaze location, blinking, head movements, etc.). Depending on the research question and the methodological framework, the same measurement technique (i.e., mobile eye-tracking) can be used in many different ways. For example, in the field of conversation analysis (cf. Section 18.2 above), mobile eye-tracking is deployed to capture ongoing interaction in different contexts. In a static context, in which interlocutors are seated while they talk, a set-up with a single camera might suffice, but many interactions are not that static. People talk when teaching, driving, walking, visiting a shop or event, working at different locations, etc. Mobile eye-tracking can help to capture this moving-and-talking ‘from within’. It also allows for a careful and holistic scrutiny of the entire communicative context (including speakers and their relations, the physical environment, bodily behaviour of the interlocutors, etc.), which is key in conversation analysis. Some researchers in conversation analysis study gaze as one of the many semiotic resources that are recruited in face-to-face interaction. For example, Kesselheim et al. (2021) and Laner (2022) use conversation analysis to study how walking-and-talking interlocutors (science centre visitors and recreational walkers, respectively) jointly discover objects and artefacts in their surroundings. The authors include gaze in their analyses, but highlight that joint attention and noticing are an endeavour that requires multiple resources. Apart from eye gaze, the use of discourse particles and hand gestures, moving through space, touching objects, etc. are also relevant in explaining joint attention. Other researchers focus more explicitly on eye gaze alone. This is done in different dynamic-interactional settings, including joint attention during a museum or market visit (Stukenbrock, 2020). It is also done for a range of conversational phenomena, such as turn-taking (Auer, 2021a; Vranjes et al., 2018; Weiss, 2018), re-enactments of past experiences or imagined conversations (Pfeiffer & Weiss, 2022) and eliciting collaboration from interlocutors during word search (Auer & Zima, 2021). A further example is the interactional role of relevant physical artefacts, such as computer screens during doctor-patient interaction (Jongerius et al., 2021). That incorporating eye-tracking data into conversation analysis is gaining ground is apparent from the integration of gaze direction into well-established transcription and annotation systems (see e.g., the system presented by Lorenza Mondada: https://www.lorenzamondada.net/multimodal-transcription). It is also worth noting that some of the research in conversation analysis is combined with other methodological frameworks in mixed-methods approaches to conversational phenomena. For example, Auer (2021a, 2021b) and Auer and Zima (2021) combine conversation analysis with corpus-based quantitative results. Conversely, studies starting from an experimental or corpus-linguistic approach also use methods from conversation analysis (Jongerius, 2022; Kendrick & Holler, 2017) or analyses inspired by this framework (de Vries et al., 2021; Zima et al., 2019) as part of their methodological arsenal. In this way, they complement quantitative results with qualitative analyses or elaborated examples. A mixed-methods approach is mainly afforded by corpus data. Text and audio corpora, and increasingly also video corpora, are often large in terms of words, participants and types of contexts. Corpora that involve mobile eye-tracking data, by contrast, are still relatively small.
This is of course due to the very costly and time-consuming nature of the data acquisition and annotation. Corpora like the Insight Interaction Corpus (Brône & Oben, 2015) or the Eye-tracking in Multimodal Interaction Corpus (Holler & Kendrick, 2015) are often created with specific research questions in mind, or are shaped by the even more tailor-made requirements of an experimental design. However, more datasets are being developed in different set-ups (dyadic, triadic, multiparty, static, walking around, etc.), with different conversational tasks (brainstorming, matching games, free conversation, interaction with a confederate, etc.) and in different languages. As a result, corpus-linguistic approaches are slowly gaining ground. For example, de Vries et al. (2021) and Brône and Oben (2021) use their corpus of spontaneous interaction to study the role of eye gaze in the production and reception of verbal irony. Oben (2018) uses the same dataset for research on how verbal and nonverbal copying behaviour can be predicted from the gaze behaviour of the interlocutors involved. Zima (2020) and Auer and Zima (2021) demonstrate how eye gaze can be a useful resource in eliciting interactional collaboration during word search or in eliciting listener feedback. In the different context of classroom interaction, Haataja et al. (2021) and Muhonen et al. (2020) show how both students’ and teachers’ visual attention can be linked to the talkativeness of students, the approachability of the teachers or the quality of the classroom interaction. Together, these examples illustrate the growing potential of mobile eye-tracking data as a corpus-linguistic resource. Eye-tracking as a method originated in lab-bound experimental work. Even though mobile eye-trackers allow research outside of the lab, they can nonetheless be used in more controlled circumstances as well. For example, Rogers et al. (2018) show that individual participants in dyadic interactions are quite consistent in how they look at their conversational partner: they fixate the mouth, fixate the eyes, or switch between eyes and mouth. The duration of fixations on the interlocutor’s face also seems to be indicative of an individual style of looking at faces.
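Measures like those in Rogers et al. (2018) start from the tabulated time-series output described at the beginning of this section. Consider gaze samples recorded at a constant sampling rate and already mapped to areas of interest. For such data, the share of face-directed looking on each facial region and the corresponding dwell time could be computed roughly as follows. This is a minimal sketch, not the analysis pipeline of the cited study. The labels "eyes" and "mouth", the 50 Hz sampling rate and the function name are hypothetical.

```python
from collections import Counter

def face_dwell(aoi_samples, sample_rate_hz, face_aois=("eyes", "mouth")):
    """Proportion of face-directed samples and dwell time (ms) per facial AOI.

    aoi_samples: one AOI label per gaze sample (None for lost samples),
    recorded at a constant sampling rate.
    """
    counts = Counter(label for label in aoi_samples if label in face_aois)
    face_total = sum(counts.values())
    ms_per_sample = 1000 / sample_rate_hz
    proportions = {
        aoi: (counts[aoi] / face_total if face_total else 0.0) for aoi in face_aois
    }
    dwell_ms = {aoi: counts[aoi] * ms_per_sample for aoi in face_aois}
    return proportions, dwell_ms

samples = ["eyes", "eyes", "mouth", "background", "eyes", None]
print(face_dwell(samples, sample_rate_hz=50))
# ({'eyes': 0.75, 'mouth': 0.25}, {'eyes': 60.0, 'mouth': 20.0})
```

Proportions are computed over face-directed samples only, so that looking away from the face does not dilute the eyes-versus-mouth comparison.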
In a context of dialogue interpreting, Tiselius and Sneed (2020) observe how interpreters avert their gaze more from their dialogue partners when translating into their second compared to their first language, and how gaze patterns of more experienced interpreters do not differ from the patterns in novices. During musical interactions, musicians appear to establish more eye contact when the musical piece played is more well-known and when the tempo of the pieces is more difficult and technical (Bischop et al., 2021; Vandemoortele et al., 2018). In a recent paper on eye gaze and cultural differences, Haensel et al. (2022) debunk a popular proposition that East Asian participants display more gaze avoidance than Western Caucasian participants. Their experiment shows more evidence for task-related differences (e.g., storytelling versus spontaneous interaction) in gaze avoidance and mutual gaze than cultural differences. What these studies show is that creating controlled conditions and building experiments can also be done using mobile eye-tracking. Of course, the granularity of the areas of interest (i.e., the size of the objects on which participants are focussing, typically: presence or absence of fixations on the face, hand gestures or torso of the conversational partners) is much smaller compared to remote eye-tracking systems with participants looking at a computer screen. Based on the growing evidence on eye gaze behaviour, the robustness of eye gaze patterns can sometimes even prove to be useful as a diagnostic or validating tool. Informed by carefully controlled experimental work, Gehrer et al. (2020) use gaze patterns from naturally occurring interactions (i.e., fixations on lower parts of the face and more gaze avoidance) to reliably separate neurotypical participants from incarcerated psychopathic offenders. Allen et al. 
(2020) use mobile eye-tracking data to predict student anxiety during a public speech, and Vabalas and Freeth (2016) demonstrate that, compared to neurotypical participants, participants who score high on autistic traits show specific gaze patterns during face-to-face interaction (i.e., less visual exploration, with shorter and less frequent saccades). In terms of both scientific (sub)fields and methods, the overview above showcases the diversity of research in which mobile eye-tracking for interaction analysis is gaining ground. Further technological advances and more cost-effective methods (see Section 18.6 on

Geert Brône and Bert Oben

future directions), growing know-how about the limitations and possibilities of mobile eye-tracking, and the availability of less expensive hardware will undoubtedly push this relatively new field further.

18.5

Recommendations for practice

Designing a study with a novel data-gathering or analysis method can be a challenging task involving a steep learning curve. This also holds for mobile eye-tracking, but with due consideration of the method’s possibilities and limitations, starting from scratch with gaze-based research using mobile eye-tracking is definitely possible. A first consideration is the financial investment in both eye-tracking hardware and software. High-end commercial solutions (e.g., Tobii) require substantial funds, but their price, which is high partly for this reason, covers a complete package: the eye-tracking hardware itself, software and hardware to make recordings, licences for the software to analyse the recorded data, corrective lenses for participants with vision impairments, cases to carry and travel with the equipment, insurance, helpdesk support, training, etc. Especially when more than one eye-tracking system is needed, as is often the case when studying eye gaze in interaction, the costs can mount up rapidly. Pooling resources might alleviate this issue: some research groups and institutes share or rent eye-tracking equipment. Especially when the research questions are still under development or during a pre-testing phase, renting might be an option to consider. Because eye-tracking devices are in many cases heavily under-used (i.e., often used only a few days or weeks a year during the actual data-gathering phase of projects), looking for pooled resources with other researchers is probably worth the effort. Besides the high-end commercial systems, more academically oriented or open-source systems (e.g., Pupil Labs) are also available. These systems are significantly less expensive, but often require more technical experience or data tweaking by the researcher. Apart from the financial investment, the time investment should also be taken into account.
Depending on the exact research question and the requirements of local ethics committees, obtaining ethical approval might require some extra attention. Because the scene cameras can capture anything (and especially anyone) in the approximate visual field of the participant, and because images of participants’ eyes can be regarded as biometric data, ethical approval and privacy regulation procedures can become more labour-intensive than for, say, participant observation with video cameras. Second, for some research questions, synchronizing the time series data generated by the eye-tracking devices is key. Synchronization can be done in post-processing, for example, by manually aligning the data on a clapper or flashing light, or semi-automatically on the basis of the speech signals in the audio files. Because eye-tracking recordings are often complemented with audio recordings (with better sound quality than the built-in eye-tracking microphones), with video recordings that capture the interaction from an external perspective (either with a wider angle or zoomed in on the faces of the interlocutors), or with other biometric devices (see Section 18.6 on future directions), synchronization in post-processing can easily become a tedious and time-consuming task. Therefore, synchronizing during the recording phase (with time stamps emitted to all devices involved, or by recording all devices onto a single computer) can save a lot of time. Unfortunately, this type of synchronization requires extra hardware and/or software, more technical skills or even tailor-made solutions. A final labour-intensive step in the eye-tracking methodology is the annotation of the recorded data. The raw output of eye-tracking recordings consists of x-y coordinates of the estimated gaze position at a given sampling rate. From this stream of data, visual attention to relevant units of analysis must

Mobile eye-tracking for multimodal interaction analysis

be extracted. For example, to measure the visual attention of all participants to each other’s faces, every fixation of every participant towards any other participant has to be meticulously mapped. Even though computer vision provides some solutions (see Section 18.6 on future directions), researchers should carefully plan the number of person-hours required for this part of the research. Some final considerations in assessing whether mobile eye-tracking is a feasible method to tackle one’s research questions concern the interpretation of the eye-tracking data. At the risk of labouring an obvious point, we want to highlight that eye-tracking only allows researchers to assess how long a participant devoted foveal attention to an object or subject. This does not map directly onto whether or not a participant has seen the object/subject: the observation that a participant did not fixate a given object does not mean the participant has not ‘seen’ it, because peripheral vision (i.e., perception without foveal attention) enables people to observe objects without directly focussing on them. Depending on the exact recording set-up, the role of peripheral vision should be considered in interpreting the results. A more complex issue than peripheral vision is the gap between knowing that a participant is looking at an object/subject and knowing why the participant is doing so. There is an abundance of reasons for increased visual attention: participants might look at an object/subject because it is beautiful, complex, unexpected, expected, relevant, irrelevant, interesting, etc. Researchers should take care not to overinterpret the data and to provide a thorough argument bridging the gap between observed foveal attention and the cognitive, social, emotional or other processes underpinning that gaze behaviour.
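The semi-automatic, audio-based synchronization mentioned above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the procedure of any particular study or software package: both devices are assumed to have recorded the same conversation at the same audio sampling rate, each signal is reduced to a coarse amplitude envelope, and the offset is taken to be the lag that maximizes the cross-correlation of the mean-centred envelopes. The function names `envelope`, `estimate_offset` and `offset_seconds` are hypothetical.

```python
from typing import List, Sequence


def envelope(samples: Sequence[float], frame: int) -> List[float]:
    """Mean absolute amplitude per non-overlapping frame of `frame` samples."""
    return [
        sum(abs(s) for s in samples[k:k + frame]) / frame
        for k in range(0, len(samples) - frame + 1, frame)
    ]


def estimate_offset(reference: Sequence[float], other: Sequence[float],
                    max_lag: int) -> int:
    """Return the lag (in envelope frames) that best aligns `other` with `reference`.

    A positive lag means that a sound at frame i in `reference` occurs at
    frame i + lag in `other`. Only lags in [-max_lag, max_lag] are searched
    (brute force, so max_lag should be kept modest).
    """
    # Mean-centre both envelopes; otherwise the always-positive envelope
    # values would favour lags with the largest overlap rather than lags
    # where the loudness patterns actually match.
    ref_mean = sum(reference) / len(reference)
    oth_mean = sum(other) / len(other)
    ref = [v - ref_mean for v in reference]
    oth = [v - oth_mean for v in other]

    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, summed over the overlapping frames.
        score = sum(
            ref[i] * oth[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(oth)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def offset_seconds(lag_frames: int, frame: int, sample_rate: int) -> float:
    """Convert a lag in envelope frames into seconds of audio."""
    return lag_frames * frame / sample_rate
```

With 16 kHz audio and 160-sample (10 ms) frames, a lag of 25 frames corresponds to `offset_seconds(25, 160, 16000)`, i.e. 0.25 s. In practice the search window would be restricted to a plausible range and the result checked against a manual cue such as the clapper.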

18.6

Future directions

The advantages that wearable eye-tracking systems offer for research on multimodal interaction, viz. maximal flexibility and the potential to combine multiple perspectives, also present specific challenges. Among the most widely discussed is the above-mentioned annotation of data collected with mobile eye-trackers. In most of the studies discussed above, the gaze data generated by the systems are annotated largely manually by identifying relevant areas of interest (AOIs, e.g., the face and hands of an interlocutor or relevant objects in the interactional space) onto which the gaze coordinates generated by the eye-tracking system can be mapped. It has been reported repeatedly that this annotation process is time-consuming (especially when the gaze data of multiple participants have been tracked simultaneously) and prone to human error. For that reason, several attempts have been made to develop automatic or semi-automatic annotation systems that may help alleviate this process, and future work will need to focus on how such systems can be integrated into a broader annotation workflow (including compatibility with annotation platforms such as ELAN, see Lausberg & Sloetjes, 2009, or ANVIL, see Kipp, 2014). Early attempts (e.g., De Beugher et al., 2014; De Beugher, 2016) explored the potential of using computer vision algorithms, like hand and face detection algorithms, as a basis for a semi-automatic annotation process. De Beugher et al. (2018) use a combination of these techniques in a prototype tool that allows for basic gesture annotation based on mobile eye-tracking data, including gesture segmentation, the position of gestures in gesture space and motion analysis. Callemein et al. (2018) build on this work and present a framework based on OpenPose in combination with class analysis with re-identification (i.e., identifying and labelling individual persons in a recording), allowing for a fully automatic analysis. Hessels et al.
(2018) use the OpenFace framework to develop a fully automatic and publicly available AOI construction method for detecting faces in mobile eye-tracking data. In a validation study comparing the automatic approach to earlier semi-automatic AOI construction methods, they conclude that the automatic approach is both effective (i.e., generating reliable output) and efficient (i.e., no manual input or feedback is needed). Jongerius et al. (2021) reach a similar conclusion in a validation study that compared the approach developed by Callemein and colleagues with manual annotations in a dataset of doctor-patient interactions in which the doctor’s gaze behaviour was tracked using a wearable system. They demonstrate that the computer vision algorithms used in the study produce results that are highly comparable to those of human annotators and can thus be successfully integrated into the annotation workflow for eye-tracking research. Future work will need to explore the potential of automatic annotation techniques in more challenging conditions, such as highly mobile settings with moving objects and people, and for more complex AOIs, e.g., fast-moving hands that are regularly occluded or suffer from motion blur. Nevertheless, such techniques will be an essential component of future researchers’ toolbox (and are in fact already integrated in a simplified form in commercially available software). Alongside the development of (semi-)automatic annotation techniques, another avenue for future research is the integration of data from wearable eye-tracking devices with input from other data streams, including mobile EEG and other biometric signals such as galvanic skin response (GSR, as a measure of emotional arousal), electrocardiogram data (ECG), etc. Software and hardware providers are gradually extending their systems with workflows for the co-registration of multiple data types, including the synchronization of data streams using time stamp methods such as TTL events. This integration of data streams has been explored in several studies that attempt to capture cognitive processes in natural behaviour outside of the traditional lab setting. Ladouce et al.
(2022), for instance, use a combination of mobile EEG and eye-tracking in a study involving participants engaged in a search task (looking for a book in a library). The authors conclude that the findings of their study ‘highlight the relevance of scalable experimental designs, joint brain and body recordings, and template-matching analyses to capture cognitive events during natural behaviours’ (2022). Another field of research that has explored data integration is affective computing. In a recent study by Tabbaa et al. (2021), participants’ emotions were triggered in a virtual reality setting (using a VR headset), while different data types were captured, including eye-tracking, ECG and GSR. The outcome of the study is a publicly available multimodal affective dataset, which can be used, for instance, for the development of emotion recognition models. While studies such as these illustrate the potential of collecting and integrating data, in either real-life or virtual reality settings, for research on cognition and emotion, the application of this approach to social-interaction studies is currently missing. Finally, what all of these future directions (efficient annotation workflows, the integration of data streams and the use of simulated settings through VR/AR) have in common is the need for more budget-friendly hardware and software solutions. Currently, researchers are often limited in their use of the technology because of the high purchase and operating costs (cf. Section 18.5). For that reason, an important factor in pushing the field forward will be the accessibility of the various tools, including open-source software (as is the case in the above-mentioned studies by Callemein et al., 2018, and Hessels et al., 2018), but also the public availability of datasets that were generated using wearable eye-tracking systems.
To date, only a few datasets of (face-to-face) interactions with eye-tracking data have been made available (e.g., the Insight Interaction Corpus developed by Brône & Oben, 2015).
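The mapping step described above, in which gaze coordinates are assigned to AOIs, can be sketched as a point-in-region test per video frame. The example is illustrative only and rests on several assumptions: gaze points and face boxes are already expressed in the same scene-camera pixel coordinates, one gaze estimate and at most one face box are available per frame, and the boxes are rectangles (in practice they might come from manual annotation or from a face detector of the kind used in the frameworks mentioned above). The `Box` class, the `margin` parameter and `face_dwell_proportion` are hypothetical names, not part of OpenFace, OpenPose or any vendor’s software.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Rectangular AOI in scene-camera pixel coordinates (y grows downwards)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float, margin: float = 0.0) -> bool:
        """True if (x, y) lies inside the box enlarged by `margin` pixels on every side."""
        return (self.left - margin <= x <= self.right + margin
                and self.top - margin <= y <= self.bottom + margin)


def face_dwell_proportion(gaze: Sequence[Optional[Tuple[float, float]]],
                          faces: Sequence[Optional[Box]],
                          margin: float = 0.0) -> float:
    """Proportion of valid gaze samples that land on the face AOI of the same frame.

    gaze[k]  -- gaze point for frame k, or None if tracking was lost
    faces[k] -- face AOI for frame k, or None if no face was annotated/detected
    """
    if len(gaze) != len(faces):
        raise ValueError("gaze and faces must cover the same frames")
    valid = hits = 0
    for point, box in zip(gaze, faces):
        if point is None:
            continue  # tracking loss: exclude from the denominator
        valid += 1
        if box is not None and box.contains(point[0], point[1], margin):
            hits += 1
    return hits / valid if valid else 0.0
```

Excluding tracking loss from the denominator and enlarging the AOI by a margin to absorb gaze-estimation error are design decisions that change the resulting proportions, so they would need to be reported alongside any results.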

Further reading

Brône, G., & Oben, B. (Eds.). (2018). Eye-tracking in Interaction. Studies on the Role of Eye Gaze in Dialogue. John Benjamins.
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-Tracking: A Guide for Applied Linguistics Research. Cambridge University Press.
Jongerius, C., Hessels, R. S., Romijn, J. A., Smets, E. M. A., & Hillen, M. A. (2020). The measurement of eye contact in human interactions: A scoping review. Journal of Nonverbal Behavior, 44(3), 363–389.
Valtakari, N. V., Hooge, I., Viktorsson, C., Nyström, P., Falck-Ytter, T., & Hessels, R. (2021). Eye tracking in human interaction: Possibilities and limitations. Behavior Research Methods, 53, 1592–1608.

Related topics

Analysing reading with eye-tracking; analysing spoken language comprehension with eye-tracking; new directions in statistical analysis for experimental linguistics

References

Allen, K. B., Woody, M. L., Rosen, D., Price, R. B., Amole, M. C., & Silk, J. S. (2020). Validating a mobile eye tracking measure of integrated attention bias and interpretation bias in youth. Cognitive Therapy and Research, 44(3), 668–677.
Argyle, M. (1975). Bodily Communication. Methuen Publishing.
Argyle, M., & Cook, M. (1976). Gaze and Mutual Gaze. Cambridge University Press.
Auer, P. (2018). Gaze, addressee selection and turn-taking in three-party interaction. In G. Brône & B. Oben (Eds.), Eye-Tracking in Interaction. Studies on the Role of Eye Gaze in Dialogue (pp. 197–231). John Benjamins.
Auer, P. (2021a). Turn-allocation and gaze: A multimodal revision of the “current-speaker-selects-next” rule of the turn-taking system of conversation analysis. Discourse Studies, 23(2), 117–140.
Auer, P. (2021b). Gaze selects the next speaker in answers to questions pronominally addressed to more than one co-participant. Interactional Linguistics, 1(2), 154–182.
Auer, P., & Zima, E. (2021). On word searches, gaze, and co-participation. Gesprächsforschung. Online Zeitschrift zur verbalen Interaktion, 22, 390–425.
Barisic, I., Timmermans, B., Pfeiffer, U. J., Bente, G., Vogeley, K., & Schilbach, L. (2013). Using dual eye-tracking to investigate real time social interactions. In Proceedings of the “Gaze Interaction in the Post-WIMP World” Workshop at CHI2013. Paris.
Bavelas, J. B., Coates, L., & Johnson, T. (2002). Listener responses as a collaborative process: The role of gaze. Journal of Communication, 52, 566–580.
Beukeleers, I., Brône, G., & Vermeerbergen, M. (2020). Unaddressed participants’ gaze behavior in Flemish Sign Language interactions: Planning gaze shifts after recognizing an upcoming (possible) turn completion. Journal of Pragmatics, 162, 62–83.
Bishop, L., Cancino-Chacón, C., & Goebl, W. (2019). Eye gaze as a means of giving and seeking information during musical interaction. Consciousness and Cognition, 68, 73–96.
Brennan, S., Chen, X., Dickinson, C., Neider, M., & Zelinsky, G. (2008). Coordinating cognition: The costs and benefits of shared gaze during collaborative search. Cognition, 106, 1465–1477.
Brône, G., & Oben, B. (2021). Monitoring the pretence. Intersubjective grounding, gaze and irony. In G. Kristiansen, K. Franco, S. De Pascale, L. Rosseel & W. Zhang (Eds.), Cognitive Sociolinguistics Revisited (pp. 544–556). De Gruyter Mouton.
Brône, G., & Oben, B. (2015). Insight Interaction. A multimodal and multifocal dialogue corpus. Language Resources and Evaluation, 49(1), 195–214.
Brône, G., & Oben, B. (Eds.). (2018). Eye-tracking in Interaction. Studies on the Role of Eye Gaze in Dialogue. John Benjamins.
Brône, G., Oben, B., Jehoul, A., Vranjes, J., & Feyaerts, K. (2017). Eye gaze and viewpoint in multimodal interaction management. Cognitive Linguistics, 28(3), 449–483.
Callemein, T., Van Beeck, K., Brône, G., & Goedemé, T. (2018). Automated analysis of eye-tracker-based human-human interaction studies. Information Science and Applications, 514, 499–509.
Clark, H. H. (1996). Using Language. Cambridge: Cambridge University Press.
Cognolato, M., Atzori, M., & Müller, H. (2018). Head-mounted eye gaze tracking devices: An overview of modern devices and recent advances. Journal of Rehabilitation and Assistive Technologies Engineering, 5. https://doi.org/10.1177/2055668318773991
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-Tracking: A Guide for Applied Linguistics Research. Cambridge University Press.


De Beugher, S. (2016). Computer vision techniques for automatic analysis of mobile eye-tracking data. Unpublished PhD dissertation. KU Leuven.
De Beugher, S., Brône, G., & Goedemé, T. (2014). Automatic analysis of in-the-wild mobile eye-tracking experiments using object, face and person detection. 2014 International Conference on Computer Vision Theory and Applications (VISAPP), 9, 625–633.
De Beugher, S., Brône, G., & Goedemé, T. (2018). A semi-automatic annotation tool for unobtrusive gesture analysis. Language Resources and Evaluation, 52(2), 433–460. https://doi.org/10.1007/s10579-017-9404-9
Degutyte, Z., & Astell, A. (2021). The role of eye gaze in regulating turn taking in conversations: A systematized review of methods and findings. Frontiers in Psychology, 12, 616471.
De Vries, C., Oben, B., & Brône, G. (2021). Exploring the role of the body in communicating ironic stance. Languages and Modalities, 1, 65–80.
Duchowski, A. (2007). Eye Tracking Methodology. Springer.
Duncan, S. (1975). Interaction units during speaking turns in dyadic, face-to-face conversations. In A. Kendon, R. M. Harris & M. R. Key (Eds.), Organization of Behavior in Face-to-Face Interaction (pp. 199–212). The Hague: Mouton.
Emery, N. J. (2000). The eyes have it: The neuroethology, function and evolution of social gaze. Neuroscience and Biobehavioral Reviews, 24, 581–604.
Gehrer, N. A., Duchowski, A. T., Jusyte, A., & Schönenberg, M. (2020). Eye contact during live social interaction in incarcerated psychopathic offenders. Personality Disorders: Theory, Research and Treatment, 11(6), 431–439.
Goffman, E. (1981). Forms of Talk. University of Pennsylvania Press.
Goodwin, C. (1981). Conversational Organization. Interaction between Speakers and Hearers. London: Academic Press.
Goodwin, M. H., & Goodwin, C. (1986). Gesture and co-participation in the activity of searching for a word. Semiotica, 62(1/2), 51–75.
Gullberg, M., & Holmqvist, K. (2006). What speakers do and what addressees look at. Visual attention to gestures in human interaction live and on video. Pragmatics & Cognition, 14(1), 53–82.
Gullberg, M., & Kita, S. (2009). Attention to speech-accompanying gestures: Eye movements and information uptake. Journal of Nonverbal Behaviour, 33(4), 251–277.
Haataja, E., Salonen, V., Laine, A., Toivanen, M., & Hannula, M. S. (2021). The relation between teacher-student eye contact and teachers’ interpersonal behavior during group work: A multiple-person gaze-tracking case study in secondary mathematics education. Educational Psychology Review, 33(1), 51–67.
Haensel, J. X., Smith, T. J., & Senju, A. (2022). Cultural differences in mutual gaze during face-to-face interactions: A dual head-mounted eye-tracking study. Visual Cognition, 30(1–2), 100–115.
Hessels, R. S., Benjamins, J. S., Cornelissen, T. H. W., & Hooge, I. T. C. (2018). A validation of automatically-generated areas-of-interest in videos of a face for eye-tracking research. Frontiers in Psychology, 9, 1367.
Hessels, R. S., Niehorster, D. C., Holleman, G. A., Benjamins, J. S., & Hooge, I. T. C. (2020). Wearable technology for “real-world research”: Realistic or not? Perception, 49(6), 611–615.
Hirvenkari, L., Ruusuvuori, J., Saarinen, V. M., Kivioja, M., Peräkylä, A., & Hari, R. (2013). Influence of turn-taking in a two-person conversation on the gaze of a viewer. PLoS ONE, 8(8), e71569.
Ho, S., Foulsham, T., & Kingstone, A. (2015). Speaking and listening with the eyes: Gaze signaling during dyadic interactions. PLoS ONE, 10(8), e0136905.
Holler, J., & Kendrick, K. H. (2015). Unaddressed participants’ gaze in multi-person interaction: Optimizing recipiency. Frontiers in Psychology, 6(98), 1–14.
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & Van de Weijer, J. (2015). Eye-tracking. A Comprehensive Guide to Methods and Measures. Oxford University Press.
Jehoul, A., Brône, G., & Feyaerts, K. (2017, September). Gaze patterns and fillers: Empirical data on the difference between Dutch ‘euh’ and ‘euhm’. In Proceedings of the 4th European and 7th Nordic Symposium on Multimodal Communication (MMSYM 2016), Copenhagen, 29–30 September 2016 (No. 141, pp. 43–50). Linköping University Electronic Press.
Jongerius, C. (2022). Gaze in Medical Consultations: Measurement, Associations and Mechanisms. Unpublished doctoral dissertation.
Jongerius, C., Hessels, R. S., Romijn, J. A., Smets, E., & Hillen, M. A. (2020). The measurement of eye contact in human interactions: A scoping review. Journal of Nonverbal Behavior, 44(3), 363–389.


Jongerius, C., Callemein, T., Goedemé, T., Van Beeck, K., Romijn, J. A., Smets, E. M. A., & Hillen, M. A. (2021). Eye-tracking glasses in face-to-face interactions: Manual versus automated assessment of areas-of-interest. Behavior Research Methods, 53(5), 2037–2048.
Jongerius, C., Hillen, M. A., Romijn, J. A., Smets, E. M. A., & Koole, T. (2022). Physician gaze shifts in patient-physician interactions: Functions, accounts and responses. Patient Education and Counseling, 105(7), 2116–2129.
Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22–63.
Kendrick, K., & Holler, J. (2017). Gaze direction signals response preference in conversation. Research on Language and Social Interaction, 50(1), 12–32.
Kesselheim, W., Brandenberger, C., & Hottiger, C. (2021). How to notice a tsunami in a water tank: Joint discoveries in a science center. Gesprächsforschung. Online Zeitschrift zur verbalen Interaktion, 22, 87–113.
Kipp, M. (2014). ANVIL: A universal video research tool. In J. Durand, U. Gut & G. Kristofferson (Eds.), Handbook of Corpus Phonology (pp. 420–436). Oxford University Press.
Lachat, F., Conty, L., Hugueville, L., & George, N. (2012). Gaze cueing effect in a face-to-face situation. Journal of Nonverbal Behavior, 36(3), 177–190.
Ladouce, S., Mustile, M., Ietswaart, M., & Dehais, F. (2022). Capturing cognitive events embedded in the real world using mobile electroencephalography and eye-tracking. Journal of Cognitive Neuroscience, 34(12), 2237–2255.
Laner, B. (2022). “Guck mal der Baum”. Zur Verwendung von Wahrnehmungsimperativen mit und ohne mal. Gesprächsforschung. Online Zeitschrift zur verbalen Interaktion, 23, 1–35.
Lausberg, H., & Sloetjes, H. (2009). Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods, Instruments, & Computers, 41(3), 841–849.
McKay, K. T., Grainger, S. A., Coundouris, S. P., Skorich, D. P., Phillips, L. H., & Henry, J. D. (2021). Visual attentional orienting by eye gaze: A meta-analytic review of the gaze-cueing effect. Psychological Bulletin, 147(12), 1269–1289.
Muhonen, H., Pakarinen, E., Rasku-Puttonen, H., & Lerkkanen, M.-K. (2020). Dialogue through the eyes: Exploring teachers’ focus of attention during educational dialogue. International Journal of Educational Research, 102, 101607.
Neider, M. B., Chen, X., Dickinson, C. A., Brennan, S. E., & Zelinsky, G. J. (2010). Coordinating spatial referencing using shared gaze. Psychonomic Bulletin & Review, 17(5), 718–724.
Oben, B. (2018). Gaze as a predictor for lexical and gestural alignment. In G. Brône & B. Oben (Eds.), Eye-Tracking in Interaction. Studies on the Role of Eye Gaze in Dialogue (pp. 233–263). John Benjamins.
Oben, B. (2015). Modelling Interactive Alignment. A Multimodal and Temporal Account. Unpublished PhD dissertation. KU Leuven.
Oben, B., & Brône, G. (2015). What you see is what you do. On the relationship between gaze and gesture in multimodal alignment. Language and Cognition, 7(4), 546–562.
Pfeiffer, M., & Weiß, C. (2022). Reenactments during tellings: Using gaze for initiating reenactments, switching roles and representing events. Journal of Pragmatics, 189, 92–113.
Richardson, D., & Dale, R. (2005). Looking to understand: The coupling between speakers’ and listeners’ eye movements and its relationship to discourse comprehension. Cognitive Science, 29, 1045–1060.
Richardson, D., Dale, R., & Kirkham, N. (2007). The art of conversation is coordination. Common ground and the coupling of eye movements during dialogue. Psychological Science, 18, 407–413.
Richardson, D., Dale, R., & Tomlinson, J. (2009). Conversation, gaze coordination & beliefs about context. Cognitive Science, 33(8), 1468–1482.
Rogers, S. L., Speelman, C. P., Guidetti, O., & Longmuir, M. (2018). Using dual eye tracking to uncover personal gaze patterns during social interaction. Scientific Reports, 8, 4271.
Rossano, F. (2010). Questioning and responding in Italian. Journal of Pragmatics, 42(10), 2756–2771.
Rossano, F. (2012a). Gaze in conversation. In J. Sidnell & T. Stivers (Eds.), The Handbook of Conversation Analysis (pp. 308–329). Wiley-Blackwell.
Rossano, F. (2012b). Gaze behavior in face-to-face interaction. Unpublished dissertation. Nijmegen, The Netherlands: Max Planck Institute for Psycholinguistics.
Stephenson, L., Edwards, G., & Bayliss, A. (2021). From gaze perception to social cognition: The shared-attention system. Perspectives on Psychological Science, 16(3), 553–576.


Streeck, J. (2014). Mutual gaze and recognition: Revisiting Kendon’s ‘Gaze direction in two-person conversation’. In M. Seyfeddinipur & M. Gullberg (Eds.), From Gesture in Conversation to Visible Action as Utterance: Essays in Honor of Adam Kendon (pp. 35–56). John Benjamins.
Stukenbrock, A. (2020). Deixis, meta-perceptive gaze practices, and the interactional achievement of joint attention. Frontiers in Psychology, 11, 1779.
Tabbaa, L., Searle, R., Bafti, S. M., Hossain, M., Intarasisrisawat, J., Glancy, M., & Ang, C. S. (2021). VREED: Virtual reality emotion recognition dataset using eye tracking & physiological measures. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 5(4), 1–20.
Tatler, B. (2014). Eye movements from laboratory to life. In M. Horsley, M. Eliot, B. Knight, & R. Reilly (Eds.), Current Trends in Eye Tracking Research (pp. 17–35). Springer.
Tiselius, E., & Sneed, K. (2020). Gaze and eye movement in dialogue interpreting: An eye-tracking study. Bilingualism: Language and Cognition, 23(4), 780–787.
Vabalas, A., & Freeth, M. (2016). Brief report: Patterns of eye movements in face to face conversation are associated with autistic traits: Evidence from a student sample. Journal of Autism and Developmental Disorders, 46(1), 305–314.
Valtakari, N. V., Hooge, I. T., Viktorsson, C., Nyström, P., Falck-Ytter, T., & Hessels, R. S. (2021). Eye tracking in human interaction: Possibilities and limitations. Behavior Research Methods, 53(4), 1592–1608.
Vandemoortele, S., Feyaerts, K., Reybrouck, M., De Bièvre, G., Brône, G., & De Baets, T. (2018). Gazing at the partner in musical trios: A mobile eye-tracking study. Journal of Eye Movement Research, 11(2).
Vertegaal, R. (1999). The GAZE Groupware system: Mediating joint attention in multiparty communication and collaboration. In M. G. Williams & M. W. Altom (Eds.), Proceedings of CHI, 99, 294–301. ACM.
Vranjes, J., Brône, G., & Feyaerts, K. (2018). On the role of gaze in the organization of turn-taking and sequence organization in interpreter-mediated dialogue. Language and Dialogue, 8(3), 439–467.
Weiß, C. (2018). When gaze-selected next speakers do not take the turn. Journal of Pragmatics, 133, 28–44.
Weiß, C., & Auer, P. (2016). Das Blickverhalten des Rezipienten bei Sprecherhäsitationen: eine explorative Studie. Gesprächsforschung, 17, 132–167.
Zima, E. (2020). Gaze and feedback in triadic storytelling activities. Discourse Processes, 57(9), 725–748.
Zima, E., Weiß, C., & Brône, G. (2019). Gaze and overlap resolution in triadic interactions. Journal of Pragmatics, 140, 49–69.


19 ANALYSING LANGUAGE USING BRAIN IMAGING

Trisha Thomas, Francesca Pesciarelli, Clara D. Martin and Sendy Caffarra

19.1

Introduction and definitions

Language, a system of communication organised in sounds, symbols, and grammar, is widely accepted as one of the defining features of humanity and, as such, has fascinated philosophers, linguists, and scientists alike for centuries. As one of the most complex cognitive processes occurring in humans, it arises from an intricate network of interacting brain regions and involves a wide set of cognitive mechanisms. Because language is such a broad field of study, we can investigate many different aspects of it and ask questions ranging from what the biological basis of human thought is, to which region of the brain is involved in processing speech under noisy conditions, or even how long it takes us to understand a joke told by a foreign-accented speaker. Relevant to all these types of questions are the anatomy and physiology of the brain, the connectivity of brain networks, and the functional activity of brain regions. Before advancements in neuroimaging methods gave us a clearer picture of how the language system is organised in the brain, scientists relied on a combination of patient cases, deficit (or lesion) studies, and autopsies to learn about language processes. Through these patient cases, scientists began to discover important language regions of the brain, such as Broca’s and Wernicke’s areas (Binder, 2015; Keller et al., 2009). These types of studies, while imperative to early knowledge of the neural bases of language, limited scientists to learning mostly about language impairments. Fortunately, the development of neuroimaging technologies has allowed us to understand more about language processes in the healthy brain and to ask even more questions about language processes and networks. While early scientists considered language a separate, specialised area within the brain, modern scientists have demonstrated that the story is much more complex than previously thought.
Early lesion analysis methods were greatly enhanced by the development of modern brain imaging tools like magnetic resonance imaging (MRI), which allows for even more intricate structural insights into the neurobiology of language. Even more recent advancements now allow scientists to study brain region activation, interaction, and the temporal aspects of linguistic task performance in vivo. These modern techniques enable us to explore language processes from a much wider perspective than early scientific investigations could. This chapter will provide an overview of methods for examining the neurobiological processes of language. We will first explore historical methods of understanding and imaging language in

DOI: 10.4324/9781003392972-22

Trisha Thomas et al.

the brain and the models of language that arose from those studies. We then will discuss modern research methods and how modern imagining tools, like MRI and techniques derived from it, allow us to learn more about areas of activity and even trace how regions are connected. Because of these tools, scientists now understand that language involves a complex network of multiple brain regions and connecting tracts. This chapter will provide a summary of these modern research findings and concludes by identifying future directions in the study of language in the brain.

19.2 Historical perspectives

Early attempts to understand neurological processes, such as language, focused on linking specific brain structures to specific functions via behavioural outcomes, often relying upon pathology analysis. In the mid-1800s, to explore the neural underpinnings of language, scientists began to conduct ‘deficit’ studies with patients, or the cadavers of patients, suffering from neurological disorders. Thus, for many years, scientists studied the localisation of language processes in the brain through deficit, or lesion, analysis: specifically, by learning which aspects of speech are disrupted when a specific area of the brain is stimulated, or by observing which aspects of language are disrupted when a certain area of the brain is damaged. While much of the information learned from these early experiments has since been challenged, the classical model of the neural basis of language derived from these early patient cases was foundational to the modern field of neuroscience. Perhaps the most famous patient cases in the development of this field are those of the French surgeon Paul Broca during the 1860s (Broca, 1861). Broca was presented with several aphasic patients who had suffered strokes or other brain lesions and had subsequently lost the ability to produce speech. These cases led him to theorise that a specific area in the left inferior frontal gyrus, now known as Broca’s area, was responsible for speech production. While Broca’s cases dealt with expressive aphasia, just over a decade later another scientist, Carl Wernicke, became famous for his cases concerning receptive aphasia. Wernicke studied a patient who was able to produce fluent speech but unable to understand spoken or written language. During an autopsy, the lesion was found to be located in the left posterior temporo-parietal region of the brain, a region now known as Wernicke’s area, which Wernicke theorised was critically linked to the ability to understand language (Wernicke, 1874).
Because of this ‘deficit’ methodology, for many years scientists held a localisationist view of neural processes, in which each brain function is thought to correspond to a specific area of the brain. While these early patient studies were instrumental to the development of neuroscience and of research on the neural bases of language, early technologies furthered scientific findings and paved the way for later technological advances. With early neuroimaging and electrophysiological advances, scientists began to widen their understanding of the brain beyond this localisationist view, and the basis of our modern understanding of neuroscience began to take shape. Early neuroimaging inventions created to study brain structure, such as X-ray imaging and angiography, enabled scientists to generate some of the first images of the brain and thus to detect aneurysms (i.e., bulges in a blood vessel that can rupture, causing a medical emergency that can result in brain damage and even death). Angiography, invented in 1927, is a neuroimaging technique that involves injecting a contrast dye into the blood vessels so that they can be imaged with X-rays. Positron emission tomography (PET) was first developed in the 1960s to study organ function by introducing a radioactive tracer into the bloodstream. Not long after, computed tomography (CT) was developed in the 1970s. CT and PET scans revolutionised diagnostic medical imaging by allowing scientists and physicians to image the brain and other organs, bones, and tissues with little or no invasive intervention. Today they are often combined in a CT/PET scan for medical diagnostic use. As in angiography, a contrast agent can be injected into the bloodstream to highlight different tissues. While we have since developed other powerful neuroimaging tools, such as MRI and functional near-infrared spectroscopy (fNIRS), CT/PET scans remain essential to the medical and research fields. In the early 20th century, around the same time as scientists were developing early neuroimaging tools to observe brain structure, the first recordings of the electrical activity of the human brain were also performed. In 1924, the German psychiatrist Hans Berger became the first scientist to record a human scalp electroencephalogram (EEG). EEG measures voltage fluctuations on the scalp that are generated by the synchronised activity of populations of neurons. It became popular in the late 1930s for its use in clinical neurology and became especially critical as a diagnostic tool for epilepsy. In addition to its clinical applications, scientists still rely on EEG, and on the event-related potentials (ERPs: averaged measurements of brain responses time-locked to the presentation of a specific event) derived from it, as a non-invasive technique to study brain activity with high temporal sensitivity. Thus, over the course of the 20th century, scientists acquired a growing set of tools to study language processes in the brain from a variety of perspectives, both structurally and functionally. In the next section, we discuss how research in modern neuroscience is typically framed.

Analysing language using brain imaging

19.3 Critical issues and topics

Research has come a long way since the early days of autopsy studies. Nowadays, the variety of imaging tools available enables scientists to study the brain from the level of the neuron to the level of complex brain networks, and from multiple perspectives. The brain can be described from two fundamentally different perspectives: structural and functional. While many people are familiar with structural neuroimaging measures due to their prevalence in the medical field, researchers also use functional techniques to study brain activation and its time course. Thus, scientists must first consider whether they are interested in measuring brain structure or brain function. Structural measures are specialised for visualising anatomical structures and are thus useful for detecting brain abnormalities or damage. Scientists interested in brain structure can use tools like MRI and techniques like diffusion MRI to map anatomical structures and pathways (see below for detailed descriptions of these techniques). Functional measures are used to identify brain areas and processes implicated in a specific cognitive or behavioural task. For scientists interested in brain function, two questions guide modern functional research: where do processes happen in the brain, and when do they happen? Scientists interested in where processes happen can use technologies like functional MRI (fMRI), while those interested in when they happen can use technologies such as EEG or magnetoencephalography (MEG). Subsequent sections focus on how scientists use these different techniques to study language processes within the framework of these guiding questions.

19.4 Current contributions and research

As mentioned above, technological advances in neuroimaging have greatly increased the types of studies that can be done to analyse language using brain imaging. That said, a definitive account of the neural basis of language has yet to be provided. While we now know that early accounts of the roles of Broca’s and Wernicke’s areas in language were oversimplified (Koechlin & Jubault, 2006; Nakai et al., 2005; Novais-Santos et al., 2007; Saxe & Kanwisher, 2004; Tettamanti & Weniger, 2006), scientists are still discovering the complexities of the neural language network. However, combined data from these various neuroimaging techniques are providing more and more evidence that language arises from a network much more extensive than early scientists hypothesised (Dikker et al., 2009; Duncan, 2001; Timmann & Daum, 2007). For instance, we now know that areas like the angular gyrus and supramarginal gyrus are also important to linguistic processes. Furthermore, in contrast to the early localisationist view of the basis of language, the use of these modern technologies and methodologies has revealed that the language network relies much more heavily on a complex web of interactions between various language centres in the brain. In the following section, we give an overview of commonly employed neuroimaging and electrophysiological methods and how they contribute to our understanding of language.

19.5 Main research methods

19.5.1 Structural measures

19.5.1.1 Magnetic resonance imaging

MRI is a non-invasive imaging tool that generates detailed anatomical images and has revolutionised the ability to detect and diagnose diseases. MRI was first used clinically in the early 1980s and since then has been a valued tool in hospitals and research centres all over the world. MRI works by generating a strong static magnetic field, which aligns the magnetic moments of the hydrogen nuclei in the body’s tissues, and then applying precisely timed radio frequency (RF) pulses that tip these nuclei out of alignment. As the nuclei return to alignment, they emit a signal (the MR signal) that is detected, spatially encoded, and reconstructed into an image (the MR image). MRI is especially useful for imaging soft tissue and, unlike a CT scan, does not use ionising radiation, making it an excellent candidate for clinical language research and for diagnosing disorders and diseases that affect language. Indeed, much of the research that collects structural data about language examines typical and atypical language development or damage. Studies using structural imaging methods usually compare the structure of a specific region across two participant groups or attempt to correlate regional brain structure with a language ability of interest (Richardson & Price, 2009). For example, using structural MRI, grey matter (neural cell bodies, axon terminals, dendrites) and white matter (bundles of myelin-coated axons) in the inferior parietal lobes have been found to be associated with language learning (e.g., Lee et al., 2007; Mechelli et al., 2004). Traditional structural MRI studies often use voxel-based morphometry (VBM) and then compare the resulting measures of brain structure with a behavioural language measure. VBM is a statistical technique that can identify differences in the concentration of grey or white matter and offers high spatial resolution across the whole brain (e.g., Ashburner & Friston, 2000; Mechelli et al., 2005).
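The group-comparison logic behind VBM can be sketched as a mass-univariate test: once each participant’s grey-matter map has been spatially normalised (a preprocessing step not shown here), the same statistical test is run independently at every voxel. The Python sketch below is a simplified illustration that assumes preprocessing is already done, uses simulated values rather than real scans, and computes Welch’s t statistic with NumPy; the function and variable names are hypothetical, and real VBM pipelines add smoothing, covariates, and correction for multiple comparisons.

```python
import numpy as np

def voxelwise_welch_t(group_a: np.ndarray, group_b: np.ndarray) -> np.ndarray:
    """Welch's t statistic computed independently at every voxel.

    group_a, group_b: arrays of shape (n_participants, n_voxels) holding
    grey-matter values, assumed to be already spatially normalised.
    """
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    # Sample variance (ddof=1) divided by group size = squared standard error.
    se2_a = group_a.var(axis=0, ddof=1) / group_a.shape[0]
    se2_b = group_b.var(axis=0, ddof=1) / group_b.shape[0]
    return (mean_a - mean_b) / np.sqrt(se2_a + se2_b)

# Simulated (hypothetical) data: 20 participants per group, 1,000 voxels.
rng = np.random.default_rng(seed=0)
learners = rng.normal(loc=0.5, scale=0.05, size=(20, 1000))
controls = rng.normal(loc=0.5, scale=0.05, size=(20, 1000))
learners[:, 42] += 0.3  # planted group difference at one hypothetical voxel

t_map = voxelwise_welch_t(learners, controls)
print(int(np.argmax(t_map)))  # index of the voxel with the largest t value
```

In a real study, the resulting t map would then be thresholded with a correction for the many voxels tested, or the per-voxel values would be correlated with a behavioural language score instead of being compared between groups.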
While VBM is especially valuable for localising detected differences between groups, another technique, diffusion MRI, can also be utilised to investigate the white matter structural correlates of language.

19.5.1.2 Diffusion magnetic resonance imaging

Diffusion magnetic resonance imaging (dMRI) is an MRI acquisition technique used to produce images of white matter tissue structure and structural connectivity. The technique of diffusion tensor imaging (DTI) uses the detection of water diffusion to track white matter pathways in the brain. The direction in which water molecules diffuse affects the MRI signal, which can be measured, allowing us to indirectly infer the trajectory of white matter pathways, a process known as tractography (see Hagmann et al., 2006; Le Bihan et al., 2001; and Wandell, 2016 for an overview of DTI). dMRI has critical applications in clinical neuroscience: for example, in monitoring the progression of degenerative diseases and tracking compensatory mechanisms and adaptation following brain damage (Ciccarelli et al., 2008). Perhaps the best-known white matter tract involved in language processing is the arcuate fasciculus (depicted in Figure 19.1). The arcuate fasciculus is a bundle of fibres that connects Broca’s area with Wernicke’s area and has been described as the dorsal language pathway (e.g., Catani & Mesulam, 2008; Smits et al., 2014). Additionally, several ventral pathways, including the uncinate fasciculus, the inferior fronto-occipital fasciculus (written language), and the inferior longitudinal fasciculus (visual object recognition), among many others, have now been found to play a role in language processing. The arcuate fasciculus and uncinate fasciculus pathways form part of a dual-stream functional language system (see Saur et al., 2008). While the dorsal pathway seems to be involved in sound-to-motor mapping, the ventral pathway, including the uncinate fasciculus, seems to be associated with sound-to-meaning integration. Scientists have already discovered many more language-related bundles, and further advances in this technique will likely reveal even more in the coming years.

Figure 19.1 MR images demonstrating structural MRI (left: grey and white matter structures), diffusion MRI (middle: white matter fibres, with different tracts displayed in different colours), and fMRI (right: activation intensity of selected brain areas, shown in a red-to-yellow gradient, compared with a baseline condition). Reproduced with the permission of the copyright holder and participant.
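To illustrate how the direction of water diffusion becomes a local fibre estimate, the Python sketch below takes a single diffusion tensor (a symmetric 3 × 3 matrix, assumed to have already been fitted from the diffusion-weighted images) and extracts its principal eigenvector, the direction that deterministic tractography follows from voxel to voxel. It also computes fractional anisotropy (FA), a standard DTI summary of how directional the diffusion is (0 = isotropic, approaching 1 = strongly directional). The tensor values and function name are hypothetical; real analyses use dedicated tools for tensor fitting and tracking.

```python
import numpy as np

def principal_direction_and_fa(tensor: np.ndarray) -> tuple[np.ndarray, float]:
    """Return the principal diffusion direction and fractional anisotropy (FA).

    tensor: symmetric 3x3 diffusion tensor for one voxel (assumed already fitted).
    """
    eigenvalues, eigenvectors = np.linalg.eigh(tensor)  # eigenvalues in ascending order
    principal = eigenvectors[:, -1]  # eigenvector of the largest eigenvalue
    l1, l2, l3 = eigenvalues
    numerator = np.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
    denominator = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    fa = float(np.sqrt(0.5) * numerator / denominator)
    return principal, fa

# Hypothetical voxel in which water diffuses mostly along the x-axis
# (values in mm^2/s, chosen for illustration only).
tensor = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
direction, fa = principal_direction_and_fa(tensor)
print(np.round(np.abs(direction), 3), round(fa, 2))
```

The sign of an eigenvector is arbitrary, which is why the example prints its absolute value; a tracking algorithm only needs the axis along which to step.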

19.5.2 Functional measures

19.5.2.1 Functional magnetic resonance imaging

Functional magnetic resonance imaging (fMRI) is another relatively recent tool that has allowed neuroscientists to spatially localise brain activity. fMRI was developed after MRI had been established as a non-invasive technology for mapping anatomical brain structure. While MRI enables scientists and clinicians to obtain structural images of the brain, fMRI generates images of activity-related changes in blood oxygenation within the anatomical images. To do this, it relies on an indirect measure of neural activity called the blood-oxygen-level-dependent, or BOLD, signal. The BOLD signal reflects the level of oxygen in the blood of a localised region, relying on the fact that blood oxygenation affects the local magnetic properties of tissue and that increases in neural activity are accompanied by increases in local blood oxygenation. Typical fMRI experiments ask participants to alternate between control periods (e.g., rest) and periods of engagement in a task, so that blood flow changes in the brain areas engaged in the task can be inferred indirectly from the contrast in signal intensity between task and control periods. Mathematical formulas and reconstruction algorithms are then applied to superimpose many low-resolution BOLD images, acquired rapidly during the experiment, onto one high-resolution structural image. The superimposed images illustrate the areas of brain activity during the experiment. fMRI is increasingly employed to study fluctuations in the activation of brain areas involved in various language processes (Ardila & Bernal, 2016). Within language research, fMRI is mainly used to study language comprehension as well as language production. fMRI studies typically report hemodynamic activity during language tasks, and this technique allows functional regions of interest to be identified. fMRI has enabled scientists to identify additional language processing areas of the brain, extending our understanding of the localisation of language networks beyond the classical models derived from lesion studies. While language processing is still generally left-lateralised and does involve areas that were identified as important to language well over a century ago, high-resolution imaging has led to the discovery of language areas outside the traditional Wernicke’s and Broca’s areas, such as the angular gyri, the middle temporal gyrus, and extensive prefrontal areas (Binder et al., 1997). Table 19.1 and Figure 19.2 summarise regions important to various language-related processes.

Figure 19.2 Schematic of language-related areas in the brain.
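The task-versus-control contrast described above is usually estimated with a general linear model (GLM). The Python sketch below is a simplified illustration under stated assumptions: a block design with hypothetical 20-s rest and task blocks, a repetition time of 1 s, and a double-gamma haemodynamic response function whose shape is modelled loosely on a commonly used canonical form. It predicts the BOLD time course by convolving the task on/off pattern with that response function, then estimates the task weight (beta) for one simulated voxel by ordinary least squares. All names and values are illustrative.

```python
import math
import numpy as np

def double_gamma_hrf(tr: float = 1.0, duration: float = 32.0) -> np.ndarray:
    """Sampled haemodynamic response: a gamma-shaped peak (~5 s) minus a smaller undershoot.

    The shape parameters (6 and 16) and the 1/6 undershoot ratio are assumptions.
    """
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def task_regressor(n_scans: int, block_len: int, tr: float = 1.0) -> np.ndarray:
    """Alternating rest/task blocks (a boxcar) convolved with the HRF."""
    boxcar = ((np.arange(n_scans) // block_len) % 2).astype(float)  # 0 = rest, 1 = task
    return np.convolve(boxcar, double_gamma_hrf(tr))[:n_scans]

def fit_glm(bold: np.ndarray, regressor: np.ndarray) -> np.ndarray:
    """Ordinary least squares fit; returns [intercept, task beta]."""
    design = np.column_stack([np.ones_like(regressor), regressor])
    betas, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return betas

# One simulated voxel whose signal follows the task regressor plus noise.
rng = np.random.default_rng(seed=1)
reg = task_regressor(n_scans=200, block_len=20)
voxel = 100.0 + 2.0 * reg + rng.normal(scale=0.2, size=reg.size)
print(fit_glm(voxel, reg))  # the task beta should be close to 2.0
```

Repeating this fit at every voxel and mapping the task betas (or their t statistics) onto the high-resolution structural image yields the kind of activation overlay shown in the right panel of Figure 19.1.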

19.5.2.2 Electrophysiology and Magnetoencephalography (EEG and MEG)

Electrophysiology is an especially useful tool for studying fine-grained, time-sensitive information about language processes in the brain. With EEG and MEG, researchers can isolate and temporally localise the neural correlates of a variety of specific language-related processes. Because of the high temporal resolution of EEG and MEG, linguistic experiments using these techniques often employ a rapid serial presentation design. This fine-grained temporal resolution enables researchers to isolate the processes related to the analysis of specific linguistic information presented at a specific time point, such as the time course of word processing within a sentence.

Table 19.1 Summary of some relevant language-related brain areas

Brain areas and white matter fibres | Linguistic processes
Inferior frontal gyrus (Broca’s area, hereafter BA) | Language production, syntactic analysis (Broca, 1861; Koechlin & Jubault, 2006; Tettamanti & Weniger, 2006)
Temporal lobe (Wernicke’s area, hereafter WA) | Language comprehension and semantic analysis (Binder, 2015; Wernicke, 1874)
Nerve fibres connecting BA and WA (arcuate fasciculus) | Phonetic information sent between BA and WA (Catani & Mesulam, 2008)
Heschl’s gyrus | Acoustic analysis (Warrier et al., 2009)
Angular gyrus/Superior temporal sulcus | Conversion between visual and auditory stimuli, classification of information, abstract thinking (Deen et al., 2015)
Motor cortex | Directs movement of muscles for articulation (de Lafuente & Romo, 2004)
Visual cortex and ventral occipital cortex | Visual letter perception and visual word recognition (Cohen et al., 2002)
Supramarginal gyrus | Phonological processing, visual word recognition (Sliwinska et al., 2012)

While EEG and MEG are similar techniques, they have some important differences. EEG is recorded with multiple electrodes placed on a participant’s scalp and allows researchers to track fluctuations in electrophysiological brain activity. As mentioned in an earlier section, scientists often use ERPs to examine how electrophysiological responses change in relation to a specific temporal event (e.g., spoken word onset). ERPs are the sum of very small voltages elicited by synchronised neural activity in response to a specific event that is repeated several times across the experiment. They allow scientists to measure the time course of the mental processes associated with the presentation of a specific event. For example, we now know that meaningful stimuli (e.g., an image or a word) elicit a change in amplitude around 400 ms after stimulus presentation. This electrophysiological response has a negative polarity, is maximal over the centro-posterior part of the scalp, and becomes larger when a semantic violation is presented (e.g., a sentence with a semantically incongruous word; Kutas & Federmeier, 2011). MEG provides information very similar to that of EEG but, rather than recording electrical activity, it records magnetic activity. Whereas EEG detects electrical fields generated by extracellular currents, MEG detects magnetic fields produced by intracellular currents, using a helmet containing multiple sensor coils in which participants place their head. While EEG is particularly sensitive to the post-synaptic activity of pyramidal cells oriented perpendicular (radially) to the scalp, MEG is mainly sensitive to signals arising from tangentially oriented sources. Hence, although these two techniques have similarly high temporal resolution, their sensitivity to electrophysiological activity does not entirely overlap. Furthermore, MEG usually requires a shorter preparation time than EEG, making it easier to use with developmental and clinical populations. Moreover, MEG signals are much less distorted than EEG signals by tissue defects of the brain and skull bone, making MEG better suited to the spatial localisation of electrophysiological activity in the brain (thanks to source reconstruction algorithms). However, MEG is much more costly than EEG. Rather than ERPs, MEG is used to estimate their magnetic counterpart, event-related fields (ERFs), which can be localised in source space (i.e., the three-dimensional space in which neural sources are located).
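The averaging logic behind ERPs, and the way an N400-type effect is quantified as an amplitude difference within a time window, can be sketched in Python. This is a minimal illustration rather than a full pipeline: it assumes a single, already-filtered EEG channel, a hypothetical sampling rate of 500 Hz, known event onsets (in samples), and a 300–500 ms measurement window; the simulated data and all function names are hypothetical. Real analyses also handle artefact rejection, re-referencing, and many channels.

```python
import numpy as np

FS = 500  # sampling rate in Hz (hypothetical)

def average_erp(eeg, onsets, tmin=-0.2, tmax=0.8, fs=FS):
    """Cut epochs around each event, baseline-correct them, and average.

    eeg: 1-D array (one channel); onsets: event sample indices.
    Returns (times in seconds, averaged waveform).
    """
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    epochs = np.array([eeg[o + start:o + stop] for o in onsets])
    n_baseline = -start  # number of samples before event onset
    baseline = epochs[:, :n_baseline].mean(axis=1, keepdims=True)
    times = np.arange(start, stop) / fs
    return times, (epochs - baseline).mean(axis=0)

def mean_amplitude(times, erp, window=(0.3, 0.5)):
    """Mean amplitude within a time window, e.g., 300-500 ms for the N400."""
    mask = (times >= window[0]) & (times < window[1])
    return float(erp[mask].mean())

def simulate(n_trials, n400_amp, rng, fs=FS):
    """Hypothetical data: noise plus a negative deflection 300-500 ms after each event."""
    onsets = np.arange(1, n_trials + 1) * 2 * fs  # one event every 2 s
    eeg = rng.normal(scale=2.0, size=(n_trials + 2) * 2 * fs)
    for o in onsets:
        eeg[o + int(0.3 * fs):o + int(0.5 * fs)] += n400_amp
    return eeg, onsets

rng = np.random.default_rng(seed=2)
eeg_con, on_con = simulate(30, -1.0, rng)  # congruent sentence endings
eeg_inc, on_inc = simulate(30, -5.0, rng)  # semantically incongruous endings
t, erp_con = average_erp(eeg_con, on_con)
_, erp_inc = average_erp(eeg_inc, on_inc)
effect = mean_amplitude(t, erp_inc) - mean_amplitude(t, erp_con)
print(round(effect, 2))  # negative: a larger N400 for incongruous words
```

Averaging is what makes the small event-related deflection visible: noise that is not time-locked to the events shrinks roughly with the square root of the number of trials, while the time-locked response is preserved.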

ERPs and ERFs are divided into early and late waveforms. Early waves are described as peaking within the first 100 milliseconds (ms) after stimulus onset and are often linked to sensory processes such as visual object perception, whereas late waveforms are thought to reflect the participant’s cognitive response to the stimulus, such as the processing of semantic information (e.g., spoken words, images, text; Luck, 2014; Sur & Sinha, 2009). In addition to timing, ERPs/ERFs are also characterised by their topographical distribution (where the response reaches its maximum amplitude on the scalp). ERP/ERF components can thus be identified by timing (when the response appears, in ms), polarity (for ERPs only: whether a response is positive or negative), and topographic distribution (where the maximal amplitude of the response is observed on the scalp). See Table 19.2 for a summary of components related to language processes. These neural correlates help researchers understand how the brain processes information at an extremely high temporal resolution (millisecond by millisecond). Table 19.2 lists some of the most relevant ERP/ERF correlates of language processing.

Table 19.2 Some electrophysiological correlates of language processing

Component | Time window (ms) | Distribution | Related processes
M/N170 | 130–200 | Posterior | Visual object recognition (e.g., letters). In visual word recognition, it has been related to expertise in distinguishing orthographic from non-orthographic symbols (Maurer et al., 2005; Emmorey et al., 2017; Chun-Nang et al., 2004). Greater M/N170 responses are observed for known letters than for visual controls.
PMN | 250–300 | Fronto-central | Sound analysis. It has been related to normalisation between acoustic input and phonological representations (Newman & Connolly, 2009). Phonological incongruencies, and/or a mismatch between perceived speech input and a lexical representation, are thought to elicit a larger PMN (Connolly & Phillips, 1994; Connolly et al., 2001).
N/M400 | 300–500 | Centro-posterior | Semantic analysis. It has been related to aspects of lexical-semantic analysis and was initially observed in response to outright semantic violations (e.g., “He spread the warm bread with socks” as compared to “He spread the warm bread with jam”). The N400 effect typically consists of a greater negative amplitude for the violated than for the control condition (Kutas & Federmeier, 2011).
P600 | 500–800 | Centro-posterior | Syntactic analysis. It has been observed in response to a wide range of syntactic and semantic violations (e.g., word-order violations and grammatical agreement violations, but also prediction errors, e.g., “Tomorrow, he bit the apple”). It is thought to be related to controlled processes of repair and analysis, usually carried out to overcome errors and achieve a broad comprehension of sentence meaning. The P600 effect consists of an increased positive amplitude for the violated condition as compared to the correct one (Osterhout & Mobley, 1995).

Both EEG and MEG data can be time-locked to a stimulus onset, as described above, to measure evoked activity through ERPs/ERFs. For example, the P200 is a positive-going waveform that generally peaks around 200 ms after stimulus onset (e.g., the start of a trial). Additionally, the data can be described in terms of oscillations defined by frequency bands. For example, the alpha frequency band comprises oscillations (rhythmically rising and falling waveforms) within the 7–12 Hz range, and its amplitude can be modulated by cognitive processes. More and more scientists have begun to use EEG/MEG to examine how brain oscillations track speech (Giraud & Poeppel, 2012). Slow frequencies, such as θ-oscillations (~4–6 Hz), have been associated with acoustic processing and sentence parsing (see, e.g., Grabot et al., 2017; Hald et al., 2006; Jensen et al., 2021; Luo & Poeppel, 2007). High frequencies have been associated with a wide variety of linguistic processes, such as semantic-syntactic integration (Hald et al., 2006) and lexical processing (Towle et al., 2008).
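Oscillatory activity is often summarised as power within a frequency band. The Python sketch below estimates power in the theta and alpha ranges of one simulated channel using a plain Fourier transform (a single periodogram). The band limits follow the ranges given in this section (theta ~4–6 Hz, alpha 7–12 Hz), while the sampling rate, the simulated signal, and the function name are assumptions; real analyses typically use windowed or wavelet-based estimates (e.g., Welch’s method or Morlet wavelets), especially to track how band power changes over time.

```python
import numpy as np

def band_power(signal, fs, band):
    """Unnormalised periodogram power summed between band[0] and band[1] Hz (inclusive)."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(power[mask].sum())

fs = 250  # sampling rate in Hz (hypothetical)
t = np.arange(10 * fs) / fs  # 10 s of data
rng = np.random.default_rng(seed=3)
# Hypothetical signal: a strong 10 Hz (alpha) rhythm, a weaker 5 Hz (theta) rhythm, plus noise.
signal = (2.0 * np.sin(2 * np.pi * 10 * t)
          + 0.5 * np.sin(2 * np.pi * 5 * t)
          + rng.normal(scale=0.5, size=t.size))

alpha = band_power(signal, fs, (7, 12))
theta = band_power(signal, fs, (4, 6))
print(alpha > theta)
```

Because the power values here are unnormalised, they are only meaningful when compared across bands or conditions computed in the same way, as in the alpha-versus-theta comparison above.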

19.5.2.3 Functional near-infrared spectroscopy

Functional near-infrared spectroscopy (fNIRS) is another neuroimaging technique that is increasingly used in scientific studies. Its popularity is due to several advantages that fNIRS offers in comparison to other neuroimaging techniques, namely that it is portable, allows for movement, and is non-invasive (Pinti et al., 2020). Although an fNIRS cap looks like an EEG electrode cap, fNIRS resembles fMRI in that it measures functional brain activity by detecting changes in hemodynamic activity. Unlike fMRI, however, and like EEG, it is sensitive only to changes near the cortical surface. fNIRS uses near-infrared light to measure changes in haemoglobin concentration (for a review, see Scholkmann et al., 2014). fNIRS is often employed in experiments with infants because it offers relatively good temporal and spatial resolution while being relatively insensitive to movement. It is also used with other populations that cannot easily be studied using other techniques, such as people with cochlear implants or sensitivity disorders. fNIRS can be particularly advantageous for language research because it does not introduce the noise or bodily restrictions of an MRI scanner, which makes it appropriate for studying speech perception and language-related disorders such as Autism Spectrum Disorder (Butler et al., 2020). Additionally, because fNIRS is portable, it is an ecologically valid technique that offers the opportunity to study brain responses in naturalistic and clinical settings. One recent study used fNIRS at rest with monolingual and bilingual infants to examine their functional brain connectivity and assess whether early bilingual immersion experience affects the configuration of emerging functional brain networks. The authors found that, despite previous studies demonstrating behavioural differences between monolingual and bilingual infants, the intrinsic functional organisation of the brain remained the same (Blanco et al., in press).
In the coming years, we will certainly see many more studies using this recent, valuable technique.
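How fNIRS turns light measurements into haemoglobin estimates can be sketched with the modified Beer–Lambert law. This law relates the change in optical density at each wavelength to changes in oxygenated (HbO) and deoxygenated (HbR) haemoglobin: ΔOD(λ) = (ε_HbO(λ)·ΔHbO + ε_HbR(λ)·ΔHbR)·d·DPF(λ), where ε are extinction coefficients, d is the source–detector distance, and DPF is the differential pathlength factor. With two wavelengths, this gives two equations in two unknowns. In the Python sketch below, the extinction coefficients, distance, and DPF values are placeholders chosen for illustration only, not reference values; real analyses take them from published tables and dedicated toolboxes.

```python
import numpy as np

# Placeholder values for illustration only (NOT reference data).
# Rows: two near-infrared wavelengths (shorter, longer);
# columns: extinction coefficients for [HbO, HbR] in arbitrary but consistent units.
EXTINCTION = np.array([[0.6, 1.5],   # shorter wavelength: HbR absorbs more
                       [1.2, 0.8]])  # longer wavelength: HbO absorbs more
DISTANCE = 3.0               # source-detector distance in cm (hypothetical)
DPF = np.array([6.0, 5.0])   # differential pathlength factor per wavelength (hypothetical)

def od_from_concentration(delta_c):
    """Forward model: concentration changes [dHbO, dHbR] -> optical density changes."""
    return (EXTINCTION @ delta_c) * DISTANCE * DPF

def concentration_from_od(delta_od):
    """Invert the modified Beer-Lambert law for two wavelengths."""
    effective = EXTINCTION * (DISTANCE * DPF)[:, None]  # scale each wavelength's row
    return np.linalg.solve(effective, delta_od)

# Round trip on a hypothetical activation pattern: HbO rises, HbR falls slightly.
true_change = np.array([0.5, -0.2])
recovered = concentration_from_od(od_from_concentration(true_change))
print(np.round(recovered, 6))
```

Two wavelengths are the minimum needed to separate the two haemoglobin species; the inversion only works because HbO and HbR absorb differently at the two wavelengths, which keeps the 2 × 2 system solvable.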

19.6 Recommendations for practice

Researchers interested in analysing linguistic processes using neuroimaging should first decide whether they aim to study brain structure or brain function during language tasks. If interested in function, researchers should develop a specific question of interest within the field and, importantly, frame their hypotheses within the scope of what neuroimaging can reveal: namely, where is a process located, or when is a process happening? Once researchers know which of these two broad questions will frame their research, they can pick the neuroimaging technique that best facilitates their investigation. For example, a scientist interested in a short, transient linguistic phenomenon, such as the brain response to a target word embedded in a sentence, might prefer time-sensitive techniques like EEG/MEG. By contrast, scientists interested in processes that emerge gradually over a longer time frame, like speech recalibration (short-term perceptual adaptation to speech sounds), might consider a technique like fMRI. Researchers should make sure they understand the limitations of their chosen technique and consider whether a combination of techniques is practically and economically feasible for their specific projects. Researchers conducting linguistic investigations should also consider following the principles of the Open Science initiative, regardless of the experimental technique employed. The Open Science initiative has gained momentum in the academic community and is now widely regarded as good scientific practice. Open science aims to increase the accessibility of scientific research and encourages transparency and collaboration at all levels of the scientific process by sharing data, scripts, software, etc. This initiative has led to new resources and opportunities in academic research that we encourage researchers to take advantage of, such as large-scale collaborations, open-source code and data, and the ability to pre-register studies.
Researchers interested in open science will find an abundance of usable code in various programming languages for a variety of experimental techniques on GitHub, while experiment materials can be found on OSF (Open Science Framework), a platform for sharing raw data and materials, and open processing pipelines for neuroimaging analyses are often distributed as Docker containers. Open science is especially beneficial for enhancing reproducibility and enabling scientists to collaborate on large datasets, data analyses, and pre-processing methods, and, above all, it promotes integrity in scientific practice. For example, scientists working with fMRI and tractography data in linguistic neuroimaging have recently released a reproducible containerised workflow to reconstruct white matter tracts (Liu et al., 2022). This initiative is transforming the scientific community and will likely continue to influence future directions of scientific inquiry.

19.7 Future directions

In addition to the influence of the Open Science initiative on the design of and collaboration on forthcoming studies, scientists in language research are taking advantage of increasingly fine-grained neuroimaging methods to clarify the connection between structural and functional language correlates. Sometimes researchers combine two techniques to maximise the information they can obtain about processes of interest. Combining two or more neuroimaging techniques to mitigate the limitations of a single technique (e.g., to increase spatiotemporal resolution) is called multimodal imaging (see Uludağ & Roebroeck, 2014 for an overview). Techniques can be combined either simultaneously during data collection or afterwards during analysis. When conducting multimodal experiments, for example, concurrent fMRI and EEG recording, scientists must weigh the benefits, such as revealing the spatial sources of neural oscillations, against the downsides, such as the artefacts and degraded signal-to-noise ratio that the combination introduces. Advances in applied technology, like machine-learning algorithms, are increasingly transforming what information can be extracted from data in language research. Machine learning is a subset of artificial intelligence in which algorithms fed large datasets use statistical modelling to make classifications or predictions. In a loose analogy to human learning, the more data these models receive, the better they generally become at identifying patterns. Machine learning has given scientists the ability to train algorithms on subsets of data in order to try to predict language correlates from brain function or structure (see Hale et al., 2022). Natural language processing is a field of language research that uses machine learning to process human languages and underpins language learning apps, automatic speech recognition, voice assistants, handwriting recognition, and translation programmes, among many other things. Neural networks are a type of machine-learning algorithm loosely inspired by the human brain. Neural networks have had success in acoustic modelling (Arisoy et al., 2012) and in generating models for the mapping of linguistic information onto mental representations (Hale et al., 2022). Machine-learning algorithms also face challenges, such as susceptibility to biases (Castro, 2019; Chouldechova, 2017; Obermeyer et al., 2019) and insufficient input data. In addition to relying on artificial intelligence to make predictions, scientists are also increasingly interested in the role of genetic contributions to behavioural patterns. Recent work has revealed not only the genetic underpinnings of developmental disorders (see Carrion-Castillo et al., 2021) but also the role of genetics in more benign behaviours, such as a persistent tendency towards tardiness, character traits, and life outcomes. Behavioural genetics relies on large datasets or twin studies, and the field is gaining momentum as the open science initiative has led many labs to create open and collaborative datasets. Behavioural genetics can be applied to language research to examine the genetic role in both typical and atypical language development. Additionally, genetics is increasingly used to account for the brain functions underlying language processes as well. The goal of this line of research, according to Poeppel (2011, p. 384), is to “identify the genetic basis of the specific neural circuits that in turn constitute the basis for the operations that underpin speech and language.” Through interactions between these re-emerging fields, the open science initiative, and the combination of neuroimaging techniques, the next 20 years will very likely bring a wealth of advances in the field of language research.

Acknowledgements

The authors of this chapter were supported by the Basque Government through the BERC 2022–2025 programme; the Spanish State Research Agency [BCBL Severo Ochoa excellence accreditation CEX2020-001010-S]; the H2020 European Research Council [ERC Consolidator Grant ERC-2018-COG-819093 to CDM; Marie Sklodowska-Curie grant 837228 to SC]; the Spanish Ministry of Economy and Competitiveness [PID2020–113926GB-I00 to CDM]; the Basque Government [PIBA18–29 to CDM]; the Italian Ministry of University and Research (Programma giovani ricercatori Rita Levi Montalcini) to SC; and the Programa Predoctoral de Formación de Personal Investigador No Doctor del Departamento de Educación del Gobierno Vasco (PRE_2021_2_0006 to TT).

Further reading

Balconi, M. (Ed.). (2010). Neuropsychology of Communication. Springer.
Guy, S. (2009). [Review of the book A Path Worth Exploring - Neurolinguistics: An Introduction to Spoken Language Processing and Its Disorders, by John C. L. Ingram]. Journal of the International Neuropsychological Society: JINS, 15(1), 162–163.
Ivic, M. (2005). [Review of the book Foundations of Language. Brain, Meaning, Grammar, Evolution, by Ray Jackendoff]. Language in Society, 32(3), 415–418.
Small, S. L., & Hickok, G. (2016). The neurobiology of language. In Neurobiology of Language (pp. 3–9). Elsevier. https://doi.org/10.1016/B978-0-12-407794-2.00001-8
Traxler, M. J., & Gernsbacher, M. A. (Eds.). (2006). Handbook of Psycholinguistics (2nd ed.). Elsevier.


Trisha Thomas et al.

Related topics

New directions in statistical analysis for experimental linguistics; historical perspectives on the use of experimental methods in linguistics; contrasting online and offline measures: examples from experimental research on linguistic relativity; controlling social factors in experimental linguistics; analysing speech perception

References

Ardila, A., & Bernal, B. (2016). Neuroimaging in language: The contribution of fMRI. In A. Ardila (Ed.), Neuroimaging (pp. 1–9). SMGroup.
Arisoy, E., Khudanpur, S., & Ramabhadran, B. (Eds.). (2012). Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT. Association for Computational Linguistics.
Ashburner, J., & Friston, K. J. (2000). Voxel-based morphometry—the methods. NeuroImage, 11(6), 805–821.
Blanco, B., Molnar, M., Carreiras, M., Collins-Jones, L. H., Vidal, E., Cooper, R. J., & Caballero-Gaudes, C. (2020). Monolingual and bilingual infants rely on the same brain networks: Evidence from resting-state functional connectivity. bioRxiv. https://doi.org/10.1101/2020.04.10.035469
Binder, J. R. (2015). The Wernicke area: Modern evidence and a reinterpretation. Neurology, 85(24), 2170–2175.
Binder, J. R., Frost, J. A., Hammeke, T. A., Cox, R. W., Rao, S. M., & Prieto, T. (1997). Human brain language areas identified by functional magnetic resonance imaging. Journal of Neuroscience, 17(1), 353–362.
Broca, P. (1861). Remarques sur le siège de la faculté du langage articulé suivies d’une observation d’aphémie. Bull. Soc. Anatom., 2. Ser, 6, 330–357.
Butler, L. K., Kiran, S., & Tager-Flusberg, H. (2020). Functional near-infrared spectroscopy in the study of speech and language impairment across the life span: A systematic review. American Journal of Speech-Language Pathology, 29(3), 1674–1701.
Carrion-Castillo, A., Estruch, S. B., Maassen, B., Franke, B., Francks, C., & Fisher, S. E. (2021). Whole-genome sequencing identifies functional noncoding variation in SEMA3C that cosegregates with dyslexia in a multigenerational family. Human Genetics, 140(8), 1183–1200.
Castro, C. (2019). What’s wrong with machine bias. Ergo, an Open Access Journal of Philosophy, 6(15), 2019–2020.
Catani, M., & Mesulam, M. (2008). The arcuate fasciculus and the disconnection theme in language and aphasia: History and current state. Cortex, 44(8), 953–961.
Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153–163.
Chun-Nang, A., Curran, T., Woroch, B., & Gauthier, I. (2004). N170 associated with expertise in letter perception. Journal of Vision, 4(8), 519.
Ciccarelli, O., Catani, M., Johansen-Berg, H., Clark, C., & Thompson, A. (2008). Diffusion-based tractography in neurological disorders: Concepts, applications, and future developments. The Lancet Neurology, 7(8), 715–727.
Cohen, L., Lehéricy, S., Chochon, F., Lemer, C., Rivaud, S., & Dehaene, S. (2002). Language-specific tuning of visual cortex? Functional properties of the visual word form area. Brain, 125(5), 1054–1069.
Connolly, J. F., & Phillips, N. A. (1994). Event-related potential components reflect phonological and semantic processing of the terminal word of spoken sentences. Journal of Cognitive Neuroscience, 6(3), 256–266.
Connolly, J. F., Service, E., D’Arcy, R. C. N., Kujala, A., & Alho, K. (2001). Phonological aspects of word recognition as revealed by high-resolution spatio-temporal brain mapping. NeuroReport, 12(2), 237–243.
de Lafuente, V., & Romo, R. (2004). Language abilities of motor cortex. Neuron, 41(2), 178–180.
Deen, B., Koldewyn, K., Kanwisher, N., & Saxe, R. (2015). Functional organization of social perception and cognition in the superior temporal sulcus. Cerebral Cortex, 25(11), 4596–4609.
Dikker, S., Rabagliati, H., & Pylkkänen, L. (2009). Sensitivity to syntax in visual cortex. Cognition, 110(3), 293–321.
Duncan, J. (2001). An adaptive coding model of neural function in prefrontal cortex. Nature Reviews Neuroscience, 2(11), 820–829.


Emmorey, K., Midgley, K. J., Kohen, C. B., Sehyr, Z. S., & Holcomb, P. J. (2017). The N170 ERP component differs in laterality, distribution, and association with continuous reading measures for deaf and hearing readers. Neuropsychologia, 106, 298–309.
Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15(4), 511–517.
Grabot, L., Kösem, A., Azizi, L., & van Wassenhove, V. (2017). Prestimulus alpha oscillations and the temporal sequencing of audiovisual events. Journal of Cognitive Neuroscience, 29(9), 1566–1582.
Hagmann, P., Jonasson, L., Maeder, P., Thiran, J.-P., Wedeen, V. J., & Meuli, R. (2006). Understanding diffusion MR imaging techniques: From scalar diffusion-weighted imaging to diffusion tensor imaging and beyond. Radiographics, 26(Suppl. 1), S205–S223.
Hald, L. A., Bastiaansen, M. C. M., & Hagoort, P. (2006). EEG theta and gamma responses to semantic violations in online sentence processing. Brain and Language, 96(1), 90–105.
Hale, J. T., Campanelli, L., Li, J., Bhattasali, S., Pallier, C., & Brennan, J. R. (2022). Neurocomputational models of language processing. Annual Review of Linguistics, 8(1), 427–446.
Jensen, M., Hyder, R., Westner, B. U., Højlund, A., & Shtyrov, Y. (2021). Neural language processing across time, space, frequency and age: MEG-MVPA classification of intertrial phase coherence. bioRxiv. https://doi.org/10.1101/2021.10.02.462796
Keller, S. S., Crow, T., Foundas, A., Amunts, K., & Roberts, N. (2009). Broca’s area: Nomenclature, anatomy, typology and asymmetry. Brain and Language, 109(1), 29–48.
Koechlin, E., & Jubault, T. (2006). Broca’s area and the hierarchical organization of human behavior. Neuron, 50(6), 963–974.
Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647.
Le Bihan, D., Mangin, J.-F., Poupon, C., Clark, C. A., Pappata, S., Molko, N., & Chabriat, H. (2001). Diffusion tensor imaging: Concepts and applications. Journal of Magnetic Resonance Imaging, 13(4), 534–546.
Lee, H., Devlin, J. T., Shakeshaft, C., Stewart, L. H., Brennan, A., Glensman, J., Pitcher, K., Crinion, J., Mechelli, A., Frackowiak, R. S. J., Green, D. W., & Price, C. J. (2007). Anatomical traces of vocabulary acquisition in the adolescent brain. Journal of Neuroscience, 27(5), 1184–1189.
Liu, M., Lerma-Usabiaga, G., Clascá, F., & Paz-Alonso, P. M. (2022). Reproducible protocol to obtain and measure first-order relay human thalamic white-matter tracts. NeuroImage, 262, 119558.
Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.). MIT Press.
Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron, 54(6), 1001–1010.
Maurer, U., Brem, S., Bucher, K., & Brandeis, D. (2005). Emerging neurophysiological specialization for letter strings. Journal of Cognitive Neuroscience, 17(10), 1532–1552.
Mechelli, A., Crinion, J. T., Noppeney, U., O’Doherty, J., Ashburner, J., Frackowiak, R. S., & Price, C. J. (2004). Structural plasticity in the bilingual brain. Nature, 431(7010), 757.
Mechelli, A., Price, C. J., Friston, K. J., & Ashburner, J. (2005). Voxel-based morphometry of the human brain: Methods and applications. Current Medical Imaging Reviews, 1(2), 105–113.
Nakai, T., Matsuo, K., Ohgami, Y., Oishi, K., & Kato, C. (2005). An fMRI study of temporal sequencing of motor regulation guided by an auditory cue—A comparison with visual guidance. Cognitive Processing, 6(2), 128–135.
Newman, R. L., & Connolly, J. F. (2009). Electrophysiological markers of pre-lexical speech processing: Evidence for bottom–up and top–down effects on spoken word processing. Biological Psychology, 80(1), 114–121.
Novais-Santos, S., Gee, J., Shah, M., Troiani, V., Work, M., & Grossman, M. (2007). Resolving sentence ambiguity with planning and working memory resources: Evidence from fMRI. NeuroImage, 37(1), 361–378.
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.
Osterhout, L., & Mobley, L. A. (1995). Event-related brain potentials elicited by failure to agree. Journal of Memory and Language, 34(6), 739–773.
Pinti, P., Tachtsidis, I., Hamilton, A., Hirsch, J., Aichelburg, C., Gilbert, S., & Burgess, P. W. (2020). The present and future use of functional near-infrared spectroscopy (fNIRS) for cognitive neuroscience. Annals of the New York Academy of Sciences, 1464(1), 5–29.
Poeppel, D. (2011). Genetics and language: A neurobiological perspective on the missing link (-ing hypotheses). Journal of Neurodevelopmental Disorders, 3(4), 381–387.


Richardson, F. M., & Price, C. J. (2009). Structural MRI studies of language function in the undamaged brain. Brain Structure and Function, 213(6), 511–523.
Saur, D., Kreher, B. W., Schnell, S., Kümmerer, D., Kellmeyer, P., Vry, M.-S., Umarova, R., Musso, M., Glauche, V., Abel, S., Huber, W., Rijntjes, M., Hennig, J., & Weiller, C. (2008). Ventral and dorsal pathways for language. Proceedings of the National Academy of Sciences, 105(46), 18035–18040.
Saxe, R., & Kanwisher, N. (2004). People thinking about thinking people: The role of the temporo-parietal junction in “theory of mind”. In G. Bernston & J. Cacioppo (Eds.), Social Neuroscience (pp. 171–182). Psychology Press.
Scholkmann, F., Kleiser, S., Metz, A. J., Zimmermann, R., Mata Pavia, J., Wolf, U., & Wolf, M. (2014). A review on continuous wave functional near-infrared spectroscopy and imaging instrumentation and methodology. NeuroImage, 85, 6–27.
Sliwinska, M. W., Khadilkar, M., Campbell-Ratcliffe, J., Quevenco, F., & Devlin, J. T. (2012). Early and sustained supramarginal gyrus contributions to phonological processing. Frontiers in Psychology, 3, 161.
Smits, M., Jiskoot, L. C., & Papma, J. M. (2014). White matter tracts of speech and language. Seminars in Ultrasound, CT, and MR, 35(5), 504–516.
Sur, S., & Sinha, V. K. (2009). Event-related potential: An overview. Industrial Psychiatry Journal, 18(1), 70–73.
Tettamanti, M., & Weniger, D. (2006). Broca’s area: A supramodal hierarchical processor? Cortex, 42(4), 491–494.
Timmann, D., & Daum, I. (2007). Cerebellar contributions to cognitive functions: A progress report after two decades of research. Cerebellum, 6(3), 159–162.
Towle, V. L., Yoon, H.-A., Castelle, M., Edgar, J. C., Biassou, N. M., Frim, D. M., Spire, J.-P., & Kohrman, M. H. (2008). ECoG gamma activity during a language task: Differentiating expressive and receptive speech areas. Brain, 131(8), 2013–2027.
Uludağ, K., & Roebroeck, A. (2014). General overview on the merits of multimodal neuroimaging data fusion. NeuroImage, 102, 3–10.
Wandell, B. A. (2016). Clarifying human white matter. Annual Review of Neuroscience, 39, 103–128.
Warrier, C., Wong, P., Penhune, V., Zatorre, R., Parrish, T., Abrams, D., & Kraus, N. (2009). Relating structure to function: Heschl’s gyrus and acoustic processing. The Journal of Neuroscience, 29(1), 61–69.
Wernicke, C. (1874). Der aphasische Symptomenkomplex. Cohn and Weigert.


20 NEW DIRECTIONS IN STATISTICAL ANALYSIS FOR EXPERIMENTAL LINGUISTICS

Shravan Vasishth

Null hypothesis significance testing has been psychology’s hammer that made cognition and behaviour look like a nail. (Blokpoel & van Rooij, 2021)

20.1 Introduction

In recent decades, linguistics has taken an empirical turn: It is now routine to run experiments to test theoretical questions in areas like syntax (e.g., Sprouse et al., 2012), semantics (e.g., Hackl et al., 2012), and pragmatics (e.g., Chemla, 2009). Even those linguists who used to rely on intuition to develop their theories are now quite well versed in conducting planned experiments using well-executed experiment designs and sophisticated equipment. As far as data analysis goes, linguistics has historically looked to standard practice in psychology to develop the methodology of statistical inference. Unfortunately, the form of statistical inference that is the norm in psychology, and now in linguistics, has drifted far from its original intent (e.g., Belia et al., 2005). Even psychology textbooks (written by psychologists) propagate a fundamentally incorrect understanding of statistical inference (e.g., Cassidy et al., 2019). As a consequence of such misunderstandings, the statistical inferences that are often reported in published papers end up being at best questionable, if not outright wrong. We see the misinterpretations of statistical inference playing out in psychology through the replication crisis, whereby an alarmingly high proportion of claimed effects could not be replicated (Nosek et al., 2022; Open Science Collaboration, 2015). Similar problems occur in psycholinguistics (e.g., Vasishth et al., 2018). One major reason for these misinterpretations is that researchers (including the author of this chapter, when he was a graduate student) often receive only a cursory education in statistical inference, and dive into the nitty-gritty work of data analysis without really knowing what a p-value is or what it can and cannot tell us (e.g., Wasserstein & Lazar, 2016). Why, despite the focus on empirical methods in linguistics, is statistics education neglected?
Through conversations with fellow linguists, it appears that statistics education has been given second-class status in linguistics largely because it is considered to be orthogonal to the scientific process itself. However, this is a misunderstanding: As far as experimental linguistics goes, scientific reasoning and statistical inference are tightly connected, and neglecting the latter is likely to lead to invalid scientific conclusions. In this chapter, a practical example from a psycholinguistic data set (Nicenboim et al., 2018) is used to briefly discuss some of the most serious problems with standard null hypothesis significance testing. After that, an alternative approach to data analysis is presented that uses Bayesian methods to shift the focus towards uncertainty quantification. One consequence of using Bayesian methods (assuming that they are used as intended) is that statistical inferences will generally be more conservative. For related discussions, see Vasishth (2023) and Vasishth and Gelman (2021). One caveat is that Bayesian methods can, of course, also be misused (e.g., Tendeiro et al., 2022). However, one important advantage of Bayesian approaches is that they allow one to focus directly on uncertainty quantification, as explained below. This chapter assumes that the reader has some familiarity with experiment design, in particular with repeated measures designs, and at least a passing acquaintance with standard statistical methods, such as t-tests and the linear mixed model (e.g., Winter, 2019). Readers who lack this background can acquire it from material written specifically for (psycho)linguists (e.g., Baayen, 2008; Baayen et al., 2008; Vasishth, 2023; Vasishth & Gelman, 2021; Vasishth & Nicenboim, 2016; Vasishth, Schad et al., 2022). Reproducible code and data related to this chapter are available at https://osf.io/kgxpn/.

DOI: 10.4324/9781003392972-23

20.2 A conventional frequentist data analysis using statistical significance (what could go wrong?)

As a case study, consider the self-paced reading study reported in Nicenboim et al. (2018). Although the published paper did not analyse the data with the frequentist approach, that approach is used here to illustrate the problems that null hypothesis significance testing leads to. The experiment investigates a phenomenon called similarity-based interference (Lewis & Vasishth, 2005). The essential claim being tested is that when a subject-verb dependency has to be built, the presence of nouns that are similar to the grammatical subject, but not in subject position, can make it harder to complete the correct dependency. The experiment design in Nicenboim et al. (2018) involves German and consists of two conditions. In the low-interference condition (1a), only the subject noun, Der Wohltäter, “the philanthropist,” has the same number marking as the auxiliary verb hatte, “had.” In the high-interference condition (1b), three nouns have the same number marking. In the high-interference condition, it is harder to distinguish the grammatical subject from the other two (distractor) nouns, and as a consequence, completing the subject-verb dependency should take more time.

(1) a. Low interference
Der Wohltäter, der die Assistenten der Direktoren begrüßt hatte, saß später im Spendenausschuss.
The.sg.nom philanthropist, who.sg.nom the.pl.acc assistant(s) (of) the.pl.gen director(s) greeted had.sg, sat.sg later in the donations committee.
“The philanthropist, who had greeted the assistants of the directors, sat later in the donations committee.”

b. High interference
Der Wohltäter, der den Assistenten des Direktors begrüßt hatte, saß später im Spendenausschuss.
The.sg.nom philanthropist, who.sg.nom the.sg.acc assistant (of) the.sg.gen director greeted had.sg, sat.sg later in the donations committee.
“The philanthropist, who had greeted the assistant of the director, sat later in the donations committee.”

Directions in statistical analysis for experimental linguistics

Nicenboim et al. (2018) carried out a self-paced reading experiment with 184 German native-speaker participants, each of whom saw a total of 60 items in a standard Latin square design. A typical analysis of such a design would isolate the reading time for the critical region (the auxiliary verb hatte, “had”), aggregate the data by subjects, and then carry out a one-sample (equivalently, a paired) t-test. A repeated measures ANOVA (analysis of variance) would be equivalent to the t-test. One could also do a so-called by-items analysis, but this is omitted here for simplicity. This example follows standard statistical practice for positive-only dependent measures like reading time and log-transforms the reading times. In this particular example, a reciprocal transform would be more appropriate (e.g., Box & Cox, 1964; Kliegl et al., 2010), but for current purposes this is not an important issue, so the log transform is used. Such a t-test yields a statistically significant effect of interference: The t-value is 2 (with 183 degrees of freedom), and the p-value is 0.05. The estimate of the effect (on the log ms scale) is 0.0198, and the 95% confidence interval is [0, 0.039]. The conventional conclusion would be that we have found evidence for similarity-based interference. As explained below, this conclusion is over-enthusiastic. At this point, one might object that the t-test is not appropriate for these data. Indeed, in recent years, t-tests and repeated measures ANOVA have increasingly been replaced by the linear mixed model (Bates et al., 2015; Pinheiro & Bates, 2000). The linear mixed model is the better way to do the statistical analysis in the present case, because all sources of variability can be taken into account simultaneously (cf. Clark, 1973). Doing such an analysis using the lme4 package in R yields an estimate of 0.02, with a 95% confidence interval of [−0.0016, 0.0409], and a t-value of 1.85.
Incidentally, when faced with such a null result, articles commonly conclude that there is evidence against the interference effect. This conclusion also reflects a complete misunderstanding of the hypothesis testing framework. First, absence of evidence is not necessarily evidence of absence. Second, as discussed below, the difference between a significant and a non-significant result is not necessarily itself significant (Gelman & Hill, 2007). Coming back to our example analyses above, is the result significant or not? The t-test suggests that the answer is yes; the linear mixed model says no. Two important observations are needed to interpret these apparently divergent results.
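The by-subjects t-test reported above can be reproduced from summary statistics alone. The sketch below is a minimal illustration under stated assumptions. It uses the effect estimate of 0.0198 log ms, n = 184, and the standard deviation of 0.1323 log ms that this chapter reports for the same data in its power discussion. Python's standard library has no t quantile function, so the critical value 1.973 (the 97.5% quantile of the t distribution with 183 degrees of freedom) is hard-coded. The function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(mean_diff, sd_diff, n, t_crit):
    """One-sample (equivalently, paired) t-test from summary statistics.

    mean_diff: mean of the by-subject differences (high minus low interference)
    sd_diff:   standard deviation of those by-subject differences
    n:         number of subjects
    t_crit:    97.5% quantile of the t distribution with n - 1 df (two-sided 5% test)
    """
    se = sd_diff / math.sqrt(n)            # standard error of the mean difference
    t_value = (mean_diff - 0) / se         # test against the null value 0
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return t_value, ci

# Log-ms values from the text; 1.973 is the hard-coded t quantile for 183 df.
t_value, (lower, upper) = one_sample_t(0.0198, 0.1323, 184, 1.973)
print(f"t = {t_value:.2f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
# prints: t = 2.03, 95% CI = [0.0006, 0.0390]
```

These values agree with the rounded figures in the text (t = 2, CI [0, 0.039]). The linear mixed model cannot be reproduced this way, because it needs the trial-level data rather than by-subject summaries.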

20.2.1 What could go wrong in frequentist hypothesis tests?

20.2.1.1 Statistical significance itself is not particularly informative

The first observation is that the difference between significant and not significant may itself not be significant (Gelman & Hill, 2007). To make this point concrete, consider a short thought experiment. Suppose that we were to run the above experiment twice, with the same number of subjects each time (n = 184) but different subjects in each run. Suppose that in the first run, we obtain a t-value of 2.05 with an estimated difference in sample means of 0.02 log ms and a standard deviation of 0.1323. The t-value is computed using the formula $t = \frac{0.02 - 0}{0.1323/\sqrt{184}} = 2.05$.

Then we run the experiment again, and this time the estimate happens to be 0.01 and the standard deviation happens to be 0.14 (this can happen because of random variability). Now the t-value is about 0.97, which is not significant. Are these two results significantly different from each other? There are many papers in psycholinguistics and related areas (including one by the author of the present chapter) in which researchers conclude that the answer is yes, and that the two results show meaningful differences (e.g., Levy & Keller, 2013; Nieuwenhuis et al., 2011; Vasishth & Lewis, 2006). But a two-sample t-test can answer that question formally. The difference in estimates is 0.02 − 0.01 = 0.01, and the t-value for the two studies combined is computed using the formula:

$$t_{\text{observed}} = \frac{0.01 - 0}{\sqrt{0.1323^2/184 + 0.14^2/184}} = 0.7 \qquad (1)$$
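The three t-values in this thought experiment can be checked with a few lines of code. The sketch below is a minimal illustration. Like equation (1), it assumes that the two runs are independent samples of n = 184 subjects each. The function names are hypothetical.

```python
import math

def t_one_sample(estimate, sd, n):
    """t-value for an estimate tested against 0, given the SD of the by-subject differences."""
    return (estimate - 0) / (sd / math.sqrt(n))

def t_difference(est1, sd1, n1, est2, sd2, n2):
    """t-value for the difference between two independent estimates, as in equation (1)."""
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # the two squared SEs are added
    return ((est1 - est2) - 0) / se_diff

t_run1 = t_one_sample(0.02, 0.1323, 184)   # first run
t_run2 = t_one_sample(0.01, 0.14, 184)     # second run
t_diff = t_difference(0.02, 0.1323, 184, 0.01, 0.14, 184)
print(f"run 1: t = {t_run1:.2f}; run 2: t = {t_run2:.2f}; difference: t = {t_diff:.2f}")
# prints: run 1: t = 2.05; run 2: t = 0.97; difference: t = 0.70
```

Only the first run crosses the conventional threshold of about 1.97 (for 183 degrees of freedom). Yet the difference between the two runs is nowhere near significant.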

The t-value of 0.7 tells us that we do not have a significant difference between the two studies. Of course, this does not mean that there is no difference; we just don't know. And yet, even experienced scientists (e.g., editors-in-chief of major journals in psychology and psycholinguistics) will consider the second study a replication failure, and more generally will interpret the significant versus non-significant result as pointing to different conclusions. The main point here is that a significant or non-significant result does not, by itself, allow us to make a discovery claim. Doing statistical analyses on experimental data gives the illusion of quantitative rigor. This is why some psycholinguists have started demanding that linguistics always rely on experimental data (Gibson & Fedorenko, 2010, 2013). But in fact, the knowledge gleaned from quantitative methods can be very tenuous, and even strong advocates of quantitative methods rarely appreciate this point. One major problem here is statistical power, which is discussed next.

20.2.1.2 Underpowered studies will be misleading, and studies are often (severely) underpowered

The second observation is that the extent to which the experiment design might overestimate the true effect under repeated sampling is far more important than a significant or non-significant result (Gelman & Carlin, 2014). To understand this point, one must understand the concept of statistical power. Power is the probability of detecting an effect if it actually exists (i.e., if it has some particular magnitude). Null hypothesis significance testing works as intended in high-power situations, but it is likely to lead to misleading results in low-power situations. In the present case, the power of the design is relatively low, even though the experiment was run with 184 subjects (which seems like a lot). To see this, imagine that we use the estimates from the above design to plan a new experiment. How many participants would we need to achieve 80% power (the power level recommended by the American Psychological Association)? Assume that the effect estimate on the log ms scale is indeed 0.0198 and that the standard deviation (estimated from the data) is 0.1323 log ms. A power calculation then shows that the necessary sample size would be 352 participants. Note that we are computing prospective power here, not “observed power.” Many researchers incorrectly try to determine the power of an already-conducted experiment, referring to this as “observed power.” However, as Hoenig and Heisey (2001) show, observed power is just a transform of the p-value and adds no new information about the current study. The only relevant use of power calculations is to plan a future experiment, that is, prospective power. What are the consequences of running a study with low power? One important consequence is that a statistically significant effect is likely to be based on an overestimate, or may even have the wrong sign. Gelman and Carlin (2014) call these kinds of misestimates Type M (magnitude) and Type S (sign) errors.
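A prospective sample-size calculation of this kind can be sketched with the normal approximation for a two-sided one-sample (paired) test: n ≈ ((z₁₋α/₂ + z_power) · σ / δ)². The sketch below assumes α = 0.05, a target power of 0.80, δ = 0.0198 and σ = 0.1323 log ms. The function name is hypothetical. The normal approximation ignores the extra uncertainty that comes from estimating σ, so it returns a slightly smaller n (351) than a t-based calculation. A t-based calculation, such as R's power.t.test with type = "one.sample", gives a value close to the 352 reported in the text.

```python
import math
from statistics import NormalDist

def sample_size_one_sample(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided one-sample (paired) test.

    delta: assumed true effect; sd: assumed SD of the by-subject differences.
    """
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)             # about 0.84 for 80% power
    n = ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)                    # round up to whole participants

n_needed = sample_size_one_sample(delta=0.0198, sd=0.1323)
print(n_needed)  # prints: 351
```

This is prospective power in the sense used above. The effect and SD serve as planning assumptions for a future study, not as a re-assessment of the study that produced them.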



Misestimation of the effect under low power is one key reason why a statistically significant result, such as the one obtained above with the t-test, is not especially informative. Such a result is also unlikely to be replicable, if we define replicability as repeatedly finding significant effects when re-running the experiment. In a replication attempt, even if we are lucky enough to detect the effect (through a significant result), the significant effect would again very likely be based on a misestimate. One can demonstrate this through a simulation. Suppose that the true effect is 0.0198 log ms (so the null hypothesis of no effect is false) and the standard deviation is 0.1323 log ms. We repeatedly generate data with a typical sample size of 30 participants (Jäger et al., 2017) from the assumed normal distribution, which is the distribution that the statistical test assumes. The resulting power is about 13%. We will find that, on average, the significant effects will be based on an estimate (with the correct positive sign) that is about three times as large as the assumed true effect of 0.0198 log ms. It is common in linguistics to run studies with relatively low power (Bürki et al., 2020; Jäger et al., 2017; Jäger et al., 2020; Nicenboim et al., 2020; Vasishth, 2023; Vasishth et al., 2018). This is not due to any malicious intent, but to the resource and time limitations that researchers often face. Low power is not an easy problem to solve. What we can change is our focus on significant/non-significant results, which, as shown above, are uninformative. But if we don't focus on significance, what should we focus on? This is discussed next.
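The simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code accompanying the chapter. It assumes normally distributed by-subject differences with the true effect and SD given in the text, n = 30 participants, 10,000 simulated experiments and an arbitrary random seed. The critical value 2.045 (the t quantile for 29 degrees of freedom) is hard-coded. The function name `simulate_power_and_errors` is hypothetical. With these settings, power should come out at roughly 12 to 13%, and the average significant positive estimate should be roughly three times the true effect. The exact figures depend on the seed.

```python
import math
import random

def simulate_power_and_errors(true_effect=0.0198, sd=0.1323, n=30,
                              n_sims=10_000, t_crit=2.045, seed=1):
    """Simulate repeated one-sample t-tests and summarise the significant runs.

    t_crit = 2.045 is the two-sided 5% critical value for n - 1 = 29 df; it is
    hard-coded here and must be updated if n is changed.
    """
    rng = random.Random(seed)
    significant = []                                   # estimates from significant runs
    for _ in range(n_sims):
        diffs = [rng.gauss(true_effect, sd) for _ in range(n)]
        est = sum(diffs) / n
        var = sum((d - est) ** 2 for d in diffs) / (n - 1)  # sample variance
        se = math.sqrt(var / n)
        if abs(est / se) > t_crit:                     # two-sided significance test
            significant.append(est)
    power = len(significant) / n_sims
    type_s = sum(e < 0 for e in significant) / len(significant)   # wrong-sign rate
    positive = [e for e in significant if e > 0]
    exaggeration = (sum(positive) / len(positive)) / true_effect  # Type M ratio
    return power, type_s, exaggeration

power, type_s, exaggeration = simulate_power_and_errors()
print(f"power = {power:.2f}, Type S = {type_s:.3f}, exaggeration = {exaggeration:.1f}")
```

Raising `n` (and updating `t_crit` to match) shows the exaggeration ratio shrinking towards 1 as power increases. This is why significance filtering is much less misleading in high-power designs.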

20.2.2 A proposal from psychology to use frequentist confidence intervals instead of p-values, and the problem with this proposal

Given the above set of analyses (the t-test and then the linear mixed model), one important question is this: We know what is different between the two analyses, but what is common to both? What the two statistical tests have in common is that their 95% confidence intervals show similar values: The paired t-test gives [0, 0.039], and the linear mixed model gives [−0.002, 0.041]. The linear mixed model interval is wider because it includes more variance components than the t-test, which artificially reduces sources of variance through aggregation (see Schad, Nicenboim, and Vasishth [2022]). Incidentally, the reader might again be tempted to conclude that the two confidence intervals are showing different things. But this is the same mistake as treating significant versus non-significant results as always meaningful (they can be meaningful, but only when power is high). Many researchers (e.g., Cumming, 2014; McShane et al., 2019; Meehl, 1997) have suggested that one should move away from statistical significance and focus instead on estimating and reporting confidence intervals. More generally, these researchers have argued that one should focus on quantifying one's uncertainty about the effect estimate. Usually, the frequentist confidence interval is used as a way to quantify this uncertainty. Under this view, one could just report the confidence interval (here, the one from the linear mixed model): [−0.002, 0.041] log ms. Instead of saying that the effect was significant or not significant, we can just say: The observed effect is 0.02 log ms, with 95% CI [−0.0016, 0.0409]. This is consistent with the pattern predicted by the theory being investigated.
Shravan Vasishth

There are many advantages to such an approach: For one thing, once enough data accumulate, one can carry out a meta-analysis (e.g., Bürki et al., 2020, 2022; Jäger et al., 2017; Nicenboim et al., 2020), which allows us to quantitatively assess (modulo publication bias) what we have learned from existing studies. Another advantage is that other researchers can use the published results to plan a properly powered study. One technical problem with the confidence interval is that it does not quantify uncertainty about the effect estimate; instead, it has a rather convoluted meaning that is practically useless: If one were (counterfactually) to run the experiment again and again, and compute a 95% confidence interval each time, 95% of those repeatedly computed, hypothetical intervals would contain the true mean. This is practically useless because we have only one confidence interval to work with, and it either contains the true effect or it does not; we just don't know which of these two possibilities is true. It is mathematically incorrect to treat the frequentist confidence interval as specifying the range over which we can be 95% certain that the true effect lies. The reason is that the effect is an unknown point value and, therefore, has no probability density function associated with it. As a reminder, a probability density function (PDF) is a function (such as the normal distribution) that describes how continuous numerical values (such as data) are distributed. For example, reading time data in milliseconds could be assumed to come from a LogNormal PDF, written LogNormal(µ, σ); such a distribution would define the range of plausible values that the data can have. In the case of the LogNormal, negative values cannot be generated. In Bayesian statistics, parameters also have such PDFs associated with them. For more details, see, for example, Kerns (2010). Thus, if the effect is represented as the parameter β (in a linear mixed model, this would be a slope in the fixed-effects part of the model), we cannot work out values "lower" and "upper" such that the probability that β lies within this interval is 0.95, that is, Prob(lower < β < upper) = 0.95. To compute such a probability, we would have to assign a PDF to β.
For example, β would need to have a distribution like the normal distribution. As mentioned above, in the frequentist paradigm, the effect is just an unknown point value "out there in nature." It simply cannot have a probability distribution. It seems that this point has escaped even those psychologists (e.g., Meehl, 1997) who argue against p-values as a way to carry out inference (also see Hoekstra et al., 2014). Figure 20.1 visualizes the coverage properties of the confidence interval in 100 simulations. By coverage we mean here the proportion of cases where the true µ is contained in the CI. The data are repeatedly generated from a normal distribution with mean 500 and standard deviation 100. Each confidence interval either contains the true mean 500 or it doesn't. The 95% refers to the long-run proportion of such intervals that contain the true mean under repeated sampling; in 100 repetitions, we expect roughly 95 of the intervals to contain it. So is there some way to quantify uncertainty about the effect? It turns out that this is possible if we switch to a Bayesian way of thinking. This point is explained next. Until recently, Bayesian methods were largely inaccessible to the non-statistician. One reason for this was that sufficiently flexible software did not exist, and complex models were difficult to fit. This situation has changed completely over the last ten years: Software like Stan (Carpenter et al., 2017) and JAGS (Plummer, 2012) has made it possible to fit relatively complex models quite easily. Moreover, several accessible textbooks, designed for the experimentalist who is not a statistician, have become available (e.g., Kruschke, 2014; McElreath, 2020; Nicenboim et al., 2022). Because of these developments, it is now relatively easy to switch to a Bayesian methodology.
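The coverage property illustrated in Figure 20.1 can be checked by simulation. The sketch below is an illustration under stated assumptions, not the code used to draw the figure: it draws repeated samples from a normal distribution with mean 500 and standard deviation 100 (as in the figure), uses a hypothetical sample size of 30 per sample, and computes t-based 95% intervals with critical value 2.045 (29 degrees of freedom).

```python
import math
import random


def ci_coverage(n_reps, mu=500.0, sigma=100.0, n=30, t_crit=2.045, seed=2):
    """Proportion of t-based 95% CIs (over n_reps samples) that contain mu.

    t_crit is the two-sided 5% critical value for n - 1 degrees of freedom
    (2.045 for n = 30).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(y) / n
        se = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1) / n)
        lower, upper = mean - t_crit * se, mean + t_crit * se
        # Any single interval either contains mu or it does not (adds 1 or 0).
        hits += lower < mu < upper
    return hits / n_reps


print(ci_coverage(100))     # with only 100 intervals, can deviate noticeably from 0.95
print(ci_coverage(10_000))  # long-run proportion, close to 0.95
```

The "95%" is a property of the procedure across many repetitions; nothing in the code assigns a probability to any single interval.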

Directions in statistical analysis for experimental linguistics

Figure 20.1 Illustration of the meaning of a 95% confidence interval (CI). The thicker bars represent the CIs that do not contain the true mean. [Plot titled "95% CIs in 100 repeated samples"; x-axis: i-th repeated sample (0 to 100); y-axis: y (460 to 540).]

20.3 An alternative approach: Uncertainty quantification through Bayesian estimation

In the frequentist approach, the difference in means between two conditions is assumed to be an unknown point value. In our running example, the difference in means between the high- and low-interference conditions, call it δ, is estimated by computing the difference in sample means between the two conditions. This is the maximum likelihood estimate. After that, the statistical test (the t-test) is carried out by dividing d, the estimate of δ, by the estimated standard error. The 95% confidence interval is then d ± tcrit × SE, where tcrit is the critical t-value (which is approximately two for sample sizes larger than 20). The standard error only tells us how variable the estimate of the difference in sample means would be under (hypothetical) repeated sampling. The standard error cannot tell us anything about the uncertainty of the effect itself, as the effect δ is a point value by assumption. The effect has no distribution and, therefore, no uncertainty associated with it. By contrast, the Bayesian approach assumes that the true difference in means, δ, has a PDF associated with it. This is called a prior distribution and represents our prior belief or prior knowledge about this difference. For example, we could define a prior distribution for δ as follows:

$$ \delta \sim \mathrm{Normal}(\mu_\delta, \sigma_\delta) \qquad (2) $$

The prior distribution has so-called hyperparameters µδ and σδ, which specify the mean and spread of the distribution. What the above prior means is that, a priori, there is a 95% probability of the parameter lying between µδ − 1.96 × σδ and µδ + 1.96 × σδ. Defining such a prior distribution is quite a radical shift from the frequentist approach because, for the first time, we can talk about our prior uncertainty or prior belief about the effect of interest. A reasonable question that naturally arises at this point is: How can one come up with a prior distribution on the effect of interest even before running an experiment? Coming up with priors requires a way of thinking that physicists call a Fermi problem (Von Baeyer, 1988). It is usually possible to work out reasonable priors for the parameters associated with a particular research problem. Formal methods for eliciting priors form a well-developed field (Morris et al., 2014; Oakley & O'Hagan, 2010; O'Hagan et al., 2006). Actually, linguists are already familiar with prior elicitation: Any linguist who has used intuition-based judgments to decide on the grammaticality of a sentence is solving a Fermi problem. Intuition-based judgments basically express a prior belief about a sentence before any experimental data are seen. For examples from psycholinguistics of how priors can be systematically worked out from published papers or by reasoning about the research problem at hand, see chapter 6 of Nicenboim et al. (2022). Once we analyse the data in the Bayesian framework, what we obtain is the updated distribution of δ. This is called the posterior distribution of δ. "Posterior" here refers to the updated PDF of the parameter after the PDF has been informed by the data. The basic approach is as follows. Suppose that the data are represented by the vector y. Then, the posterior distribution is the distribution of δ given y: f(δ | y). The posterior distribution is computed using Bayes' rule, which states that the posterior distribution is proportional to the product of the prior distribution and the likelihood. To make this concrete, if the prior on δ is f(δ) = Normal(µδ, σδ) and the data are assumed to be generated from some likelihood function that takes δ as a parameter (call this likelihood f(y | δ)), then, following Bayes' rule, the posterior distribution is proportional to the product of the likelihood and the prior:

$$ f(\delta \mid y) \propto f(y \mid \delta)\, f(\delta) \qquad (3) $$
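Equation (3) can be made concrete for a single parameter with a grid approximation: evaluate likelihood × prior on a fine grid of δ values and normalize. This is a minimal sketch with hypothetical ingredients, not the chapter's analysis: a Normal(0, 0.1) prior on δ, and a normal likelihood for an observed estimate of 0.019 log ms with an assumed standard error of 0.0105. Models with many parameters are instead handled by MCMC sampling.

```python
from statistics import NormalDist

# Hypothetical ingredients (not the chapter's data): a Normal(0, 0.1) prior
# on delta, and an observed estimate D_HAT whose likelihood is taken to be
# Normal(delta, SE).
PRIOR = NormalDist(0.0, 0.1)
D_HAT, SE = 0.019, 0.0105


def likelihood(delta):
    """f(y | delta): density of the observed estimate given delta."""
    return NormalDist(delta, SE).pdf(D_HAT)


def grid_posterior(lo=-0.5, hi=0.5, n=20_001):
    """Approximate f(delta | y), proportional to f(y | delta) f(delta), on a grid."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    unnorm = [likelihood(d) * PRIOR.pdf(d) for d in grid]
    total = sum(unnorm) * step            # approximates the normalizing constant
    density = [u / total for u in unnorm]
    return grid, density, step


def summarize(grid, density, step, prob=0.95):
    """Posterior mean and central credible interval from the grid density."""
    mean = sum(d * p for d, p in zip(grid, density)) * step
    tail = (1 - prob) / 2
    cdf, lower, upper = 0.0, None, None
    for d, p in zip(grid, density):
        cdf += p * step
        if lower is None and cdf >= tail:
            lower = d
        if upper is None and cdf >= 1 - tail:
            upper = d
    return mean, lower, upper


grid, density, step = grid_posterior()
post_mean, lower, upper = summarize(grid, density, step)
print(f"posterior mean {post_mean:.4f}, 95% CrI [{lower:.4f}, {upper:.4f}]")
```

Because both ingredients are normal here, the result can be checked against the closed-form posterior; the grid method itself does not rely on that.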

If there is more than one parameter in the model, then a prior is defined for each parameter, and the posterior is the joint distribution of the parameters given the data. For example, if the likelihood is the normal distribution, the parameters are the mean µ and the standard deviation σ (instead of δ), and the joint posterior distribution of these parameters is given by:

$$ f(\mu, \sigma \mid y) \propto \mathrm{Normal}(y \mid \mu, \sigma)\, f(\mu)\, f(\sigma) \qquad (4) $$

Here, f (µ) and f (σ) are prior distributions. The data analysis below of the running example in this chapter will make these priors clearer. Returning to our one-parameter example involving δ above, the mean of the posterior distribution f (δ|y) is a compromise between the frequentist maximum likelihood estimate and the mean of the prior distribution. This is a very important difference from the frequentist approach and allows us to build on prior knowledge. For a practical example of an analysis building on prior knowledge, see Vasishth and Engelmann (2022). Another important consequence of this fact (that the posterior mean is a compromise between the prior mean and the maximum likelihood estimate) is that priors serve to regularize the posterior: When the data are sparse and a parameter cannot be estimated accurately, the posterior mean will be close to the prior mean. This regularization function of priors has the effect that the convergence warnings that one often sees in the lmer function in the lme4 package will not occur (assuming that a regularizing prior is defined). For more discussion, see chapter 5 of Nicenboim et al. (2022).
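For the special case of a normal prior and a normal likelihood with known standard error, the compromise just described has a closed form: the posterior mean is a precision-weighted average of the prior mean and the maximum likelihood estimate, (µδ/σδ² + d/SE²) / (1/σδ² + 1/SE²). The sketch below assumes this conjugate textbook setting rather than the chapter's linear mixed model, and the estimate of 0.03 log ms is hypothetical; it only shows how sparse data (a large standard error) pull the posterior toward the prior mean.

```python
import math


def normal_posterior(prior_mean, prior_sd, mle, se):
    """Posterior mean and sd of delta for a Normal prior and a Normal likelihood.

    Each source of information is weighted by its precision (1 / variance).
    """
    prior_prec = 1 / prior_sd ** 2
    data_prec = 1 / se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * mle) / post_prec
    return post_mean, math.sqrt(1 / post_prec)


# Same hypothetical estimate (0.03 log ms) with informative vs. sparse data.
for se in (0.01, 0.1, 1.0):
    mean, sd = normal_posterior(prior_mean=0.0, prior_sd=0.1, mle=0.03, se=se)
    print(f"SE={se:<5} posterior mean={mean:.4f} sd={sd:.4f}")
```

With SE = 0.01 the posterior mean stays close to the estimate; with SE = 1.0 it is almost exactly the prior mean of 0, which is the regularizing behaviour described above.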


Usually, in complex linear mixed models, this posterior distribution is computed using Markov Chain Monte Carlo (MCMC) sampling. To carry out this computation, one uses software such as Stan (Carpenter et al., 2017) or its front-end brms (Bürkner, 2017), JAGS (Plummer, 2012), or the like. In Bayesian analysis, a radical change from the frequentist approach is that the Bayesian approach allows us to directly talk about the uncertainty of the effect of interest (δ) once we have seen the data: The posterior distribution gives us this information. In other words, we can now say that we are 95% certain (given the statistical model and the data) that the effect lies between a lower and upper bound.
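To give a feel for what MCMC sampling does, here is a minimal random-walk Metropolis sampler for the single parameter δ, reusing a normal prior and a normal likelihood for a hypothetical estimate. This is a deliberately simple stand-in: Stan uses a more efficient algorithm (Hamiltonian Monte Carlo), and the proposal step size, number of iterations, warm-up length, and data summary below are illustrative assumptions. The 95% credible interval is read off the 2.5% and 97.5% quantiles of the posterior draws.

```python
import math
import random
from statistics import fmean, quantiles


def log_posterior(delta, d_hat, se, prior_mean, prior_sd):
    """log f(y | delta) + log f(delta), dropping constants that do not involve delta."""
    log_lik = -0.5 * ((d_hat - delta) / se) ** 2
    log_prior = -0.5 * ((delta - prior_mean) / prior_sd) ** 2
    return log_lik + log_prior


def metropolis(model, n_iter=20_000, step=0.01, start=0.0, seed=3):
    """Random-walk Metropolis: propose a nearby value, accept it with
    probability min(1, posterior ratio), otherwise keep the current value."""
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start, **model)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)   # symmetric proposal
        proposal_lp = log_posterior(proposal, **model)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical model: Normal(0, 0.1) prior, estimate 0.019 with SE 0.0105 (log ms).
model = {"d_hat": 0.019, "se": 0.0105, "prior_mean": 0.0, "prior_sd": 0.1}
draws = metropolis(model)[2_000:]   # discard warm-up draws
cuts = quantiles(draws, n=40)       # cut points at 2.5%, 5%, ..., 97.5%
print(f"posterior mean {fmean(draws):.4f}, 95% CrI [{cuts[0]:.4f}, {cuts[-1]:.4f}]")
```

The draws, not a formula, carry the posterior: any summary (mean, interval, probability that δ > 0) is computed directly from them.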

20.3.1 Bayesian estimation: A concrete example

To make this approach concrete, consider the Bayesian equivalent of the frequentist linear mixed model presented in the first part of this chapter. As a baseline, first consider the frequentist linear mixed model (Figure 20.2), whose lme4 syntax will be familiar to the reader. Here, logrt is a vector containing log-transformed reading times, and int is the two-level interference factor, coded as ±0.5 (Schad, Vasishth et al., 2020). The model implied here is:

$$ y \sim \mathrm{LogNormal}\big(\alpha + u_1 + w_1 + (\beta + u_2 + w_2) \times \mathit{int},\ \sigma\big) \qquad (5) $$

where y is the reading time in milliseconds, and u1, u2 and w1, w2 are, respectively, subject-level and item-level adjustments to the fixed-effect intercept α and slope β, with both the u and w adjustments coming from bivariate normal distributions. For example, u1 is assumed to have a normal distribution with mean 0 and standard deviation σu1, u2 a normal distribution with mean 0 and standard deviation σu2, and the correlation between u1 and u2 is ρu. The bivariate normal distribution is written like this:

$$
\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \sim \mathrm{Normal}_2\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{u_1}^2 & \rho_u \sigma_{u_1} \sigma_{u_2} \\ \rho_u \sigma_{u_1} \sigma_{u_2} & \sigma_{u_2}^2 \end{pmatrix}
\right) \qquad (6)
$$

The subscript 2 in the PDF Normal2(. . .) above states that the distribution is bivariate (it has two variables, u1 and u2). Similarly, the by-item adjustments to the intercept and slope, w1 and w2, also have a bivariate normal distribution defined for them:

$$
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \sim \mathrm{Normal}_2\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{w_1}^2 & \rho_w \sigma_{w_1} \sigma_{w_2} \\ \rho_w \sigma_{w_1} \sigma_{w_2} & \sigma_{w_2}^2 \end{pmatrix}
\right) \qquad (7)
$$

This implies that the frequentist model has the following parameters: the fixed effects α and β, and the variance components σu1, σu2, σw1, σw2, ρu, ρw, and σ.
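The generative model in Equations (5) to (7) can be simulated directly, which is a useful way to see what each parameter does. The sketch below is hypothetical in its design (40 subjects and 40 items, every subject reading every item once, with the condition alternating across subject-item pairs) and uses illustrative parameter values on the log scale, roughly in the range of the estimates reported in Table 20.1; it is not the chapter's code. Correlated adjustments are drawn from the bivariate normal distributions in (6) and (7) by applying the 2 × 2 Cholesky factor of the covariance matrix to two independent standard normal draws.

```python
import math
import random


def draw_bivariate(rng, sd1, sd2, rho):
    """One draw of (a1, a2) from Normal_2 with means 0, sds sd1, sd2, correlation rho."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # 2x2 Cholesky factor of the covariance matrix applied to (z1, z2).
    return sd1 * z1, sd2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)


def simulate_rts(n_subj=40, n_item=40, alpha=6.35, beta=0.02, sigma=0.47,
                 sd_u=(0.36, 0.04), rho_u=-0.04, sd_w=(0.04, 0.03), rho_w=-0.4,
                 seed=4):
    """Simulate (subject, item, condition, rt_ms) rows from Equation (5)."""
    rng = random.Random(seed)
    u = [draw_bivariate(rng, *sd_u, rho_u) for _ in range(n_subj)]  # (u1, u2)
    w = [draw_bivariate(rng, *sd_w, rho_w) for _ in range(n_item)]  # (w1, w2)
    rows = []
    for i in range(n_subj):
        for j in range(n_item):
            # Hypothetical design: condition alternates, coded +/-0.5 as in the text.
            cond = 0.5 if (i + j) % 2 == 0 else -0.5
            mu = alpha + u[i][0] + w[j][0] + (beta + u[i][1] + w[j][1]) * cond
            rows.append((i, j, cond, rng.lognormvariate(mu, sigma)))
    return rows


data = simulate_rts()
by_cond = {0.5: [], -0.5: []}
for _, _, cond, rt in data:
    by_cond[cond].append(rt)
diff = sum(by_cond[0.5]) / len(by_cond[0.5]) - sum(by_cond[-0.5]) / len(by_cond[-0.5])
print(f"{len(data)} trials; mean difference (+0.5 minus -0.5): {diff:.1f} ms")
```

Refitting the model to many such simulated data sets is one way to check whether its parameters can be recovered with a given design.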


Figure 20.2 A typical frequentist analysis of the example data set discussed in this chapter.

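In the Bayesian version of this model, we will need to define prior distributions for each of these parameters. The prior distributions for all parameters except the correlations are on the log scale (Normal+ denotes a normal distribution truncated at 0, so that standard deviations can only be positive):

$$
\begin{aligned}
\alpha &\sim \mathrm{Normal}(6, 0.6)\\
\beta &\sim \mathrm{Normal}(0, 0.1)\\
\sigma &\sim \mathrm{Normal}_{+}(0, 0.5)\\
\sigma_{u_{1,2}} &\sim \mathrm{Normal}_{+}(0, 0.1)\\
\sigma_{w_{1,2}} &\sim \mathrm{Normal}_{+}(0, 0.1)\\
\rho_u &\sim \mathrm{LKJ}(2)\\
\rho_w &\sim \mathrm{LKJ}(2)
\end{aligned}
\qquad (8)
$$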

Why these priors and not others? The motivation for these priors is discussed in detail in Schad, Betancourt et al. (2020). Essentially, these priors generate realistic data for the research problem we are considering here; this can be established by simulating data from the model before the particular data we are analysing have been seen.
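The check just described, simulating data from the priors before seeing any data (a prior predictive check), can be sketched as follows. This is a simplified illustration under assumptions: it uses only α, β, and σ from the priors in (8), ignores the by-subject and by-item adjustments, and draws the half-normal Normal+ prior by taking the absolute value of a normal draw. Summarizing the simulated reading times by a few quantiles is one possible choice among many.

```python
import random
from statistics import quantiles


def prior_predictive_rts(n_draws=4000, seed=5):
    """Reading times (ms) simulated from parameters drawn from the priors in (8)."""
    rng = random.Random(seed)
    rts = []
    for _ in range(n_draws):
        alpha = rng.gauss(6, 0.6)          # alpha ~ Normal(6, 0.6)
        beta = rng.gauss(0, 0.1)           # beta ~ Normal(0, 0.1)
        sigma = abs(rng.gauss(0, 0.5))     # sigma ~ Normal+(0, 0.5), half-normal
        cond = rng.choice((-0.5, 0.5))     # condition coded +/-0.5
        rts.append(rng.lognormvariate(alpha + beta * cond, sigma))
    return rts


rts = prior_predictive_rts()
cuts = quantiles(rts, n=20)   # cut points at 5%, 10%, ..., 95%
print(f"5%: {cuts[0]:.0f} ms, median: {cuts[9]:.0f} ms, 95%: {cuts[-1]:.0f} ms")
```

If the simulated reading times were, say, mostly in the tens of seconds, that would signal that the priors are unrealistic for a self-paced reading or eye-tracking study.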


Figure 20.3 Visualization of the LKJ prior with parameter 2. This is an example of a regularizing prior: Extreme values of the correlation, near ±1, are strongly down-weighted by this prior. [Plot titled "eta = 2"; x-axis: rho (−1.0 to 1.0); y-axis: density.]

The priors for the correlation parameters need some discussion. For these correlations, the so-called LKJ prior is available in the Stan programming language. With parameter 2, the LKJ distribution specifies a prior that is widely spread out between −1 and +1 and has mean 0; see Figure 20.3 for a visualization. A great advantage of this prior on the correlation is that the mean of the posterior distribution of the correlation cannot take extreme values like +1 or −1. Such extreme values are often seen in frequentist linear mixed models and represent an estimation failure. Correlations estimated to be ±1, or very near these extreme values, lead to a so-called degenerate variance-covariance matrix (e.g., Pinheiro & Bates, 2000). This means that the matrix is non-invertible, making the model ill-specified. In Stan, the LKJ prior prevents such extreme correlations from occurring because of the shape of the LKJ(2) distribution: These extreme values are heavily down-weighted. This is what is meant by regularization in Bayesian methods: A priori unlikely values are down-weighted by the prior.

Leaving out the technical details of how the computation is done (see Nicenboim et al., 2022), the estimates of the parameters from the Bayesian linear mixed model are shown in Table 20.1, with the frequentist estimates shown alongside. Some important similarities and differences between the frequentist and Bayesian estimates are:

1 The means of most of the parameters are very similar in both.
2 The mean of the correlation parameter for items, ρw, is smaller in magnitude in the Bayesian model (−0.42 versus −0.65). This is an example of the posterior mean being a compromise between the prior mean (0) and the maximum likelihood estimate. The posterior mean of ρw is regularized towards 0 because there are not enough data to estimate this parameter accurately. In other words, the frequentist estimate is most likely an overestimate in magnitude (a Type M error).
3 The Bayesian model provides uncertainty intervals for each parameter. The lmer function does not, and in principle cannot, provide such uncertainty intervals, as the parameters are point values and have no distribution. The frequentist model allows us to work out confidence intervals for the fixed effects, but these intervals only tell us how variable the estimate would be under hypothetical repeated sampling. They do not tell us the uncertainty about the true values of the parameters.
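The down-weighting of extreme correlations by the LKJ(2) prior described above can be quantified. For a 2 × 2 correlation matrix, the LKJ(η) density is proportional to (1 − ρ²)^(η − 1), so LKJ(2) has density (3/4)(1 − ρ²) on [−1, 1]; equivalently, ρ can be drawn as 2B − 1 with B ~ Beta(η, η). The sketch below (Python standard library; the threshold 0.9 is an arbitrary illustrative choice) compares the prior mass that LKJ(2) places on |ρ| > 0.9 with that of a uniform prior on [−1, 1].

```python
import random


def lkj_tail_mass_mc(threshold=0.9, eta=2.0, n=100_000, seed=6):
    """Monte Carlo estimate of P(|rho| > threshold) under LKJ(eta) for a 2x2 matrix."""
    rng = random.Random(seed)
    # For a 2x2 correlation matrix, rho = 2 * Beta(eta, eta) - 1.
    draws = (2 * rng.betavariate(eta, eta) - 1 for _ in range(n))
    return sum(abs(r) > threshold for r in draws) / n


def lkj2_tail_mass_exact(threshold=0.9):
    """Exact tail mass for eta = 2: integrate (3/4)(1 - rho^2) over both tails."""
    t = threshold
    return 1.5 * ((1 - 1 / 3) - (t - t ** 3 / 3))


print(f"LKJ(2):  {lkj_tail_mass_mc():.4f} (exact {lkj2_tail_mass_exact():.4f})")
print(f"uniform: {1 - 0.9:.4f}")
```

Under LKJ(2), only about 1.5% of the prior mass lies beyond |ρ| = 0.9, compared with 10% under a uniform prior, and the density is exactly zero at ±1.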

Table 20.1 Frequentist and Bayesian estimates from the frequentist and Bayesian linear mixed models. CI refers to the frequentist 95% confidence interval (reported only for the fixed effects). CrI refers to the Bayesian 95% credible interval and represents the range over which we can be 95% certain that the true value of the parameter lies, given the model and data.

Parameter   Frequentist mean   CI              Bayesian mean   CrI
α           6.35               [6.3, 6.41]     6.35            [6.30, 6.40]
β           0.012              [−0.01, 0.03]   0.02            [−0.00, 0.04]
σu1         0.37               –               0.36            [0.32, 0.40]
σu2         0.05               –               0.04            [0.00, 0.08]
ρu          −0.04              –               −0.04           [−0.61, 0.54]
σw1         0.04               –               0.04            [0.03, 0.05]
σw2         0.03               –               0.03            [0.00, 0.06]
ρw          −0.65              –               −0.42           [−0.97, 0.57]
σ           0.47               –               0.47            [0.47, 0.48]

On the log scale, the estimate of the effect is 0.019, with 95% credible interval [−0.001, 0.04] log ms. This estimate is not very different from the frequentist one computed using the lme4 package. However, the meaning of the credible interval is quite different from that of the confidence interval. The Bayesian model also allows us to back-transform the posterior distributions of the fixed-effects parameters to the millisecond scale (see Nicenboim et al., 2022). For example, given a simple linear model like:

$$ \log(y) = \alpha + \beta \times \mathit{predictor} \qquad (9) $$

one can exponentiate both sides to obtain values on the original scale of the data y. The back-transformed values are easier to interpret because a computational model of interference effects makes predictions on the millisecond scale (Vasishth, 2020; Vasishth et al., 2019), and because meta-analysis estimates of the interference effect are on the millisecond scale (Jäger et al., 2017). Figure 20.4 shows this transformed effect estimate from the Bayesian model. What is interesting about this estimate is that we can now conclude that, given the model and data, the interference effect lies, with 95% certainty, between −0.6 and 22.85 ms, and that the posterior distribution of δ has mean 11.15 ms. This uncertainty interval is called a credible interval. The meta-analysis estimate of this effect (Jäger et al., 2017), which is based on a large amount of published data, is 13 ms, 95% credible interval [2, 28] ms. The credible interval from our example data is consistent with the meta-analysis estimate, in the sense that the uncertainty ranges are similar in both. The conclusion from the data in our running example would be that the estimate of the effect is consistent with the published estimate of the effect in the literature. This method of drawing inferences using credible intervals is called the region of practical equivalence approach (Freedman et al., 1984; Kruschke, 2014; Spiegelhalter et al., 1994). Notice that we are not saying here that the "interference effect is present" or that "the interference effect was reliable/statistically significant"; that would be a much stronger discovery claim. The conclusion that the observed posterior distribution of the effect of interest is consistent with the published data and/or theoretical predictions is different from claiming that we found evidence for the interference effect. Arguing that we have evidence for an effect requires a model comparison using a likelihood ratio test (Royall, 1997).

Figure 20.4 The estimate of the interference effect on the millisecond scale, based on a Bayesian linear mixed model. [Plot titled "Posterior distribution of the interference effect (in ms)"; x-axis: interference effect (ms), 0 to 40; y-axis: density.]

Evidence in Bayesian methods is discussed next, and answers the commonly asked question: How can we know that the effect is "reliable" or "real"? As explained in the first part of this chapter, statistical significance can only answer this question if statistical power is high. In the Bayesian framework, it is in principle possible to carry out a null hypothesis test to attempt to answer this question. This Bayesian test is called the Bayes factor. The Bayes factor is the analogue of the frequentist null hypothesis significance test. For authoritative discussions of the Bayes factor, see, for example, Lee and Wagenmakers (2014). The Bayes factor is a ratio that represents the weight of evidence for the effect of interest compared to some null model (such as a model assuming that the effect is 0). For example, a Bayes factor of 3 means that the observed data are three times more likely under a model including a parameter representing the effect than under a model assuming no effect at all. Some textbooks and articles (e.g., Lee & Wagenmakers, 2014) provide a scale for interpreting Bayes factors, but such scales are arbitrary. The Bayes factor comes at a price (Schad, Nicenboim, Bürkner et al., 2022), the principal one being that it can be very sensitive to the prior specified for the parameter representing the effect (Kass & Raftery, 1995). As a consequence, it becomes necessary to report a so-called sensitivity analysis: The Bayes factor is computed under a range of prior specifications for the parameter of interest in the model (in the linear mixed model, this would be the β parameter). Thus, unlike a p-value, a single Bayes factor computed under one prior is almost never informative on its own. Moreover, in the context of experimental (psycho)linguistics, the Bayes factor also suffers from the same power problem that we saw with the frequentist p-value. When power is low (e.g., with smaller sample sizes), the Bayes factor can deliver overly strong evidence for an effect (Vasishth, Yadav et al., 2022). Further, the Bayes factor can also lead to inconclusive results; for example, a Bayes factor near 1 would be inconclusive. In our running example, with a relatively constrained prior of Normal(0, 0.1) (on the log scale) for the slope parameter β (which represents the interference effect), the Bayes factor is 0.63 in favour of the effect, which is inconclusive. The prior Normal(0, 0.1) is relatively constrained because it implies that the effect can range a priori from −115 to +115 ms. This back-transform from the log-ms scale to the ms scale is shown in the accompanying code. The sensitivity of the Bayes factor to the prior can be illustrated by recomputing the Bayes factor under a range of priors. For example, assume a much wider prior for the β parameter, such as Normal(0, 1). This prior implies that, a priori, the effect can range from −1345 to +1345 ms. Such a prior is sometimes called an uninformative prior. Under such a prior, the Bayes factor is 0.06. Taking the reciprocal, this Bayes factor implies that the data are about 16.3 times more likely under the null hypothesis than under a model assuming that the effect exists! This is an invalid conclusion and is entirely driven by the a priori assumption that the effect can be in the high hundreds of milliseconds. In general, an uninformative prior will unduly favour the null hypothesis, leading, as in this case, to a misleading conclusion. Despite the limitations of the Bayes factor, when the research question really does boil down to whether the effect is present or absent, the Bayes factor is a good way to evaluate the evidence from the data and is definitely superior to the p-value because it can, in principle, provide evidence for the null (assuming that the study is properly powered). The main issue one must take care of in Bayes factor analyses is to carry out a sensitivity analysis. For more details on this point, see Nicenboim et al. (2022) and Schad, Nicenboim, Bürkner, et al. (2022).
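The back-transformation and the prior sensitivity of the Bayes factor can both be illustrated in a few lines. This is a sketch under simplifying assumptions, not the chapter's accompanying code. With the ±0.5 coding, the effect on the ms scale is exp(α + β/2) − exp(α − β/2); evaluating it at α = 6.35 (the intercept estimate in Table 20.1) and at β = 2 prior standard deviations (our reading of how the ranges above were obtained) gives about 115 ms for Normal(0, 0.1) and about 1345 ms for Normal(0, 1). For the Bayes factor, the linear mixed model is replaced by a normal approximation to the likelihood of β, with a hypothetical estimate of 0.019 and an assumed standard error of 0.0105; the resulting values therefore will not match the 0.63 and 0.06 from the full model, and only the direction and rough size of the prior sensitivity carry over.

```python
import math
from statistics import NormalDist


def effect_ms(alpha, beta):
    """Condition difference in ms implied by log-scale alpha and beta (+/-0.5 coding)."""
    return math.exp(alpha + beta / 2) - math.exp(alpha - beta / 2)


def bf10_normal(beta_hat, se, prior_sd):
    """Bayes factor for H1 vs. H0 (beta = 0), assuming beta_hat ~ Normal(beta, se).

    Under H1, beta ~ Normal(0, prior_sd), so the marginal distribution of
    beta_hat is Normal(0, sqrt(se^2 + prior_sd^2)); under H0 it is Normal(0, se).
    """
    m1 = NormalDist(0, math.sqrt(se ** 2 + prior_sd ** 2)).pdf(beta_hat)
    m0 = NormalDist(0, se).pdf(beta_hat)
    return m1 / m0


alpha = 6.35
for prior_sd in (0.1, 1.0):
    span = effect_ms(alpha, 2 * prior_sd)   # a priori effect range: +/- span ms
    bf = bf10_normal(beta_hat=0.019, se=0.0105, prior_sd=prior_sd)
    print(f"prior sd {prior_sd}: effect range about ±{span:.0f} ms, BF10 = {bf:.2f}")
```

Widening the prior tenfold shrinks this approximate Bayes factor roughly tenfold, even though the data are identical. The same function also shows that a log-scale effect of 0.019 corresponds to roughly 11 ms at this intercept (`effect_ms(6.35, 0.019)`), close to the posterior mean reported above.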
For an example of a sensitivity analysis in psycholinguistics, see Nicenboim et al. (2020). In summary, a major advantage of adopting the Bayesian approach in experimental linguistics is that uncertainty quantification of the effect of interest, an approach advocated by prominent psychologists like Meehl (1997), becomes possible. There are of course many other advantages of adopting the Bayesian approach: For example, highly customized and complex models can be fit (Nicenboim et al., 2022). One final question worth addressing here is: How can one learn enough about Bayesian statistics to be able to use it sensibly in linguistics? Two textbooks accessible to linguists are McElreath (2020) and Kruschke (2014). We have also written a textbook, which is available for free online: Nicenboim et al. (2022). There is also a free online four-week course on openhpi.de, with video lectures recorded by the author, that covers the first four chapters of Nicenboim et al. (2022).

20.4 Reproducible code and data

The code and data accompanying this chapter are available from https://osf.io/kgxpn/.

References

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge University Press.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Bates, D. M., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48.
Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10(4), 389.


Blokpoel, M., & van Rooij, I. (2021). Theoretical modeling for cognitive science and psychology [Retrieved December 4, 2022].
Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26(2), 211–252.
Bürki, A., Alario, F.-X., & Vasishth, S. (2022). When words collide: Bayesian meta-analyses of distractor and target properties in the picture-word interference paradigm. Quarterly Journal of Experimental Psychology [Accepted].
Bürki, A., Elbuy, S., Madec, S., & Vasishth, S. (2020). What did we learn from forty years of research on semantic interference? A Bayesian meta-analysis. Journal of Memory and Language, 114, 104125. https://doi.org/10.1016/j.jml.2020.104125
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32.
Cassidy, S. A., Dimova, R., Giguère, B., Spence, J. R., & Stanley, D. J. (2019). Failing grade: 89% of introduction-to-psychology textbooks that define or explain statistical significance do so incorrectly. Advances in Methods and Practices in Psychological Science, 2(3), 233–239.
Chemla, E. (2009). Presuppositions of quantified sentences: Experimental data. Natural Language Semantics, 17(4), 299–340.
Clark, H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12(4), 335–359.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.
Freedman, L. S., Lowe, D., & Macaskill, P. (1984). Stopping rules for clinical trials incorporating clinical opinion. Biometrics, 40(3), 575–586.
Gelman, A., & Carlin, J. B. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641–651.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multi-level/hierarchical models. Cambridge University Press.
Gibson, E., & Fedorenko, E. (2010). Weak quantitative standards in linguistics research. Trends in Cognitive Sciences, 14(6), 233–234.
Gibson, E., & Fedorenko, E. (2013). The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes, 28(1–2), 88–124.
Hackl, M., Koster-Hale, J., & Varvoutis, J. (2012). Quantification and ACD: Evidence from real-time sentence processing. Journal of Semantics, 29(2), 145–206.
Hoekstra, R., Morey, R. D., Rouder, J., & Wagenmakers, E.-J. (2014). Robust misinterpretations of confidence intervals. Psychonomic Bulletin and Review, 21, 1157–1164.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19–24.
Jäger, L. A., Engelmann, F., & Vasishth, S. (2017). Similarity-based interference in sentence comprehension: Literature review and Bayesian meta-analysis. Journal of Memory and Language, 94, 316–339. https://doi.org/10.1016/j.jml.2017.01.004
Jäger, L. A., Mertzen, D., Van Dyke, J. A., & Vasishth, S. (2020). Interference patterns in subject-verb agreement and reflexives revisited: A large-sample study. Journal of Memory and Language, 111. https://doi.org/10.1016/j.jml.2019.104063
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.
Kerns, G. J. (2010). Introduction to probability and statistics using R. https://www.atmos.albany.edu/facstaff/timm/ATM315spring14/R/IPSUR.pdf
Kliegl, R., Masson, M. E., & Richter, E. M. (2010). A linear mixed model analysis of masked repetition priming. Visual Cognition, 18(5), 655–681.
Kruschke, J. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Levy, R. P., & Keller, F. (2013). Expectation and locality effects in German verb-final structures. Journal of Memory and Language, 68(2), 199–222.

Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29, 1–45. https://doi.org/10.1207/s15516709cog0000_25
McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.
McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2019). Abandon statistical significance. The American Statistician, 73(sup1), 235–245.
Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. Harlow, S. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? Erlbaum.
Morris, D. E., Oakley, J. E., & Crowe, J. A. (2014). A web-based tool for eliciting probability distributions from experts. Environmental Modelling & Software, 52, 1–4.
Nicenboim, B., Schad, D. J., & Vasishth, S. (2022). Introduction to Bayesian data analysis for cognitive science [Under contract with Chapman and Hall/CRC Statistics in the Social and Behavioral Sciences Series].
Nicenboim, B., Vasishth, S., Engelmann, F., & Suckow, K. (2018). Exploratory and confirmatory analyses in sentence processing: A case study of number interference in German. Cognitive Science, 42(4), 1075–1100.
Nicenboim, B., Vasishth, S., & Rösler, F. (2020). Are words pre-activated probabilistically during sentence comprehension? Evidence from new data and a Bayesian random-effects meta-analysis using publicly available data. Neuropsychologia, 142.
Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: A problem of significance. Nature Neuroscience, 14(9), 1105–1107.
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Kline Struhl, M., Nuijten, M. B., et al. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719–748.
Oakley, J. E., & O'Hagan, A. (2010). SHELF: The Sheffield elicitation framework (version 2.0). School of Mathematics and Statistics, University of Sheffield, UK.
O'Hagan, A., Buck, C. E., Daneshkhah, A., Eiser, J. R., Garthwaite, P. H., Jenkinson, D. J., Oakley, J. E., & Rakow, T. (2006). Uncertain judgements: Eliciting experts' probabilities. John Wiley & Sons.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. Springer-Verlag.
Plummer, M. (2012). JAGS version 3.3.0 manual. International Agency for Research on Cancer. Lyon, France.
Royall, R. (1997). Statistical evidence: A likelihood paradigm. Chapman & Hall/CRC Press.
Schad, D. J., Betancourt, M., & Vasishth, S. (2020). Toward a principled Bayesian workflow in cognitive science. Psychological Methods, 26(1), 103–126. https://doi.org/10.1037/met0000275
Schad, D. J., Nicenboim, B., Bürkner, P.-C., Betancourt, M., & Vasishth, S. (2022). Workflow techniques for the robust use of Bayes factors. Psychological Methods. https://doi.org/10.1037/met0000472
Schad, D. J., Nicenboim, B., & Vasishth, S. (2022). Data aggregation can lead to biased inferences in Bayesian linear mixed models.
Schad, D. J., Vasishth, S., Hohenstein, S., & Kliegl, R. (2020). How to capitalize on a priori contrasts in linear (mixed) models: A tutorial. Journal of Memory and Language, 110.
Spiegelhalter, D. J., Freedman, L. S., & Parmar, M. K. (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society. Series A (Statistics in Society), 157(3), 357–416.
Sprouse, J., Wagers, M. W., & Phillips, C. (2012). A test of the relation between working-memory capacity and syntactic island effects. Language, 88(1), 82–123.
Tendeiro, J., Kiers, H., Hoekstra, R., Wong, T. K., & Morey, R. D. (2022). Diagnosing the use of the Bayes factor in applied research. https://doi.org/10.31234/osf.io/du3fc
Vasishth, S. (2020). Using Approximate Bayesian Computation for estimating parameters in the cue-based retrieval model of sentence processing. MethodsX, 7, 100850.
Vasishth, S. (2023). Some right ways to analyze (psycho)linguistic data. Annual Review of Linguistics, 9, 273–291. https://doi.org/10.1146/annurev-linguistics-031220-010345
Vasishth, S., & Engelmann, F. (2022). Sentence comprehension as a cognitive process: A computational approach. Cambridge University Press.
Vasishth, S., & Gelman, A. (2021). How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis. Linguistics, 59, 1311–1342. https://doi.org/10.1515/ling-2019-0051

Directions in statistical analysis for experimental linguistics Vasishth, S., & Lewis, R. L. (2006). Argument-head distance and processing complexity: Explaining both ­ ­767–794. ​­ locality and antilocality effects. Language, 82 (4), Vasishth, S., Mertzen, D., Jäger, L. A., & Gelman, A. (2018). The statistical significance filter leads to over​­ ­ optimistic expectations of replicability. Journal of Memory and Language, 103, ­151–175. https://doi. org/ ­ ­ ­ https://doi.org/10.1016/j.jml.2018.07.004 Vasishth, S., & Nicenboim, B. (2016). Statistical methods for linguistic research: Foundational ideas – Part I. ­ ­349–369. ​­ Language and Linguistics Compass, 10 (8), Vasishth, S., Nicenboim, B., Engelmann, F., & Burchert, F. (2019). Computational models of retrieval pro​­ ­ ­ ­ cesses in sentence processing. Trends in Cognitive Sciences, 23, ­968–982. https://doi.org/https://doi.org/ ­ 10.1016/j.tics.2019.09.003 Vasishth, S., Schad, D. J., Bürki, A., & Kliegl, R. (2022). Linear mixed models for linguistics and psychology: A comprehensive introduction [Under contract with Chapman and Hall/CRC Statistics in the Social and Behavioral Sciences Series]. Vasishth, S., Yadav, H., Schad, D., & Nicenboim, B. (2022). Sample size determination for Bayesian hierar­ chical models commonly used in psycholinguistics. Computational Brain and Behavior. https://doi.org/ ­ ­ ­ ­­ ­​­­ ­​­­ ​­ https://link.springer.com/article/10.1007/s42113-021-00125-y ­ 2–4. ­ ​­ Von Baeyer, H. C. (1988). How Fermi would have fixed it. The Sciences, 28 (5), Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. ­ 129–133. ­ ​­ The American Statistician, 70 (2), Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

329

21 ASSESSING ADULT LINGUISTIC COMPETENCE
Lydia White

21.1 Introduction

This chapter considers ways in which to explore the linguistic competence of adult speakers, whether native or non-native, adopting the definition of linguistic competence assumed in the framework of generative grammar (e.g., Chomsky, 1965). Linguistic competence is understood as the unconscious knowledge of language which underlies language use (comprehension and production). It is mentally represented in the form of an abstract grammar, covering core modules including phonology, morphology, syntax, and semantics. The grammar interfaces with external domains, such as discourse, which often determine which of the grammatical possibilities will be appropriate in a particular situation. The grammar is constrained by an innately available Universal Grammar (UG) (Chomsky, 1965, and subsequently), the assumption being that there are many properties of language that could not be acquired on the basis of input alone. Linguistic competence, then, includes universal properties as well as language-specific ones; investigations of linguistic competence have looked at both.

The concept of an abstract linguistic competence has been extended to research on additional language learning, including second language (L2) and third language (L3) acquisition, where it is often referred to as interlanguage competence. Here, the aim has been to establish the nature of the learner's linguistic representations and how they change over time, as well as the role of UG and the extent of influence from previously acquired languages (White, 2003): the learner's first language (L1) and, in the case of L3 acquisition, also the L2. As we shall see, there have been several different approaches to probing the nature of linguistic competence, ranging from offline tasks, such as grammaticality judgements and truth-value judgements, to online tasks and methods, such as the monitoring of eye movements.
Methods that are suitable for testing adult native speakers are also suitable for non-native speakers, and vice versa. L2 and L3 experiments typically include native speaker controls. These establish the presumed linguistic competence of native speakers, making it possible to determine the extent to which interlanguage competence is similar or different and, if it differs, the nature of the differences.

Linguistic competence is an abstraction. In trying to assess it, it is necessary to have a theory as to what that competence consists of. Theories determine the nature of linguistic competence and, indirectly, influence what tasks might be appropriate, what structures are chosen for testing, and what particular test items will be relevant. In other words, the theory will determine the phenomena to be investigated. For instance, linguistic theories that assume the existence of UG propose certain universal principles underlying the acquisition and use of grammar. This means that one can focus experimentally on whether or not there is evidence for such principles constraining the grammars of native and non-native speakers. As far as L2 and L3 are concerned, theories that assume crosslinguistic influence, or transfer, propose that earlier acquired grammars influence L2 or L3 representations; again, this will help to determine the precise questions to be investigated.

DOI: 10.4324/9781003392972-24

21.2 Historical perspectives

Linguistic competence encompasses unconscious knowledge of grammaticality and ungrammaticality, something that has been explored since the early days of generative grammar, where there has always been considerable reliance on the intuitions of linguists to determine grammaticality (see Schütze, 1996). Such intuitions were often those of very few speakers, based on very few sentences, and gathered without consideration of how a variety of extra-grammatical factors can influence judgements. Judgements may, in fact, be swayed by the choice of vocabulary, the potential context of an utterance (which, in the absence of an explicit context, might vary depending on who is judging the sentence), plausibility, and other factors. As a result, it was not always clear whether the intuitions of linguists were generalisable. To circumvent such problems, grammaticality or acceptability judgement tasks were introduced. These attempt to control for extralinguistic factors that might impinge on intuitions by increasing the number of sentences testing any chosen structure, ensuring that grammatical and ungrammatical sentences are equally represented, controlling the vocabulary used, and so on (see Schütze, 1996, for detailed discussion). In fact, grammaticality judgement tasks were used quite extensively in the L2 field before Schütze's observations about native speakers (see Birdsong, 1989).

In a grammaticality judgement task, participants are presented with a list of sentences, typically without any context, and asked to consider whether each sentence is grammatical or ungrammatical (or equivalent wording). The idea is that speakers will reject ungrammatical sentences while accepting corresponding grammatical ones. From relatively early on, the question arose as to whether intuitions/judgements are categorical (requiring yes/no decisions) or a matter of degree (leading to relative acceptability) (e.g., Duffield, 2003).
In linguistic theory, intuitions were often presented as relative: sentence X sounds better than sentence Y. Whether knowledge of grammaticality is assumed to be categorical or not will influence the nature of the tasks adopted: either binary-choice tasks, where participants indicate grammatical versus ungrammatical, or tasks that involve rating sentences on a scale, often a Likert scale. Scalar judgements have frequently been employed in L2 research, partly because of the assumption that learner judgements might be inherently less categorical. The most extreme version of scalar judgement involves the use of magnitude estimation to determine relative grammaticality in native speakers (e.g., Bard et al., 1996; Keller & Asudeh, 2001) and in L2 learners and speakers (e.g., Sorace, 1993, 2010). In this methodology, participants create their own scale: they are exposed to an initial sentence, to which they assign an arbitrary number. They then assign numbers to subsequent stimuli in proportion to how acceptable each seems relative to that first item.

One early consideration concerned the difference between grammaticality and acceptability (e.g., Leivada & Westergaard, 2020; Schütze, 1996). Grammatical sentences are sentences permitted (generated) by the grammar, in other words, well formed in terms of the grammar. A grammatical sentence might nevertheless be unacceptable or odd (e.g., Colourless green ideas sleep furiously, Chomsky, 1957), and an ungrammatical sentence might be acceptable (e.g., More people have been to Berlin than I have, Montalbetti, 1984). Because any judgements made are judgements as to what an individual finds acceptable, which may not completely align with grammatical well-formedness, the term acceptability judgement is often used (or the two terms are used interchangeably). The term grammaticality judgement will nevertheless be used henceforth. In L2 research, test instructions often make it clear that what is being asked for is the individual's impression of the sentences under consideration, not what the individual presumes to be (prescriptively) correct in the L2 (see Bley-Vroman et al., 1988, for an early approach to such instructions).

While grammaticality judgement tasks offer the advantage of being able to probe knowledge of grammaticality, they also have certain disadvantages, particularly relating to the fact that a metalinguistic and explicit form of testing is being used to investigate unconscious and implicit knowledge. Even if participants are asked to judge as quickly as possible, they are reflecting on the form of the sentence and whether it is acceptable to them. In some cases, at least, this is not an issue. When asked to judge a violation of a principle of UG – such as Which article did you criticise the man who wrote? – participants are unlikely to have any kind of metalinguistic knowledge to draw on. In other words, they can recognise ungrammaticality without knowing why. Furthermore, timed tasks can help circumvent the problem, since the need to respond quickly limits participants' ability to draw on explicit knowledge (e.g., Ellis, 2005; Godfroid et al., 2015). There are, however, other aspects of grammatical knowledge that cannot be tested by means of grammaticality judgements, particularly sentence interpretation.
Despite such problems, grammaticality judgement tasks remain a major means of investigating linguistic competence, along with other methodologies subsequently developed, as discussed in Section 21.5.
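Because each participant in a magnitude estimation task invents their own scale, raw numbers cannot be compared directly across people. One normalization often used for such data (assumed here; the cited studies may differ in detail) divides each estimate by the number the participant gave the reference sentence (the modulus) and takes the logarithm. The Python sketch below illustrates the idea; the function name, item labels and scores are hypothetical.

```python
from __future__ import annotations

import math


def normalize_magnitude_estimates(modulus: float,
                                  estimates: dict[str, float]) -> dict[str, float]:
    """Express one participant's estimates relative to their reference item.

    Each estimate is divided by the participant's modulus (the number they
    assigned to the reference sentence) and log10-transformed, so that 0 means
    "as acceptable as the reference", positive values mean more acceptable and
    negative values mean less acceptable.
    """
    if modulus <= 0 or any(value <= 0 for value in estimates.values()):
        raise ValueError("magnitude estimates must be positive numbers")
    return {item: math.log10(value / modulus) for item, value in estimates.items()}


# Two hypothetical participants using very different personal scales.
participant_a = normalize_magnitude_estimates(10, {"X": 20, "Y": 2.5})
participant_b = normalize_magnitude_estimates(50, {"X": 100, "Y": 12.5})

print(participant_a)
print(participant_b)
```

After normalization the two participants' scores coincide, because both judged sentence X to be twice as acceptable as the reference and sentence Y to be a quarter as acceptable, even though their raw numbers differ.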

21.3 Critical issues and topics

In this section, two critical issues relating to the investigation of linguistic competence are addressed. The first applies to research on native and non-native speakers alike; the second is relevant in the context of additional language acquisition, whether L2 or beyond.

The first point to consider is the question of what precisely counts as linguistic competence. Chomsky (1965 and subsequently) introduced a distinction between competence and performance. While competence is the unconscious linguistic knowledge that speakers attain, performance involves the actual use of that knowledge, which may sometimes deviate from the underlying linguistic knowledge. Nowadays, the distinction is often formulated as a difference between representation, in terms of the mental grammar, and processing, meaning language use in real-time sentence comprehension (parsing). In other words, two distinct but related systems are assumed, the latter drawing on the former.1 This raises the question of what experimental results actually show. Is it competence (albeit indirectly) or is it processing? How can one tell? In the L2 field, when differences are observed between the linguistic behaviour of native and non-native speakers, discussion has centred on whether such differences indicate fundamental differences in linguistic competence (e.g., Bley-Vroman, 1990) or whether they can be attributed to factors such as speed of processing or memory capacity (e.g., Cunnings, 2017; Hopp, 2010). L2 users may take longer to parse a particular construction but nevertheless show evidence of an underlying grammar that is similar to that of native speakers.

The second critical issue relates to the effects of language transfer on linguistic competence. Many researchers have argued that the linguistic representations of L2 learners and speakers are affected, indeed initially determined, by the L1 grammar (e.g., Schwartz & Sprouse, 1996; White, 1985).

Investigating such questions requires that experiments include specific groupings of participants, including participants whose L1 does or does not share the structure of interest. This ensures that any findings presumed to be attributable to the L1 are indeed due to the L1 and not to some general L2 phenomenon. White (1985), in an early L2 study on transfer of UG parameter settings, showed that Spanish speakers accept omitted subjects in their L2 English, null subjects being an option permitted in their L1. In contrast, French speakers, whose L1 does not allow subject omission, did not accept null subjects in English. These results thus rule out the possibility that omitting subjects is something L2 learners do in general, regardless of L1.

A related question involves language transfer in L3 acquisition and the effects of prior language knowledge on L3 representations. The question arises as to the extent and source of language transfer, be it from the L1, the L2, or both. There are several different theories in this domain (see Puig-Mayenco et al., 2020, for an overview). Three major issues, often interconnected, are (1) which language(s) determine(s) the initial state in L3; (2) the effects of typological or structural similarity; and (3) whether transfer is wholesale or piecemeal. This has led to a debate over the most appropriate methods to use in L3 research, in terms of the language groupings under investigation. On the one hand, proponents of 'mirror-imaging' (e.g., Puig-Mayenco et al., 2020) argue that any L3 property being tested should also be tested in both of the prior languages. Crucially, the order of the L1 and L2 should be varied. Given three languages, with X as the L3 and A and B as the prior languages, one group of participants should be ABX and the other BAX, thus allowing one to determine the relative effects of the L1 and the L2. On the other hand, Westergaard et al. (2017) propose that L3 learners (ABX) should be compared to two L2 groups, AX and BX. Which approach is more appropriate will depend on the precise research question(s) under investigation. What this point illustrates is that additional complexities are introduced depending on the type of acquisition involved: L2 competence involves issues over and above native speaker competence, and L3 competence involves issues over and above L2 competence.
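The two grouping strategies for L3 research can be made explicit with a small Python sketch. A, B and X are the placeholder languages used in the text; the `Group` type and the two design functions are hypothetical names introduced only for illustration, not part of any published toolkit.

```python
from typing import List, NamedTuple, Optional


class Group(NamedTuple):
    """A participant group, with languages listed in order of acquisition."""
    l1: str
    l2: Optional[str]   # None for a group acquiring `target` as its L2
    target: str         # the language being tested (X)

    def label(self) -> str:
        return "".join(lang for lang in (self.l1, self.l2, self.target) if lang)


def mirror_image_design(a: str, b: str, x: str) -> List[Group]:
    """ABX versus BAX: the same prior languages, with L1/L2 order reversed."""
    return [Group(a, b, x), Group(b, a, x)]


def l2_baseline_design(a: str, b: str, x: str) -> List[Group]:
    """ABX compared with two L2 groups, AX and BX."""
    return [Group(a, b, x), Group(a, None, x), Group(b, None, x)]


print([g.label() for g in mirror_image_design("A", "B", "X")])  # ['ABX', 'BAX']
print([g.label() for g in l2_baseline_design("A", "B", "X")])   # ['ABX', 'AX', 'BX']
```

In the first design, any difference between the two groups can be attributed to the order in which A and B were acquired. In the second, the L3 group's behaviour is compared with what each prior language alone produces in X.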

21.4 Current contributions and research

Theories change and develop over time. A case in point is provided by the ongoing focus on linguistic interfaces, namely the relationships between different linguistic modules, such as syntax and semantics, as well as how linguistic modules interface with external domains, such as syntax with discourse (e.g., Ramchand & Reiss, 2007; Sorace, 2011). Changes in methodology may prove beneficial, or even necessary, for exploring linguistic interfaces. For example, studies relating to the syntax/discourse interface have focussed on cases where the same word order is paired with more than one interpretation, or where a particular interpretation is preferred but others are not excluded. Consider a sentence like (1).

(1) Mary wrote to Sue while she was away.

At issue is whether she refers to Mary, to Sue, or to some other person. In other words, the sentence is ambiguous. The intended interpretation will usually be clear from the context but, in the absence of any such indication, the preferred interpretation is for the pronoun to refer to the main clause subject, Mary. If we are interested in how participants interpret such sentences, grammaticality judgements will not be effective, since the sentence is grammatical under all of these interpretations. Instead, other methodologies have been developed to explore which interpretations participants adopt, and in what contexts. Sorace and colleagues, for example, have used picture selection: a particular sentence is presented along with several pictures, each illustrating a possible interpretation, and participants indicate which picture(s) fit the sentence (e.g., Sorace & Filiaci, 2006). The issue of how to test for ambiguity is explored further in Section 21.5.

In a related vein, recent consideration of the phonology/syntax interface has led to the need for additional methodologies (Goad & White, 2019). This interface is concerned with how phonological properties impinge on syntax. Consider, once again, the sentence in (1). If the pronoun is unstressed, the main clause subject, Mary, is the preferred antecedent. However, if the sentence is spoken with stress on she, the preferred interpretation shifts, such that Sue is understood as the antecedent of the pronoun. Because this observation relates to prosody, it is crucial to use auditory stimuli to test comprehension and to elicit spoken responses to test production. In other words, participants must hear such sentences, rather than read them. Various studies on several languages have recently focussed on methodologies suitable for determining how native and non-native speakers make use of stress in their production and interpretation of sentences (e.g., Gargiulo & Tronnier, 2020; White et al., 2022a, b).

21.5 Main research methods

As noted by numerous researchers, any attempt to explore the nature of linguistic representations will inevitably involve performance. Linguistic competence cannot be tapped directly: all experimental tasks are measures of performance from which one attempts to infer the nature of the underlying competence. Bearing this point in mind, a variety of methodologies have been developed to investigate linguistic competence indirectly; these vary in part depending on what aspects of competence are being examined. The kinds of data considered here include intuitional data and comprehension data. Experimentally elicited production data are used less frequently with adult participants (see Crain & Thornton, 1998, for use of this methodology with young children).

Problems with grammaticality judgements have already been mentioned. Consequently, alternative approaches have been developed. As far as knowledge of (un)grammaticality is concerned, the idea is to find some indirect means of assessing it, such as that provided by truth-value judgement tasks. In the case of online tasks, certain assumptions are made, such as the idea that ungrammatical sentences take longer to process, yielding longer reading times than grammatical ones. Thus, reading time may provide a measure of grammaticality without participants having to make explicit judgements.

There is no single methodology suitable for investigating all aspects of linguistic competence. The choice will depend on what the researcher is trying to discover and on which grammatical properties are under investigation. As we shall see, offline tasks are often assumed to be appropriate for determining the nature of linguistic representations. In contrast, online tasks are taken to provide measures of language use in real time, relating to the time course of processing. In fact, this distinction is too rigid.
Both offline and online measures can be used to help determine the nature of linguistic representations, as well as to address processing considerations. Table 21.1 presents a selection of methods and tasks that have been used over the years to investigate intuitions and comprehension, for both native (L1) and non-native speakers (L2, L3). These are by no means the only techniques that have been used. Here, we will focus on two well-known and much-used offline tasks, namely grammaticality judgements and truth-value judgements, followed by one type of online method, the monitoring of eye movements (eye-tracking). These methodologies allow us to address some of the major concerns involved in investigating the nature of the mental grammar.

Table 21.1 Testing for linguistic competence, offline and online

Offline
Focus on competence/underlying knowledge. Tasks often involve conscious decision-making.
  Grammaticality/acceptability judgements (including preferences)
  Truth-value judgements
  Comprehension: picture selection
  Comprehension: act out

Online
Focus on (automatic) processes underlying real-time language use (parsing). Tasks often include response time measures.
  Sentence-matching
  Self-paced reading
  Eye-movement monitoring (visual world paradigm or reading)
  Neuro-imaging (ERP; fMRI)
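The online assumption noted earlier in this section, that ungrammatical sentences yield longer reading times than grammatical ones, can be turned into a simple by-participant comparison. The Python sketch below takes hypothetical per-participant mean reading times (in ms) for a critical region and computes each participant's slowdown and a paired t statistic. The function name and data are illustrative assumptions; published analyses typically fit linear mixed models with crossed random effects for participants and items instead.

```python
import math
from statistics import mean, stdev


def grammaticality_effect(grammatical, ungrammatical):
    """Return (mean slowdown, paired t statistic) for per-participant means.

    grammatical[i] and ungrammatical[i] must come from the same participant.
    A positive slowdown means the ungrammatical condition was read more slowly.
    """
    if len(grammatical) != len(ungrammatical) or len(grammatical) < 2:
        raise ValueError("need paired means from at least two participants")
    diffs = [u - g for g, u in zip(grammatical, ungrammatical)]
    mean_diff = mean(diffs)
    t = mean_diff / (stdev(diffs) / math.sqrt(len(diffs)))
    return mean_diff, t


# Hypothetical mean reading times (ms) at the critical region, four participants.
grammatical_rt = [320, 295, 340, 310]
ungrammatical_rt = [365, 330, 372, 349]

slowdown, t_value = grammaticality_effect(grammatical_rt, ungrammatical_rt)
print(f"mean slowdown = {slowdown:.2f} ms, t({len(grammatical_rt) - 1}) = {t_value:.2f}")
```

Pairing by participant removes between-participant differences in overall reading speed, which is why the comparison is made on differences rather than on the two condition means.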

To illustrate the different perspectives, the focus here will be on tasks relating to one particular grammatical property, namely Principle A of the Binding Theory (Chomsky, 1981). This principle determines the reference possibilities for anaphors such as reflexives (himself, herself, etc.). Informally stated, Principle A requires that the antecedent of a reflexive be local (usually within the same clause) and c-command it (i.e., the first branching node dominating the antecedent must also dominate the reflexive). The sentences in (2) illustrate the point.

(2) a. Susan likes herself
    b. Mary thought that Susan liked herself
    c. *Mary thought that John liked herself
    d. *Mary's father likes herself
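The informal statement of Principle A above combines three checks: c-command, locality and (in these examples) gender agreement. The Python sketch below encodes the sentences in (2) as deliberately simplified trees and lists the NPs that satisfy all three. The tree shapes, the `Node` class and all function names are assumptions made for illustration, not a parser or an established implementation. The `long_distance` switch crudely widens the binding domain to the whole sentence, as a stand-in for the locality difference mentioned below for languages like Japanese; it does not model Japanese reflexives, which differ in other ways too.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """A node in a deliberately simplified phrase-structure tree."""
    label: str                        # category: "S", "CP", "NP", "VP", ...
    text: str = ""                    # words covered (for display only)
    gender: Optional[str] = None      # "f" or "m" on NPs
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:   # link each child back to this node
            child.parent = self


def ancestors(node: Node) -> Iterator[Node]:
    """Yield the nodes dominating `node`, nearest first."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def dominates(a: Node, b: Node) -> bool:
    return any(anc is a for anc in ancestors(b))


def c_commands(a: Node, b: Node) -> bool:
    """Neither node dominates the other, and the first branching node
    dominating `a` also dominates `b`."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    for anc in ancestors(a):
        if len(anc.children) > 1:     # first branching node above `a`
            return dominates(anc, b)
    return False


def binding_domain(node: Node, long_distance: bool = False) -> Optional[Node]:
    """Smallest clause (S) containing `node`, or the highest clause when the
    hypothetical `long_distance` switch is on."""
    clauses = [anc for anc in ancestors(node) if anc.label == "S"]
    if not clauses:
        return None
    return clauses[-1] if long_distance else clauses[0]


def antecedents(root: Node, reflexive: Node, long_distance: bool = False) -> List[str]:
    """NPs in the binding domain that c-command the reflexive and match its gender."""
    domain = binding_domain(reflexive, long_distance)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if (node.label == "NP" and node is not reflexive
                and domain is not None and dominates(domain, node)
                and c_commands(node, reflexive)
                and node.gender == reflexive.gender):
            found.append(node.text)
    return sorted(found)


def clause(subject: Node, verb: str, complement: Node) -> Node:
    """Build [S subject [VP verb complement]]."""
    return Node("S", children=[subject, Node("VP", children=[Node("V", verb), complement])])


def embedded(main_subject: Node, lower_subject: Node, reflexive: Node) -> Node:
    """Build [S main_subject thought [CP that [S lower_subject liked reflexive]]]."""
    lower = clause(lower_subject, "liked", reflexive)
    return clause(main_subject, "thought", Node("CP", children=[Node("C", "that"), lower]))


refl_a, refl_b, refl_c, refl_d = (Node("NP", "herself", "f") for _ in range(4))
s2a = clause(Node("NP", "Susan", "f"), "likes", refl_a)                      # (2a)
s2b = embedded(Node("NP", "Mary", "f"), Node("NP", "Susan", "f"), refl_b)   # (2b)
s2c = embedded(Node("NP", "Mary", "f"), Node("NP", "John", "m"), refl_c)    # (2c)
possessor = Node("NP", "Mary", "f")                                          # (2d)
subject_d = Node("NP", "Mary's father", "m", [possessor, Node("N", "father")])
s2d = clause(subject_d, "likes", refl_d)

for name, root, refl in [("2a", s2a, refl_a), ("2b", s2b, refl_b),
                         ("2c", s2c, refl_c), ("2d", s2d, refl_d)]:
    print(name, antecedents(root, refl), antecedents(root, refl, long_distance=True))
```

With the default local domain, (2a) and (2b) each yield Susan as the only antecedent, while (2c) and (2d) yield none: in (2c) the local subject is masculine, and in (2d) the possessor Mary does not c-command the reflexive because the first branching node above it is the subject NP. With the long-distance switch on, Mary becomes available in (2b) and (2c), mirroring the ambiguity described for long-distance reflexives.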

The antecedent of the reflexive in (2a) is Susan, the two occurring in the same local domain. In (2b), the reflexive can refer to Susan but not to Mary, because only Susan is local enough, being in the same clause. In (2c), Mary is not local enough, and there is a gender mismatch between the potential local antecedent, John, and the reflexive, herself. In (2d), Mary is within the same clause as the reflexive but does not c-command it. There are, in fact, crosslinguistic differences in terms of what counts as a locality domain, differences which are relevant in the context of L2 acquisition. Languages like Japanese, for example, have long-distance reflexives: the equivalent of sentences like (2b) would be ambiguous, with antecedents like Mary being permissible even though they are not in the same clause as the reflexive.

As already described, probably the first and best-known experimental method developed for investigating linguistic competence is the grammaticality judgement task. Participants are asked to consider whether the sentences under investigation are grammatical or ungrammatical, assessing them either categorically or on a scale. The precise instruction can vary from experiment to experiment, with participants asked to indicate grammatical versus ungrammatical, possible versus impossible, acceptable versus unacceptable, or correct versus incorrect. The presumption is that speakers will reject ungrammatical sentences while accepting corresponding grammatical ones. Sometimes participants are asked to correct sentences that they deem ungrammatical, especially in the L2 context, to ensure that they are focusing on the relevant aspects of the sentence. An early example of the use of grammaticality judgements to test Binding Principle A is provided by Gordon and Hendrick (1997). They tested adult native speakers of English on sentences differing in c-command, like those in (2a) and (2d).
The proportion of sentences like (2a) judged to be acceptable was .94, contrasting with .06 for sentences like (2d). In other words, naive native speakers showed judgements which confirmed the intuitions of linguists. However, there were very few sentences involving Principle A, as this was not the focus of their study. Keller and Asudeh (2001) used the same sentence types in a magnitude estimation task with an expanded set of items, replicating the earlier findings. Here, two very different types of judgement task were involved, one requiring participants to accept or reject coreference between the reflexive and a noun, the other to indicate degrees of grammaticality on a self-created scale (see Section 21.2). These tasks yielded essentially the same results, supporting the claim that native speakers observe Principle A.

As for L2, Felser et al. (2009) tested Japanese-speaking learners of English and English native speakers on timed and untimed grammaticality judgement tasks, looking at the locality requirement (2c) and the c-command requirement (2d). Results on the untimed task showed an accuracy rate above 96% for both groups: everyone observed the locality and c-command requirements, even though the locality requirement is different in the L1, and participants were as accurate at rejecting ungrammatical sentences as they were at accepting grammatical ones. On the timed task, accuracy was somewhat lower for both groups. The native speakers took longer to judge c-command violations like (2d), while the L2 learners took longer to judge locality violations like (2c).

Turning to sentences like (2b), if asked to judge such sentences, native speakers would presumably indicate that they are grammatical/acceptable. However, the real issue here is not grammaticality as such but grammaticality as it relates to interpretation. Does the person judging the sentence take Susan to be the antecedent of the reflexive, or Mary? For native speakers of English, it is presumably Susan, but one cannot tell from a judgement whether that is indeed the case.
In the L2 context, if learners are native speakers of languages like Japanese, Korean or Chinese, such sentences are potentially ambiguous, assuming L1 transfer. For such reasons, L2 research moved towards the use of truth-value judgement tasks, originally developed by Crain and colleagues for use with young children. Thornton (2017) presents a comprehensive overview of this methodology, and Chien and Wexler (1990) provide an early example of a truth-value judgement task testing binding principles with young children.

In a truth-value judgement task, participants assess the appropriateness of a sentence in relation to a given context, provided in the form of pictures or short written texts in most experiments involving adults (and acted-out or filmed scenarios in the case of children). Contexts are designed to favour one interpretation over the other, and each stimulus type is paired with appropriate and inappropriate contexts. Participants judge sentences as true or false rather than grammatical or ungrammatical. Indeed, no sentences in a truth-value judgement task are ungrammatical, although they can be inappropriate in certain contexts. The idea is that acceptance of the appropriate context-sentence pairing and rejection of the inappropriate one indicates something about participants' underlying knowledge of the properties in question.

One case is provided by White et al. (1997). This research includes another property of reflexives, namely that, in languages like English, the antecedent of the reflexive can be the subject or the object within the local clause. In other words, sentences like (3) are potentially ambiguous. Nevertheless, native speakers of English have been shown to strongly prefer subject antecedents, across a range of methodologies testing this issue, and to be mostly unaware of the potential ambiguity.

(3) Mr Brown sold Mr Green a picture of himself

The question is whether the choice of subject antecedents reflects a preference for (or bias towards) subjects, or whether object antecedents are impossible, that is, disallowed by the grammar.
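Scoring a truth-value judgement task of this kind amounts to tallying "true" responses separately for each participant group and each context type, that is, whether the context makes the sentence true only under a subject or only under an object antecedent. The Python sketch below does this for hypothetical responses; the function name, group labels and counts are illustrative assumptions, not data from White et al. (1997).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def acceptance_rates(
    responses: Iterable[Tuple[str, str, bool]]
) -> Dict[Tuple[str, str], float]:
    """Proportion of 'true' judgements per (group, context) cell.

    Each response is (group, context, judged_true), where context names the
    antecedent under which the sentence is true, e.g. "subject" or "object".
    """
    counts = defaultdict(lambda: [0, 0])          # cell -> [n_true, n_total]
    for group, context, judged_true in responses:
        cell = counts[(group, context)]
        cell[0] += int(judged_true)
        cell[1] += 1
    return {cell: n_true / n_total for cell, (n_true, n_total) in counts.items()}


# Hypothetical responses; "object" marks contexts where only an object
# antecedent makes the sentence true.
data = (
    [("native", "subject", True)] * 8
    + [("native", "object", True)] * 6 + [("native", "object", False)] * 2
    + [("L2", "subject", True)] * 7 + [("L2", "subject", False)] * 1
    + [("L2", "object", True)] * 3 + [("L2", "object", False)] * 5
)

for cell, rate in sorted(acceptance_rates(data).items()):
    print(cell, f"{rate:.3f}")
```

Substantial acceptance in object contexts is evidence that object antecedents are permitted. Low acceptance is harder to interpret: it may reflect a strong subject preference or a property of the task rather than a grammatical restriction.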


White et al. conducted two experiments, testing adult learners of English with French and Japanese as their L1s, as well as native speaker controls. Each experiment involved a different truth-value judgement task. In one case, the contexts were provided by stories (see [4]); in the other, by pictures (see Figure 21.1). The sentence types in both tasks were the same. In each case, participants had to indicate whether the test sentence was true or false in the context. In the example in (4), the context sets up the expectation that herself refers to Susan (the object) rather than to the nurse (the subject). If participants allow objects to be antecedents, they should judge the sentence to be true; if they only permit subject antecedents, it should be false.

(4) Susan wanted a job in a hospital. A nurse interviewed Susan for the job. The nurse asked Susan about her experience, her schooling and whether she got on well with people.
    The nurse asked Susan about herself.

A similar logic applies in the case of contexts provided by pictures. In Figure 21.1, the sentence is true in the context of the picture, but only if object antecedents are permitted. Interestingly, the tasks yielded quite different results. With contexts provided by stories, native speakers and non-native speakers accepted the sentences as true in both subject and object contexts, as appropriate, revealing knowledge of the ambiguity. In contrast, in the picture task, native speakers and learners largely judged sentences with contexts favouring object antecedents to be false, thus seemingly rejecting the ambiguity in such cases. This difference between the tasks, both of which involve truth-value judgements, is of some concern: when tasks yield conflicting results, it is unclear which set of results provides a better reflection of underlying competence. In this case, there are several reasons to assume that the differences in results reflect differences in the methodologies.
Figure 21.1 Picture task example.

One possibility is that participants first read the sentences, arriving at their preferred interpretation before looking at the picture, thus rejecting object cases because of a strong preference for subject antecedents. The story task, on the other hand, more closely approximated the kind of task used with children, where children watch a scenario being acted out and only then are given a sentence (typically uttered by a puppet) to agree or disagree with. Regardless of the explanation, this illustrates that no single task can be relied upon to reveal underlying knowledge of language. Ideally, several different tasks should be used, probing the phenomena in different ways.

So far, we have seen that grammaticality judgement tasks are somewhat metalinguistic, at least when untimed, and that truth-value judgements can yield conflicting results, at least in the case examined here.2 We now turn to online methods, where what is assessed is something other than conscious choices about sentences, potentially offering an alternative means of assessing linguistic competence. Here we consider experiments involving eye-tracking. Different kinds of tasks are used: some involve written stimuli to be read, while others present pictures to be looked at while participants listen to auditory stimuli (the visual world paradigm).3 The assumption is that grammaticality (competence) can be explored by this means, as well as processing (including, e.g., the time course of processing, delays or difficulties caused by certain kinds of constructions or at certain points in a structure, and effects of working memory). Continuing with our examples relating to Principle A, a number of eye-tracking studies have investigated native speakers (e.g., Cunnings & Felser, 2013; Cunnings & Sturt, 2014; Koornneef, 2010; Sturt, 2003) and non-native speakers (e.g., Felser et al., 2009; Felser & Cunnings, 2012; Kim et al., 2014).
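Eye-tracking-while-reading studies such as these summarise the fixations on a critical region (for example, the reflexive) with a handful of standard measures, including the first fixation, first-pass, second-pass and total reading times referred to in the discussion of Sturt (2003) and Dillon et al. (2013). Exact definitions vary between laboratories and software packages; the Python sketch below uses one common set, with a hypothetical function name and hypothetical fixation data.

```python
from typing import Dict, Optional, Sequence, Tuple

Fixation = Tuple[int, int]   # (region index, duration in ms), in temporal order


def region_measures(fixations: Sequence[Fixation], region: int) -> Dict[str, Optional[int]]:
    """Common reading measures for one region of interest.

    first_fixation and first_pass are None if the region was skipped on first
    pass (a later region was fixated before it) or never fixated at all.
    second_pass is defined here as total time minus first-pass time; other
    definitions exist.
    """
    total = sum(d for r, d in fixations if r == region)
    first_fixation: Optional[int] = None
    first_pass: Optional[int] = None
    for i, (r, d) in enumerate(fixations):
        if r > region:                 # reader moved past the region first
            break
        if r == region:                # first entry into the region
            first_fixation, first_pass = d, 0
            j = i
            while j < len(fixations) and fixations[j][0] == region:
                first_pass += fixations[j][1]   # consecutive fixations before leaving
                j += 1
            break
    second_pass = total - (first_pass or 0)
    return {"first_fixation": first_fixation, "first_pass": first_pass,
            "second_pass": second_pass, "total": total}


# Hypothetical fixations on "Mary thought that Susan liked herself",
# one region per word: 0=Mary 1=thought 2=that 3=Susan 4=liked 5=herself.
trial = [(0, 210), (1, 230), (3, 250), (4, 220), (5, 240), (5, 180), (3, 260), (5, 200)]
print(region_measures(trial, 5))
# {'first_fixation': 240, 'first_pass': 420, 'second_pass': 200, 'total': 620}
```

In this made-up trial the reader regresses from herself to Susan and then rereads herself; that rereading shows up only in second-pass and total time, which is why early and late measures can tell different stories about when an antecedent is considered.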
Sturt (2003), for example, accepts the essential correctness of Principle A as a principle constraining knowledge of reflexive binding. He investigated the time course of application of Principle A, entertaining several possibilities as to when the principle comes into effect during online processing (i.e., while reading). One possibility is that it applies as soon as it can. In the case of (2b), repeated here as (5), this means that Susan will be identified as the antecedent as soon as the reflexive is encountered, and Mary (the so-called inaccessible antecedent, ruled out by Principle A) will not be considered.

(5) Mary thought that Susan liked herself.

Another possibility is that there may be some delay in the application of Principle A, such that inaccessible antecedents are initially considered before being ruled out. Sturt (2003) concludes, based on first fixation and first pass measures from English native speakers reading such sentences, that constraints on reflexive interpretation are applied very early. At the same time, second pass reading times suggest that the inaccessible antecedent was also considered, especially when the stereotypical gender of the accessible antecedent mismatched the gender of the reflexive (e.g., surgeon/herself). Such results help to determine at what point Principle A comes into play. However, the point at which a constraint applies is not itself a matter of grammatical representation. Nevertheless, eye-tracking data can also offer insights into linguistic competence. Sturt's stimuli did not manipulate grammaticality: all test items were grammatical but varied in the stereotypical gender of the potential antecedents. Dillon et al. (2013), in contrast, manipulated grammaticality and found a slowdown at the critical region containing the reflexive, in both first pass and total reading times, for ungrammatical sentences violating Principle A.
In other words, recognition of ungrammaticality was reflected in slower reading times.

Assessing adult linguistic competence

Cunnings and Felser (2013) investigated whether a non-grammatical strategy (favouring the antecedent linearly closer to the reflexive) might account for previous results. Comparing native speakers with high and low reading spans, and manipulating the position of the inaccessible antecedent so that it was either closer to or further away from the reflexive, they found clear evidence for the early operation of Principle A. First fixation duration and first pass reading time at the reflexive region were longer when the accessible antecedent mismatched the reflexive in gender; for the participants with high reading spans, this held regardless of the linear position of the inaccessible antecedent, showing that they were not simply using a linear strategy. For the participants with low reading spans, when the inaccessible antecedent was closer to the reflexive, first fixation durations and first pass times suggested that both potential antecedents were being considered, which should not have been the case if only a linear strategy had been adopted. In sum, linguistic competence was constrained by a syntactic principle operating during parsing, and not just by a least-effort strategy (choose the closest antecedent regardless of structural constraints). The interest in how inaccessible antecedents are treated during processing is taken up by Felser et al. (2009) and Felser and Cunnings (2012) in the L2 context, again using eye-tracking during reading. In two different studies, one involving Japanese-speaking and the other German-speaking learners of English, gender was manipulated on accessible and inaccessible antecedents. Native speaker controls in both studies replicated the results of Sturt (2003) in that they showed evidence that Principle A was applied early in processing (shown by first fixation duration on the reflexive). In contrast, both learner groups showed longer reading times (first fixation duration/first pass reading time on the reflexive) related to the gender of the inaccessible antecedent, although they differed as to what this effect was: an inaccessible-mismatch effect for the German speakers and an inaccessible-match effect for the Japanese speakers.
Effects for the accessible antecedent showed up later (regression path and second pass time), suggesting later application of Principle A. The question arises as to whether results from these eye-tracking studies reflect competence differences between native and non-native speakers. The answer cannot be determined from the eye-tracking results alone, at least when only grammatical sentences are manipulated, as was the case here; a consideration of results from several complementary tasks is necessary. In fact, both studies included an offline multiple-choice task in which participants had to identify potential antecedents. Native speakers and learners showed very similar results, namely a high degree of accuracy. Furthermore, as discussed earlier, results on the untimed grammaticality judgement task in Felser et al. (2009) also showed a very high degree of accuracy. In the speeded task, accuracy remained high (though not as high as in the untimed task). It seems, then, that the delays shown by learners in eye-tracking experiments relate not to how reflexives are represented but, rather, to when constraints relating to them are accessed.

We now turn to another property of reflexives. Not all reflexives are true anaphors subject to Principle A (e.g., Reinhart & Reuland, 1993). Rather, some of them are so-called logophors, with freer reference possibilities governed by properties of the discourse. Consider (3), repeated here as (6).

(6) Mr. Brown sold Mr. Green a picture of himself.

In (6), either the subject or the object can serve as an antecedent, as we have seen. Logophoric reflexives behave more like pronouns than anaphors; in sentences like (6), the pronoun him could also be used (instead of the reflexive) to refer to either of the NPs. Unlike true anaphors, whose antecedents are determined purely syntactically, logophors have antecedents that are determined partly by discourse considerations. According to Reinhart and Reuland (1993), true anaphors must be arguments of the same predicate as the antecedent, in other words, coarguments. This is not the case for logophors. In (6), the potential antecedents and the reflexive are not coarguments.

Differences between true anaphors and logophors have been investigated using eye-tracking. Koornneef (2010) argues that logophors will take longer to process than anaphors because considerations over and above pure syntax come into play. This was tested with Dutch native speakers using eye-tracking during reading. Results showed that logophors indeed elicited longer first pass reading times, suggesting processing difficulties, as hypothesised. In addition, although this is not discussed, the results suggest that anaphors and logophors are different constructs as far as the grammar is concerned; in other words, they are represented differently. Kim et al. (2014) explore this issue in the L2 context. Like Koornneef (2010), Kim et al. assume that true reflexives are licensed by syntactic principles, in contrast to logophors, which are subject to discourse constraints, and that logophors are more complex to compute than true anaphors. In a visual world eye-tracking study with Korean-speaking learners of English and English native speaker controls, test items included true reflexives (e.g., Look at Goofy. Have Mickey touch himself) and logophors (e.g., Look at Goofy. Have Mickey touch a picture of himself). Participants were instructed to look at a display screen and then perform actions based on aural instructions. For true reflexives, both groups chose the sentence subject as the final antecedent well over 90% of the time. The proportion of looks to the subject rose after hearing a reflexive (and fell after hearing a pronoun or a name). For logophors, too, the final choice of antecedent was predominantly the subject of the sentence. However, the learners took considerably longer to fixate on the subject antecedent for logophoric reflexives than for true reflexives. By and large, the L2 participants showed knowledge of Principle A and treated true anaphors and logophors differently from each other in terms of processing time.
To summarise, while much of the online research has addressed issues such as the time course of processing, the costliness of processing certain linguistic phenomena and the effects of memory, it has also proved relevant as far as linguistic competence is concerned, providing evidence for the operation of universal principles and for knowledge of ungrammaticality.
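The reading-time measures discussed in this section (see note 3) are all derived from the sequence of fixations a reader makes. As an illustration only, the Python sketch below computes them for a single region of interest (such as the reflexive) from a hypothetical, chronologically ordered list of (word position, duration in ms) pairs. The operationalisations are common conventions in reading research, not necessarily the exact definitions used in the studies cited: second pass time is taken as total time minus first pass time, and regression path (go-past) time sums all fixations from first entering the region until the eyes first move past it to the right. The function name `reading_measures` and the data format are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A fixation is (word_position, duration_ms); the list is in chronological order.
Fixation = Tuple[int, int]


def reading_measures(fixations: List[Fixation], start: int, end: int) -> Dict[str, int]:
    """Common eye-tracking reading measures (ms) for the region of interest
    spanning word positions start..end (inclusive). Illustrative sketch only."""

    def in_region(pos: int) -> bool:
        return start <= pos <= end

    # Total reading time: sum of all fixations that land in the region.
    total = sum(dur for pos, dur in fixations if in_region(pos))

    # Index of the first fixation that lands in the region, if any.
    first = next((i for i, (pos, _) in enumerate(fixations) if in_region(pos)), None)

    # Region never fixated, or only reached after the eyes had already moved past it:
    # first-pass measures are set to zero here (many analyses code them as missing).
    if first is None or any(pos > end for pos, _ in fixations[:first]):
        return {"first_fixation": 0, "first_pass": 0, "regression_path": 0,
                "second_pass": total, "total": total}

    # First pass: consecutive fixations in the region until the eyes first leave it.
    first_pass = 0
    for pos, dur in fixations[first:]:
        if not in_region(pos):
            break
        first_pass += dur

    # Regression path (go-past): all fixations from first entering the region
    # until the first fixation to its right, including re-reading earlier words.
    regression_path = 0
    for pos, dur in fixations[first:]:
        if pos > end:
            break
        regression_path += dur

    return {"first_fixation": fixations[first][1], "first_pass": first_pass,
            "regression_path": regression_path,
            "second_pass": total - first_pass, "total": total}


# Hypothetical trial in which the reflexive is at word position 5.
trial = [(0, 200), (1, 220), (3, 250), (5, 240), (5, 180), (3, 210), (5, 150), (6, 230)]
print(reading_measures(trial, 5, 5))
# first_fixation 240, first_pass 420, regression_path 780, second_pass 150, total 570
```

In this made-up trial, the regression back to word 3 inflates regression path time but not first pass time, which is why the two measures are reported separately when tracking when an antecedent is reconsidered.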

21.6

Recommendations for practice

As already discussed, in trying to assess linguistic competence, it is necessary to have a theory as to what that competence consists of, as well as a theory of language acquisition in the case of learners. Theories specify the nature of linguistic competence and how it is acquired, and this informs the hypotheses and research questions. For example, linguistic theories that assume the existence of UG propose certain universal principles underlying the acquisition of grammar. This means that one can focus experimentally on whether there is evidence for such principles constraining the linguistic representations of native and non-native speakers. As far as L2 is concerned, theories that assume crosslinguistic influence or transfer propose that the L1 grammar influences the L2; again, this will help to determine the precise phenomena to be investigated, as well as the language combinations that are appropriate. What this implies for practice is that one needs to have a particular theory or analysis of the linguistic phenomena under investigation. This will determine the kinds of structures to be tested, the suitability of different tasks, which specific test items are relevant and, in the case of L2 and L3 acquisition, the language combinations that are appropriate. As we have seen, tasks devised for testing knowledge of grammaticality are not necessarily the same as tasks devised to test sentence interpretation. Regardless of the task chosen, it is crucial to have sufficient test items exemplifying a particular property to allow for statistical analysis, a balance of grammatical and ungrammatical items (or appropriate versus inappropriate ones), a selection of fillers/distractors, etc.


21.7

Future directions

In terms of methodology, the use of corpus data is gaining considerable prominence, providing quantitative data, now often digitally encoded (see Huang & Yao, 2015). Use of corpus data is more often associated with usage-based theories, which reject the underlying premises of generative grammar (see Wulff, 2020). Corpus data tend to be used descriptively, rather than for testing specific hypotheses relating to linguistic competence (but see Kim et al., 2020;4 Lozano, 2020). While corpus data can be informative, there are limits to their effectiveness. Linguistic competence encompasses unconscious knowledge of grammar (e.g., grammaticality, ungrammaticality, ambiguity), as we have seen. It is not, however, possible simply to examine what people produce (via corpus data or by other means) and infer the nature of their competence on that basis. If a speaker never produces certain sentence types, this might be taken to indicate that such sentences are ungrammatical, whereas, in fact, the absence of forms in production does not necessarily indicate that they are disallowed by the grammar. Their failure to appear might be due to a problem of data sampling, or because they are dispreferred, avoided or rarely used. Of course, if errors are produced, that can be informative, as is the case for L2 speakers, where errors can reflect properties of the interlanguage grammar. One of the advantages of experimental methods, then, is that they allow the investigation of grammatical phenomena that might not show up in spontaneous production, including corpora. One possibility for future research is to combine the use of corpora with experimental techniques to investigate theoretically driven claims about linguistic competence. Corpus data might reveal the absence of certain forms that would or would not be expected on theoretical accounts. This could then be followed up experimentally.
An early example of this type involves the proposal that there are two types of intransitive verbs, unaccusatives and unergatives, which differ as to whether or not the subject of the verb is agentive (e.g., Burzio, 1986). In (7a), the subject of the verb danced is performing the action of her own volition; such verbs are known as unergative. In (7b), on the other hand, the subject of the verb fell is not an agent; such verbs are unaccusative.

(7) a. Jane danced.
    b. The apple fell.

Unaccusative verbs are like passives in taking theme subjects rather than agents. A well-known error by L2 learners is to treat unaccusatives like passives, giving them passive morphology, as in (8) (from Zobl, 1989).

(8) My mother was died.

Using data from the Longman Learners' Corpus, Oshita (2000) found that, out of a total of 941 tokens of unaccusative verbs, 4% involved passivisation errors like (8). In contrast, out of 640 tokens of unergative verbs, there was only one passivisation error. While these results are consistent with the division of intransitives into two different verb types, the numbers are not high and the pattern might be accidental. However, several researchers have used experimental data, often involving grammaticality judgements, to try to determine whether learners indeed treat the two classes of intransitive verbs differently (e.g., Balcom, 1997; Hirakawa, 1995). The experimental results were consistent with the corpus data and at the same time strengthened them: acceptance was much higher for passivised unaccusatives than for passivised unergatives. Here, then, is an example of how experimental and non-experimental data can complement and supplement each other. This combination of methods would be fruitful to explore further, especially within the same study.

In conclusion, in this chapter we have presented a view of linguistic competence held by researchers working within the generative linguistic paradigm, including researchers working on language acquisition. Different means of assessing native and non-native competence (although by no means all of them) have been discussed, together with what they are and are not able to show.

Notes
1 See Lewis and Phillips (2015) for arguments that the grammar and language processing mechanisms are not distinct but, rather, involve the same cognitive system.
2 These two tasks yielded consistent results for the other structures tested, including sentences involving embedded clauses like Mr. Green imagined that Mr. Brown painted himself, where the interpretation involving long-distance antecedents was not chosen.
3 Measurements are taken at different points, for example: first fixation (first look/gaze at a text/picture); first pass (first progressive movement through a text); second pass (subsequent movement through a text); regression path (movement back to an item). Duration/reading time is calculated as the sum of all fixations in the region of interest.
4 Kim et al. (2020) investigate the distribution of English reflexives using corpora.

Further reading
Blom, E., & Unsworth, S. (Eds.). (2010). Experimental methods in language acquisition research. John Benjamins.
Goodall, G. (Ed.). (2021). The Cambridge handbook of experimental syntax. Cambridge University Press.
Ionin, T., & Zyzik, E. (2014). Judgment and interpretation tasks in second language research. Annual Review of Applied Linguistics, 34, 37–64.
Schütze, C. (2011). Linguistic evidence and grammatical theory. Wiley Interdisciplinary Reviews: Cognitive Science, 2, 206–221.

Related topics
Experimental syntax; experimental methods for studying second language learners; experimental methods to study bilinguals; experimental methods to study child language.

References
Balcom, P. (1997). Why is this happened? Passive morphology and unaccusativity. Second Language Research, 13, 1–9.
Bard, E. G., Robertson, D., & Sorace, A. (1996). Magnitude estimation of linguistic acceptability. Language, 72, 32–68.
Birdsong, D. (1989). Metalinguistic performance and interlinguistic competence. Springer Verlag.
Bley-Vroman, R. (1990). The logical problem of foreign language learning. Linguistic Analysis, 20, 3–49.
Bley-Vroman, R., Felix, S., & Ioup, G. (1988). The accessibility of universal grammar in adult language learning. Second Language Research, 4, 1–32.
Burzio, L. (1986). Italian syntax, a government-binding approach. Reidel.
Chien, Y.-C., & Wexler, K. (1990). Children's knowledge of locality constraints in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition, 1, 225–295.
Chomsky, N. (1957). Syntactic structures. Mouton.
Chomsky, N. (1965). Aspects of the theory of syntax. M.I.T. Press.
Chomsky, N. (1981). Lectures on government and binding. Foris.
Crain, S., & Thornton, R. (1998). Investigations in universal grammar. M.I.T. Press.
Cunnings, I. (2017). Parsing and working memory in bilingual sentence processing. Bilingualism: Language and Cognition, 20, 659–678.
Cunnings, I., & Felser, C. (2013). The role of working memory in the processing of reflexives. Language and Cognitive Processes, 28, 188–219.
Cunnings, I., & Sturt, P. (2014). Coargumenthood and the processing of reflexives. Journal of Memory and Language, 75, 117–139.
Dillon, B., Mishler, A., Sloggett, S., & Phillips, C. (2013). Contrasting intrusion profiles for agreement and anaphora: Experimental and modeling evidence. Journal of Memory and Language, 69, 85–103.
Duffield, N. (2003). Measures of competent gradience. In R. van Hout, A. Hulk, F. Kuiken, & R. Towell (Eds.), The lexicon-syntax interface in second language acquisition. John Benjamins.
Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A psychometric study. Studies in Second Language Acquisition, 27, 141–172.
Felser, C., & Cunnings, I. (2012). Processing reflexives in a second language: The timing of structural and discourse-level constraints. Applied Psycholinguistics, 33, 571–603.
Felser, C., Sato, M., & Bertenshaw, N. (2009). The on-line application of binding Principle A in English as a second language. Bilingualism: Language and Cognition, 12, 485–502.
Gargiulo, C., & Tronnier, M. (2020). First language attrition on prosody in a foreign language environment. Journal of Monolingual and Bilingual Speech, 2, 219–244.
Goad, H., & White, L. (2019). Prosodic effects on L2 grammars. Linguistic Approaches to Bilingualism, 9, 769–808.
Godfroid, A., Loewen, S., Jung, S., Park, J., Gass, S., & Ellis, R. (2015). Timed and untimed grammaticality judgments measure distinct types of knowledge: Evidence from eye-movement patterns. Studies in Second Language Acquisition, 37, 269–297.
Gordon, P., & Hendrick, R. (1997). Intuitive knowledge of linguistic co-reference. Cognition, 62, 325–370.
Hirakawa, M. (1995). L2 acquisition of English unaccusative constructions. In D. MacLaughlin & S. McEwen (Eds.), Proceedings of the 19th Annual Boston University Conference on Language Development (pp. 291–302). Cascadilla Press.
Hopp, H. (2010). Ultimate attainment in L2 inflection: Performance similarities between non-native and native speakers. Lingua, 120, 901–931.
Huang, C. R., & Yao, Y. (2015). Corpus linguistics. In J. D. Wright (Ed.), International encyclopedia of social and behavioural sciences (2nd ed., pp. 949–953). Elsevier Science Publishers.
Keller, F., & Asudeh, A. (2001). Constraints on linguistic coreference: Structural vs. pragmatic factors. In Proceedings of the annual meeting of the cognitive science society, 23.
Kim, E., Montrul, S., & Yoon, J. (2014). The on-line processing of binding principles in second language acquisition: Evidence from eye tracking. Applied Psycholinguistics, 89(2), 1317–1374.
Kim, J.-Y., An, S., & Jung, A. (2020). Binding conditions of English reflexives and pronouns in the ICE-USA. Language Research, 56, 287–307.
Koornneef, A. (2010). Looking at anaphora: The psychological reality of the Primitives of Binding model. In M. Everaert, T. Lentz, H. D. Mulder, A. Zondervan, & O. Nilsen (Eds.), Linguistics enterprise: From knowledge of language to knowledge in linguistics (pp. 141–166). John Benjamins.
Leivada, E., & Westergaard, M. (2020). Acceptable ungrammatical sentences, unacceptable grammatical sentences, and the role of the cognitive parser. Frontiers in Psychology, 11, 364.
Lewis, S., & Phillips, C. (2015). Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research, 44, 27–46.
Lozano, C. (2020). Generative approaches. In N. Tracy-Ventura & M. Paquot (Eds.), The Routledge handbook of second language acquisition and corpora (pp. 213–227). Routledge.
Montalbetti, M. (1984). After binding: On the interpretation of pronouns. PhD dissertation, M.I.T.
Oshita, H. (2000). What is happened may not be what appears to be happening: A corpus study of 'passive' unaccusatives in L2 English. Second Language Research, 16, 293–324.
Puig-Mayenco, E., González Alonso, J., & Rothman, J. (2020). A systematic review of transfer studies in third language acquisition. Second Language Research, 36, 31–64.
Ramchand, G., & Reiss, C. (Eds.). (2007). The Oxford handbook of linguistic interfaces. Oxford University Press.
Reinhart, T., & Reuland, E. (1993). Reflexivity. Linguistic Inquiry, 24, 657–720.
Schütze, C. (1996). The empirical base of linguistics: Grammaticality judgments and linguistic methodology. University of Chicago Press.
Schwartz, B. D., & Sprouse, R. (1996). L2 cognitive states and the full transfer/full access model. Second Language Research, 12, 40–72.
Sorace, A. (1993). Unaccusativity and auxiliary choice in non-native grammars of Italian and French: Asymmetries and predictable indeterminacy. Journal of French Language Studies, 3, 71–93.
Sorace, A. (2010). Using magnitude estimation in developmental linguistic research. In E. Blom & S. Unsworth (Eds.), Experimental methods in language acquisition research (pp. 57–72). John Benjamins.
Sorace, A. (2011). Pinning down the concept of "interface" in bilingualism. Linguistic Approaches to Bilingualism, 1, 1–33.
Sorace, A., & Filiaci, F. (2006). Anaphora resolution in near-native speakers of Italian. Second Language Research, 22, 339–368.
Sturt, P. (2003). The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language, 48, 542–562.
Thornton, R. (2017). The truth value judgment task: An update. In M. Nakayama, Y. C. Su, & A. Huang (Eds.), Studies in Chinese and Japanese language acquisition: In honor of Stephen Crain (pp. 13–39). John Benjamins.
Westergaard, M., Mitrofanova, N., Mykhaylyk, R., & Rodina, Y. (2017). Crosslinguistic influence in the acquisition of a third language: The Linguistic Proximity Model. International Journal of Bilingualism, 21, 666–682.
White, L. (1985). The pro-drop parameter in adult second language acquisition. Language Learning, 35, 47–62.
White, L. (2003). Second language acquisition and universal grammar. Cambridge University Press.
White, L., Bruhn-Garavito, J., Kawasaki, T., Pater, J., & Prévost, P. (1997). The researcher gave the subject a test about himself: Problems of ambiguity and preference in the investigation of reflexive binding. Language Learning, 47, 145–172.
White, L., Goad, H., Garcia, G., Guzzo, N., Smeets, L., & Su, J. (2022a). Effects of stress on pronoun interpretation in L2 English. Paper presented at the annual conference of the European Second Language Association (EUROSLA 31), Fribourg, Switzerland, August 2022.
White, L., Goad, H., Garcia, G., Guzzo, N., Smeets, L., & Su, J. (2022b). Pronoun interpretation in L2 Italian: Exploring the effects of prosody [Manuscript submitted for publication]. McGill University.
Wulff, S. (2020). Usage-based approaches. In N. Tracy-Ventura & M. Paquot (Eds.), The Routledge handbook of second language acquisition and corpora (pp. 175–188). Routledge.
Zobl, H. (1989). Canonical typological structures and ergativity in English L2 acquisition. In S. Gass & J. Schachter (Eds.), Linguistic perspectives on second language acquisition (pp. 203–221). Cambridge University Press.

22 Dealing with Participant Variability in Experimental Linguistics

Ute Gabriel and Pascal Gygax

22.1

Introduction and definitions

Participant variables are variables associated with the participants themselves. They include people's momentary states, but also broad demographic characteristics, other personal characteristics and enduring behaviour patterns. The potential variability that participants bring into the laboratory is usually regarded as introducing extraneous variance, that is, variance that is inadvertently mixed in with the variance that is intentionally produced or measured in an experiment. The underlying notion behind experimentation is that one wishes to observe whether and how a change in one variable (the independent variable, IV) results in a change in another variable (the dependent variable, DV). Put differently, one is interested in whether the variation of one variable (the DV) (a) depends on the variation of another (the IV) and (b) is not the result of extraneous, non-experimental, non-manipulated influences. Such extraneous influences are therefore usually unwelcome. On the one hand, extraneous influences might generate so much non-systematic variation in the DV that the relationships of interest are obscured. To illustrate, imagine trying to demonstrate gravity with a feather, with the option of doing so either in a wind tunnel or in a vacuum. The demonstration will work smoothly in the vacuum, where there is no air resistance, whereas it will be very challenging in the wind tunnel, where the moving air interferes with (and hence obscures) the pull of gravity. Similarly, extraneous influences might interfere with the effect of the IV, making it hard to observe whether such an effect exists. Such non-systematic variation in the DV widens confidence intervals and, correspondingly, lowers the experiment's statistical power.
On the other hand, extraneous influences might generate systematic variation; consequently, it can no longer be decided whether any shared variation between IV and DV is due to the systematic variation of the IV or to third variables confounded with the experimental conditions. Such systematic variation threatens the experiment's internal validity. Because participant variability can become such a non-experimental influence, it is important to bear it in mind when planning and carrying out experiments. This chapter will present and discuss different strategies for handling participant variability in experimental linguistics, but first we seek to ensure a shared understanding of some methodological concepts that are central to what follows, namely statistical power and internal and external validity (Campbell, 1957; Campbell & Stanley, 1963).

DOI: 10.4324/9781003392972-25

22.1.1

Statistical power

Statistical power (1 − β) is the probability of correctly rejecting a false null hypothesis or, put differently, the probability that a test detects an effect given that there truly is an effect. Several factors influence power, one of which is the DV's variability: power decreases as the DV's variability increases. Therefore, reducing the (extraneous) participant variability that contributes to the DV's variability increases power. For example, because adults' reaction times (the time it takes someone to react to a stimulus)1 become slower with age (e.g., Welford, 1988; Der & Deary, 2006; Tun & Lachman, 2008), an age-heterogeneous sample will, all else being equal, produce more heterogeneous data than an age-homogeneous sample. Therefore, to test the hypothesis that an experimental manipulation will produce a certain effect with a certain power, a larger sample would be needed for an age-heterogeneous group than for an age-homogeneous group. However, it is always very important to identify the right sources of extraneous variability, as not all factors necessarily act as such sources. For example, there is, to the best of our knowledge, no evidence that reaction times are systematically associated with someone's height; heterogeneity in height within the sample would therefore not be expected to increase variation in reaction times, and hence would not lower power. On a side note, in the context of quantitative data analysis, power analyses can be performed to determine a study's minimum sample size. To perform such an analysis, one needs to decide on the kind of statistical test that will be performed, the effect size one expects or considers relevant for the research question, the alpha error level and the beta error level (equivalently, the desired power).
There are freely available tools for computing statistical power analyses, such as G*Power (https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower).
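To make the link between variability and sample size concrete, the sketch below uses the standard normal approximation for a two-sided comparison of two independent group means with equal SDs and equal group sizes: n per group ≈ 2(z_(1−α/2) + z_(1−β))² / d², where d is the mean difference divided by the SD. It is a rough planning aid only, not a substitute for a dedicated tool; exact t-based calculations such as those in G*Power give slightly larger values. The function name, the 50 ms effect and the SDs of 100 and 200 ms are hypothetical, chosen to contrast a more age-homogeneous with a more age-heterogeneous sample.

```python
import math
from statistics import NormalDist


def n_per_group(mean_diff: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample comparison
    of means (normal approximation; assumes equal SDs and equal group sizes)."""
    d = mean_diff / sd                             # standardised effect size (Cohen's d)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the alpha level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Same hypothetical 50 ms effect, less vs more variable reaction times.
print(n_per_group(50, 100))  # 63 per group (d = 0.5)
print(n_per_group(50, 200))  # 252 per group (d = 0.25)
```

Doubling the SD halves d and therefore roughly quadruples the required sample size, which is why reducing extraneous participant variability pays off in power.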

22.1.2

Internal and external validity

The internal validity of experimental findings describes the extent to which one can be confident that the findings can be attributed to the experimental manipulation and not to other factors. The more competing, plausible explanations can be offered for the findings, the lower their internal validity. Confounded variables can form the basis for such alternative explanations. For instance, returning to our example of age and reaction times, if the average age of the sub-sample randomly assigned to experimental condition A happens to be higher than that of the sub-sample assigned to condition B, then the conditions' mean reaction times would reflect potential differences between the conditions as well as potential age-related differences. Age would then be a confound. Consequently, any difference in means between experimental conditions could no longer be attributed solely to the experimental manipulation. A strategy to minimise the likelihood of confounding variables is randomisation, meaning the random assignment of participants to experimental conditions. It should be noted that random assignment does not guarantee equivalent experimental groups; it simply ensures that each participant has the same chance of being assigned to each condition. The likelihood that randomisation results in equivalent groups increases with sample size (i.e., the law of large numbers) and decreases with the heterogeneity of the sample.
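A small simulation can illustrate both points about randomisation. The Python sketch below assigns hypothetical participants (ages drawn uniformly between 18 and 80, an arbitrary assumption) to two conditions by simple randomisation and measures how far the condition means for age drift apart; the average gap should shrink as the sample grows. It also includes a stratified variant, which randomises separately within strata (in the check below, an assumed split at age 40) so that each stratum is spread evenly across conditions. All function names are illustrative, not taken from any particular package.

```python
import random
from statistics import mean


def simple_randomisation(participants, conditions=("A", "B"), rng=random):
    """Shuffle participants, then deal them out to conditions in turn,
    so condition sizes differ by at most one."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


def stratified_randomisation(participants, stratum_of, conditions=("A", "B"), rng=random):
    """Randomise separately within each stratum (e.g., an age band)."""
    strata = {}
    for p in participants:
        strata.setdefault(stratum_of(p), []).append(p)
    assignment = {}
    for members in strata.values():
        assignment.update(simple_randomisation(members, conditions, rng))
    return assignment


def mean_age_gap(n, reps=500, seed=1):
    """Average absolute difference in mean age between conditions A and B
    under simple randomisation, for samples of n hypothetical adults."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        people = [(pid, rng.uniform(18, 80)) for pid in range(n)]  # (id, age)
        groups = simple_randomisation(people, rng=rng)
        ages_a = [age for (pid, age), cond in groups.items() if cond == "A"]
        ages_b = [age for (pid, age), cond in groups.items() if cond == "B"]
        gaps.append(abs(mean(ages_a) - mean(ages_b)))
    return mean(gaps)


# Usage: the gap should be clearly smaller for n = 400 than for n = 20
# (exact values depend on the seed).
print(round(mean_age_gap(20), 1), round(mean_age_gap(400), 1))
```

With odd-sized strata, this simple version gives the first condition the extra participant in every stratum; a fuller implementation might randomise which condition receives it.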


The external validity of experimental findings describes the extent to which one can be confident that the findings apply (or can be generalised) to other people (who did not participate in the specific study) and to other contexts (including settings, stimuli and measures other than those of the specific study). This creates a tension that can be tricky to manage: homogeneity with respect to participants and contexts helps to ensure appropriate internal validity but challenges the external validity of experimental findings. Defining the right context and finding the right sample therefore always involves striking a balance between the two kinds of validity. The ambition of the present chapter is to provide readers, scholars and researchers in experimental linguistics with a structure on which to base decisions concerning participant variability when planning experiments. On the one hand, this chapter focuses on three approaches to handling participant variability: reduction, systematic variation and registration. Reduction aims to limit extraneous sources of variability. Systematic variation creates sub-samples with respect to relevant confounding variables, either to balance them across conditions (stratified randomisation) or to investigate their influence by including them as an additional factor. Finally, registration aims to measure potential confounding variables so that their influence can, where necessary, be controlled for in the data analysis (i.e., as covariates). On the other hand, we sort participant variables that are particularly relevant for experimental linguistics into two main categories: momentary states and enduring personal characteristics. This differentiation is based on the temporal stability of participant variables, which influences the way they are handled. Variables that can easily change (i.e., momentary states) offer opportunities for homogenising, that is, interventions before and while running the experiment that reduce variation.
This is not – or at least to a much lesser extent – a viable option for enduring personal characteristics. Within the category of enduring personal characteristics, we consider demographics a relevant subgroup of variables, because researchers are always expected to specify all relevant demographic characteristics of their participants in the respective methods subsection of their reports (American Psychological Association [Washington, District of Columbia], 2020, chapter 3). Which characteristics are ultimately deemed relevant depends on the topic of the study at hand but usually includes age, gender, education level and (native) language. Depending on the research method, ­ hearing ability it is also expected that information about participants’ handedness, vision and/or will be included in the sample description. Demographics are hence a special type of participant variables as researchers in experimental linguistics are expected to comment on them according to scientific standards. In contrast, whether other participant variables such as motives, skills or competencies gain relevance depends primarily on the specific research topic and research design. Each of the approaches to handle participant variability (i.e., reducing variability, systematically addressing variability and registering variability) is presented and discussed separately for each category of participant variables (i.e., momentary states and stable characteristics). To facilitate the applicability of what is presented, a particularly well-suited experimental psycholinguistics example is used for illustration, namely the study by Sato, Gygax and Gabriel (2013)­2. In this study, a sentence evaluation paradigm was used to investigate the effect of gender information on the comprehension of human-referent role nouns among English-French and French-English bilinguals. In Sato et al.’s two-choice task (i.e., yes vs. 
no), participants had to decide, as accurately and as quickly as possible, whether a second sentence would be a sensible continuation of a first sentence. For example, participants were presented with the target sentence ‘At the end of ­ the day the majority of the men seemed to want to go home.’ following a prime sentence ‘The social workers were walking through the station.’ Participants underwent the experiment in French (grammatically gender marked on the role nouns) and in English (grammatically gender unmarked 347

Ute Gabriel and Pascal Gygax

on the role nouns). The gender information provided in the sentences were the manipulated variables and participants’ second-language proficiency was measured. The proportion of yes-responses ​­ and positive response times were the dependent variables.

22.2

Reduce participant variability

If extraneous variation (i.e., variation that has nothing to do with the actual experiment) is likely to become a challenge, a first approach is to avoid it in advance. One way to do this is to select a group of participants who are as homogeneous as possible and to collect data from them in a highly controlled and standardised way. With reference to potential variability in participants’ momentary states (e.g., emotion, fatigue), measures can be taken to move these states towards a homogeneous state, usually an attentive and emotionally neutral one. This is important to ensure that all participants have the same opportunities to carry out the tasks expected of them. What this means in concrete terms depends on the particular experiment, but it usually includes at least comprehensible and complete instructions, a shielded environment with low noise levels, sufficient lighting, and comfortable seating and positioning. Depending on the difficulty of the experimental task, one might consider using practice trials to familiarise participants with the task. Depending on the length of the experiment and/or the characteristics of the tasks, one might also consider planning breaks or other measures to avoid participant fatigue. Particular attention also needs to be paid to ensuring that the different steps of an experiment (i.e., the procedure) are standardised as much as possible. This avoids variation introduced by unsystematic procedures. Put differently, all procedures need to be formalised and carried out in a reproducible way. Information on how standardisation was achieved can usually be found in the apparatus and procedure sections of a published article. In our example, the experiment by Sato, Gygax and Gabriel (2013), the procedure section (p. 798) states that ‘each participant was tested individually in a small quiet room’, and participants were provided with standardised instructions.
The procedure section also states that each trial started with a prompt and that participants were instructed to keep their fingers on the respective response buttons. These are common procedures to reduce irrelevant variation. For example, a prompt seeks to ensure that participants pay attention and are looking at the screen when required to. In Sato et al. (2013), the prompt was the word ‘ready?’, but it could be any kind of fixation point (e.g., ‘****’). Keeping one’s fingers ready on the response buttons facilitates the physical response, which can be further facilitated by choosing a response set-up with high stimulus-response compatibility (e.g., Hommel, 2000). Both ways of ensuring systematicity (i.e., fixation point and predetermined finger position) help to ensure that differences between participants (in their orientation of attention and in finding the relevant buttons) are not (or only minimally) reflected in their response times. Interestingly, the apparatus section in Sato, Gygax and Gabriel (2013, p. 798) states that different computers were used for the French-English and the English-French samples, but that the experiment was controlled by the same software and response button box (i.e., a box with different keys that participants use to give their response). As a general principle, researchers should strive to avoid variations in apparatus, settings and procedures; where such variations cannot be avoided, they need to be reported, so that they can be taken into consideration when evaluating the stability of the results. Which variations, if any, are considered acceptable will very much depend on the kind of data that have been collected and on the features of the sample from which they have been collected.
For example, experiments in which effects are in the range of milliseconds need to be very tightly controlled, whereas one could be a little more lenient for experiments with effects in the range of seconds. Still, all aspects that contribute to homogenising participants’ state of mind need to be taken into consideration when designing the experiment and when adapting it to a target population.

With reference to potential variability introduced by enduring personal characteristics, this can be restricted by defining criteria as to who is included in the sample (i.e., selection and de-selection criteria). Regarding demographic variables, and going back to the example of age being associated with reaction times, a researcher might consider including only a specific age group (i.e., an age-homogeneous sample). An example of other, broader variables might be participants’ experience with tasks that are similar to the experimental tasks. Research suggests, for example, that video gamers have faster reaction times than non-gamers (e.g., Kowal et al., 2018), and researchers might hence consider including only participants with a pre-defined level of experience with the task at hand. Aspects that contribute to homogenising the sample with reference to personal characteristics need to be taken into consideration when defining the target population. In research articles, this information is typically provided in the participants section. For example, Sato et al. (2013, p. 796) state that students enrolled at two specific universities and with specific language characteristics were included in the sample. The latter criterion is interesting for experimental linguistics, as it refers to the use of language proficiency as a selection or de-selection criterion. In Sato et al. (2013), bilinguals were targeted. A bilingual person is generally described as someone who speaks two or more languages (e.g., The Linguistic Society of America; https://www.linguisticsociety.org/resource/faq-what-bilingualism), which leaves room for discussion on what ‘speaking two languages’ entails concerning fluency, proficiency and language symmetry.
Researchers must, therefore, define the target group in operational terms that reflect their specific research questions, for example:

In the present study, we operationalised bilingual proficiency levels in terms of an objective evaluation criteria assessed by C-test performance, which has been shown to measure comprehensive language competence, and found that the linguistic competence measured by C-test scores was a good predictor of the influences of language onto gender representation. (Sato et al., 2013, p. 803)

Emphasising the demarcation of the target population (and reducing participant variability), the C-test score was used to remove six participants ‘from the analyses as their L2 proficiency was too low (less than a third correct on the C-test)’ (p. 796). In addition, the materials (i.e., the target nouns) had been piloted to ensure that the intended sample could be expected to have the vocabulary knowledge necessary to perform the task (footnote 3 in Sato et al., 2013).

As tempting as it might be to collect data uniformly from highly similar participants in almost identical settings, it should not be forgotten that such a strategy also has disadvantages. Depending on how settings and procedures are standardised, criticism of artificiality could arise. For example, although university students, who are often recruited as participants in experimental linguistics, may well constitute a homogeneous sample, whether they are true representatives of “real” people is always questionable. Similarly, the question of whether observations made in restricted experimental settings (e.g., reading experimental materials) bear any relevance to real-life behaviour (e.g., reading a book at home) is also very important. Whether such a critique is relevant crucially depends on the type of research question under investigation (e.g., Lin et al., 2021).
Selecting samples from homogeneous populations raises a further issue, as recruiting quite particular participants may limit the external validity (generalisability) of the results. In psychology, for example, introductory students are frequently recruited as research participants, among other reasons because they represent a rather homogeneous and easily accessible pool, particularly for studies that first want to check whether a phenomenon can be found at all (i.e., for which questions of generalisability across populations are not yet to the fore).

This focus on student samples has been criticised under the label WEIRD (i.e., participants from Western, Educated, Industrialised, Rich and Democratic societies; e.g., Arnett, 2008; Henrich et al., 2010). In experimental psychology, however, WEIRD populations have long been regarded as unproblematic when investigating basic aspects of cognition that researchers (more or less implicitly) assume to be shared by all humans. In their comprehensive work, Henrich, Heine and Norenzayan (2010) question this assumption and present evidence that some presumably fundamental psychological processes vary considerably across populations, and that, from the perspective of the human species, the WEIRD population is an outlier rather than a representative population. Therefore, at some point during the research process, even if not at the very beginning, it becomes necessary to address the question of generalisability. For experimental linguistics, seeking comparative data from diverse populations might be especially challenging due to interindividual differences in language exposure and experience within and across speaker populations (see Kidd, Donnelly & Christiansen, 2018, for further discussion). On the one hand, differences in individual language experience between populations of speakers of the same language could be important for the experimental task at hand. For example, Tskhovrebova et al. (2022) measured how much participants were exposed to print (using the Author Recognition Test for French-Speaking) and found this to be related to their grammatical competence. On the other hand, differences between language systems and cultures (in cross-linguistic experiments) make it difficult to ensure that an experimental task indeed has the same meaning for all participants, or for the same participants in different languages. For example, Costa et al. (2014) found that when a moral judgement task (the Footbridge Dilemma) was presented in participants’ L2 rather than in their L1, they were more likely to make utilitarian decisions.

In sum, reducing participant variability requires a standardised experimental procedure that has been carefully adapted to the target sample, drawn from a thoughtfully selected population. While ensuring internal validity, researchers need to weigh the advantage of gaining statistical power against the disadvantage of limiting external validity.

22.3

Systematically address participant variability (a priori control)

For variables that are considered relevant sources of participant variation and for which reducing variation might either not be sufficient (e.g., practice and fatigue effects in longer experiments) or not be an option at all, researchers might consider controlling the influence of these variables a priori. This can be done either by extending the experimental design, adding those variables as experimental or quasi-experimental factors, or by taking these variables into account when assigning participants to experimental conditions, that is, by performing a stratified randomisation.

As mentioned earlier, the momentary state of participants changes over the course of an experiment. Participants typically get used to the setting, become more experienced with the set-up and the task, and might eventually become tired or bored. For example, when the same (or similar) stimulus-response sequences, called trials, are repeated, participants become more proficient at responding (Sanders, 1998). However, if the task is complex, repetition can also lead to mental fatigue over time (Langner et al., 2010). If an experiment includes several stimulus conditions, it therefore needs to be ensured that practice and/or fatigue effects are not systematically confounded with these conditions. One possibility is to randomise the order of the trials separately for each participant (randomisation per participant). However, this is not always useful and/or sufficient. In Sato et al. (2013), for example, participants were bilingual (English and French) and performed the experimental task in both languages. Randomising the order of English and French trials would have forced participants to switch frequently between languages, which could be rather exhausting and would not necessarily serve the research goal of the experiment. The authors (p. 797) therefore grouped all English trials into one block and all French trials into another. Within each block, the order of the trials was randomised per participant; between blocks, however, the order was systematically varied across participants. Half of the participants started with the English block and ended with the French one, whereas the other half started with the French block and ended with the English one. This is a common approach in experimental linguistics: trials are grouped into blocks, and the order of the blocks is systematically counterbalanced, leaving only the order of trials within a block to chance. Design-wise, block order then becomes an additional experimental factor that needs to be checked.

With reference to personal characteristics, random assignment of participants to experimental conditions can be used to create equivalent groups; however, it does not guarantee equivalence. For example, randomly assigning participants to block order could still result in most left-handed participants being assigned to the condition starting with Block A and only a few to the condition starting with Block B. If researchers had good reason to assume that handedness could matter, the assignment could be improved by first splitting participants into subgroups, called strata, according to their hand dominance (i.e., left-handed, right-handed and ambidextrous individuals), and then randomly assigning the members of each stratum to the block-order conditions. This is called stratified randomisation.
In other words, participants are randomly assigned to experimental conditions separately within each subgroup. As subgroup membership is often not known before the participant arrives in the lab, pseudo-random assignment should be used to distribute participants equally across conditions. For example, while the first left-handed participant is randomly assigned to one of the experimental conditions, the second left-handed participant is randomly assigned only to one of the remaining conditions, and so on. Once one left-handed participant has been assigned to each experimental condition, the procedure starts anew. The same is done for right-handed and for ambidextrous participants. As a result, the groups in each condition have a similar composition in terms of handedness. Something similar was done in Sato et al. (2013), but with regard to participants’ first (and second) language. Not only was the order in which participants completed the English and French blocks manipulated, as mentioned above, but also whether participants started with their first or their second language. This was described as follows:

Each participant read half of the experimental role nouns in French and the other half in English. […] Half of the participants began the judgement task in English and eventually switched to French, while the other half began with French and eventually switched to English; in other words, half of each group began the task in their L1 while the other half began with their L2, counterbalancing a possible effect of language dominance upon which the task began with. (Sato et al., 2013, p. 797)
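The stratified pseudo-random assignment described above, combined with a blocked trial order (counterbalanced block order, trials shuffled within each block), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the strata are handedness categories (they could equally be participants’ L1), the two between-participant conditions are the two block orders, and all names (`StratifiedAssigner`, `trial_sequence`, the condition and item labels) are hypothetical rather than taken from the software used in the original study.

```python
import random
from collections import defaultdict

# Between-participant conditions (hypothetical labels): which language block comes first.
BLOCK_ORDERS = {
    "English first": ("English", "French"),
    "French first": ("French", "English"),
}


class StratifiedAssigner:
    """Pseudo-random assignment that balances conditions within each stratum."""

    def __init__(self, conditions, seed=None):
        self.conditions = list(conditions)
        self.rng = random.Random(seed)
        # Per stratum: conditions not yet used in the current assignment cycle.
        self.remaining = defaultdict(list)

    def assign(self, stratum):
        if not self.remaining[stratum]:
            # Each condition has received one member of this stratum: start anew.
            cycle = list(self.conditions)
            self.rng.shuffle(cycle)
            self.remaining[stratum] = cycle
        return self.remaining[stratum].pop()


def trial_sequence(trials_by_language, block_order, rng):
    """Present blocks in the assigned order, shuffling the trials within each block."""
    sequence = []
    for language in block_order:
        block = list(trials_by_language[language])
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


# Participants arrive one at a time; the stratum here is handedness.
assigner = StratifiedAssigner(BLOCK_ORDERS, seed=1)
arrivals = [("P01", "left"), ("P02", "right"), ("P03", "left"),
            ("P04", "right"), ("P05", "ambidextrous"), ("P06", "left")]
assignments = {pid: assigner.assign(hand) for pid, hand in arrivals}

trials = {"English": ["e1", "e2", "e3"], "French": ["f1", "f2", "f3"]}
rng = random.Random(2)
sequences = {pid: trial_sequence(trials, BLOCK_ORDERS[condition], rng)
             for pid, condition in assignments.items()}
for pid in assignments:
    print(pid, assignments[pid], sequences[pid])
```

Because each stratum cycles through a freshly shuffled copy of the conditions, any two consecutive members of the same stratum within a cycle end up in different conditions, which is the balancing property the text describes.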
To ensure, on the one hand, that no participant sees the same experimental role noun in both languages, but, on the other hand, that each experimental role noun is shown (across all participants) in each of the two languages and in each of the continuation conditions, the authors created different lists:

we constructed a total of four lists (two in each language) to ensure that each role noun was equally followed by a “men” or a “women” continuation in each language. If in one list a role noun written in French was followed by a male continuation, in another list, it would be followed by a female continuation, and in the two remaining lists, it would be written in English. Each participant read only one list. Creating these four lists allowed us to test participants in both languages without a repeated presentation of each role noun, which may have resulted in some confounding (repetition) effects. (Sato et al., 2013, p. 797)

Lists are created pseudo-randomly to ensure an equal number of items per experimental condition. Design-wise, list then becomes an additional experimental factor that needs to be checked.

Techniques for controlling variation in participants’ state a priori add to the complexity of the experimental procedure (randomisation and counterbalancing techniques) or of the study design (order or list as an additional manipulated factor). Techniques for controlling variation in participants’ personal characteristics a priori require the relevant information to be available at the start of the experiment, because this information either forms the basis for a stratified randomisation or, if participant variables are added as factors, is needed to ensure that enough participants are recruited. These techniques aim to reduce the likelihood of accidental confounds. Whether a variable should become an additional factor in the study design and analysis depends on its relevance to the research question. In other words, researchers need to ask themselves whether these variables merely need to be controlled for, or whether they might themselves modulate the effects to be observed in the experiment. Any added factor also makes the study design more complex; data collected in complex designs can be challenging to analyse, and higher-order interactions can be challenging to interpret. For these reasons, a strategy of systematic variation should be very well justified, methodologically and/or theoretically.
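One standard way to build lists of this kind is a Latin-square rotation, in which each item is shifted through all within-item cells across lists. The Python sketch below assumes four cells (French or English × ‘men’ or ‘women’ continuation), a number of role nouns divisible by four and hypothetical item labels; it illustrates the general technique and is not necessarily the exact procedure used by Sato et al. (2013).

```python
from itertools import product

# The four within-item cells: presentation language x continuation.
CELLS = list(product(("French", "English"), ("men", "women")))


def latin_square_lists(items, cells=CELLS):
    """Build one list per cell by rotating each item through all cells.

    Within a list, every item appears exactly once and each cell receives the
    same number of items; across lists, every item appears once in every cell.
    """
    n_cells = len(cells)
    if len(items) % n_cells:
        raise ValueError("the number of items must be a multiple of the number of cells")
    return [
        [(item, *cells[(i + shift) % n_cells]) for i, item in enumerate(items)]
        for shift in range(n_cells)
    ]


# Hypothetical role-noun labels; each participant would receive only one list,
# with its French and English items then grouped into separate language blocks.
items = [f"noun{i}" for i in range(1, 9)]
lists = latin_square_lists(items)
for number, trial_list in enumerate(lists, start=1):
    print(number, trial_list[:2])
```

With eight items and four cells, every list contains four French and four English items, and each noun is seen in each language-by-continuation cell in exactly one of the four lists.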

22.4

Register participant variability (a posteriori control)

The third approach to dealing with participant variation is to take note of it and simply register the information, with the aim of either using it in the de-selection procedure (with criteria defined a priori) or including it as a covariate in the data analysis, hence statistically controlling for its influence. Even if one has done a great deal to reduce variation, it can still be useful to monitor participants’ state during and after data collection. Trial number, for example, can be used to check whether participants’ responses change over time. Provided that everything else remains constant, we can expect response times initially to get faster, as participants get used to the task, and then, once participants have had sufficient practice, to remain stable. Deviations from that pattern can indicate that a participant’s ability and/or motivation has changed. In this respect, including questions that check attention and/or comprehension (e.g., Gillioz & Gygax, 2017) can be useful to reveal participant variability, as can keeping a log and conducting a post-experimental survey.

Personal characteristics that are collected anyway to describe the sample can either be used as covariates, or their direct influence on the dependent variable can be tested in preliminary analyses, with the variables then included in the main analyses or not, as appropriate. In the context of language studies, however, characteristics that capture language-related variation, such as aspects of language competence and familiarity with the study material, are probably more interesting than general demographic characteristics. To this end, Hintz et al. (2020) recently shared a behavioural dataset for studying individual differences in language skills. The dataset consists of data from 112 adults on 33 behavioural tests designed to measure linguistic experience and linguistic processing as well as general cognitive skills. As such, it provides researchers with an excellent opportunity to explore individual differences and thus to inform the planning of their own studies. In Sato et al. (2013), participants’ language proficiency was also measured, using a C-test. In addition, a self-evaluation questionnaire was employed to register background information on the participants’ bilingual history. In contrast to the C-test performance, which was used both for the de-selection of participants with very low proficiency (reducing variation) and in the data analysis (for the remaining participants), the information collected via the questionnaire was not used in the analyses but only to describe the sample (see also footnote 4 in Sato et al., 2013). However, this information could have been used for exploratory analyses a posteriori. It is important to be aware that a posteriori exploratory analyses must be marked as such. Also, if exploratory analyses suggest specific and interesting results, these need to be followed up by confirmatory experiments designed to evaluate the reliability of the exploratory findings (e.g., Nicenboim et al., 2018). Good research practice today includes pre-registration, i.e., registering the research plan before the research is conducted (see, e.g., the Center for Open Science, http://www.cos.io). A common misconception is that pre-registration does not allow for a posteriori exploratory analyses. This is not the case: pre-registration only requires that exploratory analyses are explicitly presented as exploratory, i.e., a clear separation between hypothesis-generating (exploratory) and hypothesis-testing (confirmatory) research.
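The trial-number check mentioned at the start of this section can be made concrete by estimating, for each participant, how response times change over trials once practice is over. The Python sketch below is a minimal illustration under assumptions: response times are stored in trial order, the number of practice trials and the threshold of 2 ms per trial are arbitrary illustrative values, the data are invented, and the function names are hypothetical. It uses an ordinary least-squares slope; a flag is meant as a prompt to inspect the log or post-experimental survey, not as an automatic exclusion rule unless such a criterion was defined a priori.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def post_practice_slope(rts, n_practice):
    """Change in response time (ms per trial) after the practice trials."""
    trial_numbers = list(range(n_practice + 1, len(rts) + 1))
    return ols_slope(trial_numbers, rts[n_practice:])


def flag_participants(rts_by_participant, n_practice, max_slope_ms=2.0):
    """IDs of participants whose response times keep rising after practice."""
    return [pid for pid, rts in rts_by_participant.items()
            if post_practice_slope(rts, n_practice) > max_slope_ms]


# Hypothetical response times (ms) in trial order; the first three are practice trials.
rts_by_participant = {
    "P01": [900, 820, 760, 700, 702, 698, 701, 699],  # speeds up, then stable
    "P02": [950, 860, 790, 720, 760, 800, 850, 900],  # slows down after practice
}
flagged = flag_participants(rts_by_participant, n_practice=3)
print(flagged)  # ['P02']
```

In a real study, the same trial-number information would more typically enter the main analysis directly (e.g., as a predictor), but a simple per-participant slope is a transparent first screening step.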
As tempting as it might seem to collect as much data as possible just ‘to be on the safe side’ and to make sure that all possible extraneous variables have been accounted for, we should not forget that, as researchers, we are also obliged to keep the burden of participation to a minimum. This means collecting as little data as necessary. Collecting data that are not needed unnecessarily demands resources not only from participants but also from the researcher(s) and/or the research institution(s).

22.5

Summarising discussion

In this chapter, we have presented three strategies to handle extraneous variance arising from participant variability, namely the reduction, the systematic variation and the registration of participant variability. These strategies were presented with reference to participants’ momentary states on the one hand and to participants’ enduring demographics and characteristics on the other. The underlying idea of approaching the topic in this way is that strategies and participant variables together form a framework for discussing and evaluating any participant variable in a structured and solution-oriented manner. Following this notion, Table 22.1 summarises the chapter in a highly abbreviated form, presenting each strategy separately for sources of potential extraneous variability that fluctuate (momentary states) and that are stable (personal demographics or characteristics). This split is important because the changeability of a variable plays a decisive role: in the first case (fluctuating), the experiment can be seen as an intervention that contributes to attenuating (or increasing) variation, whereas in the second case (stable), variation can only be managed. As various sources potentially contribute to participant variation, it should be evident that the strategies are not mutually exclusive; on the contrary, they were combined in the study we selected for illustrative purposes (Sato et al., 2013). We envision that Table 22.1 could serve as a quick reference for addressing participant variability when preparing an experiment. In a first step, researchers would, of course, need to identify the participant variables that have the potential to introduce extraneous variation into their specific study. In a second step, the table could provide a structure for discussing whether and how each of these variables could be addressed.

Table 22.1 Overview of different strategies to handle participant variability

Strategy: Reduction
  Variable is a momentary state: Formalised and standardised procedures, materials, set-up, etc.
  Variable is a personal characteristic: Homogeneous sample(s)
  Key issue: External validity

Strategy: Systematic variation
  Variable is a momentary state: Counterbalancing; additional (quasi-)experimental factor(s)
  Variable is a personal characteristic: Stratified randomisation; additional (quasi-)experimental factor(s)
  Key issue: Design complexity

Strategy: Registration
  Variable is a momentary state: Register indicators that reflect participant state before, during and after the experiment
  Variable is a personal characteristic: Register information that can provide insight into participants’ prerequisites with reference to the materials and the experimental set-up
  Key issue: Research participation burden

In Table 22.1, under ‘Key issue’, we have tried to identify the main challenge for each strategy. This is, of course, not exhaustive (and has been discussed in more detail above), but it should send a clear signal as to what compromise would have to be made. In our endeavour to design controlled linguistic experiments with high internal validity and statistical power, we must always balance these efforts against other factors, such as questions of external relevance and validity (generalisability), the appropriateness of the complexity of the design, the burden on participants or the economics of research. How these factors are weighed against each other depends on the specific objective of the experiment in question (Mook, 1983; Lin et al., 2021).

Notes
1 Here, by “reaction” we refer to a single reaction to a single stimulus (e.g., pressing a button when you see a green light). Following Hick’s law (Hick, 1952), we can consider that when a choice between different responses is available (e.g., yes or no in a judgement task), the time to respond is the sum of the reaction time and the time taken by the cognitive and motor processes involved in choosing the appropriate response. As such, when there are different response choices, as is often the case in experimental linguistics, we refer to response times and not just reaction times.
2 The accepted version of the published manuscript can be downloaded from the open repository of the University of Fribourg: https://folia.unifr.ch/unifr/documents/320729

Further reading
Gillioz, C., & Zufferey, S. (2021). Introduction to experimental linguistics. Wiley.
Myers, A., & Hansen, C. H. (2016). Experimental psychology (7th ed.). Cengage Learning Asia Pte Ltd.

Related topics New directions in statistical analysis for experimental linguistics; historical perspectives on the use of experimental methods in linguistics; contrasting online and offline measures: examples from experimental research on linguistic relativity; assessing adult linguistic competence


References
American Psychological Association (Ed.). (2020). Publication manual of the American Psychological Association (7th ed.). American Psychological Association.
Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less American. American Psychologist, 63(7), 602–614. doi:10.1037/0003-066X.63.7.602
Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54(4), 297–312. doi:10.1037/h0040950
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Rand McNally & Company.
Costa, A., Foucart, A., Hayakawa, S., Aparici, M., Apesteguia, J., Heafner, J., & Keysar, B. (2014). Your morals depend on language. PLOS ONE, 9(4), e94842. doi:10.1371/journal.pone.0094842
Der, G., & Deary, I. J. (2006). Age and sex differences in reaction time in adulthood: Results from the United Kingdom Health and Lifestyle Survey. Psychology and Aging, 21(1), 62–73. doi:10.1037/0882-7974.21.1.62
Gillioz, C., & Gygax, P. M. (2017). Specificity of emotion inferences as a function of emotional contextual support. Discourse Processes, 54(1), 1–18. doi:10.1080/0163853X.2015.1095597
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? The Behavioral and Brain Sciences, 33(2–3), 61–83; discussion 83–135. doi:10.1017/S0140525X0999152X
Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4(1), 11–26. doi:10.1080/17470215208416600
Hintz, F., Dijkhuis, M., van’t Hoff, V., McQueen, J. M., & Meyer, A. S. (2020). A behavioural dataset for studying individual differences in language skills. Scientific Data, 7(1), 429. doi:10.1038/s41597-020-00758-x
Hommel, B. (2000). The prepared reflex: Automaticity and control in stimulus-response translation. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance (pp. 247–273). MIT Press.
Kidd, E., Donnelly, S., & Christiansen, M. H. (2018). Individual differences in language acquisition and processing. Trends in Cognitive Sciences, 22(2), 154–169. doi:10.1016/j.tics.2017.11.006
Kowal, M., Toth, A. J., Exton, C., & Campbell, M. J. (2018). Different cognitive abilities displayed by action video gamers and non-gamers. Computers in Human Behavior, 88, 255–262. doi:10.1016/j.chb.2018.07.010
Langner, R., Steinborn, M. B., Chatterjee, A., Sturm, W., & Willmes, K. (2010). Mental fatigue and temporal preparation in simple reaction-time performance. Acta Psychologica, 133(1), 64–72. doi:10.1016/j.actpsy.2009.10.001
Lin, H., Werner, K. M., & Inzlicht, M. (2021). Promises and perils of experimentation: The mutual-internal-validity problem. Perspectives on Psychological Science, 16(4), 854–863. doi:10.1177/1745691620974773
Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38(4), 379–387. doi:10.1037/0003-066X.38.4.379
Nicenboim, B., Vasishth, S., Engelmann, F., & Suckow, K. (2018). Exploratory and confirmatory analyses in sentence processing: A case study of number interference in German. Cognitive Science, 42(S4), 1075–1100. doi:10.1111/cogs.12589
Sanders, A. F. (1998). Elements of human performance: Reaction processes and attention in human skill. Psychology Press.
Sato, S., Gygax, P. M., & Gabriel, U. (2013). Gender inferences: Grammatical features and their impact on the representation of gender in bilinguals. Bilingualism: Language and Cognition, 16(4), 792–807. doi:10.1017/S1366728912000739
Tskhovrebova, E., Zufferey, S., & Gygax, P. (2022). Individual variations in the mastery of discourse connectives from teenage years to adulthood. Language Learning, 72(2), 412–455. doi:10.1111/lang.12481
Tun, P. A., & Lachman, M. E. (2008). Age differences in reaction time and attention in a national telephone sample of adults: Education, sex, and task complexity matter. Developmental Psychology, 44(5), 1421–1429. doi:10.1037/a0012845
Welford, A. T. (1988). Reaction time, speed of performance, and age. Annals of the New York Academy of Sciences, 515(1), 1–17. doi:10.1111/j.1749-6632.1988.tb32958.x


23 TESTING IN THE LAB AND TESTING THROUGH THE WEB

Jonathan D. Kim

23.1

Introduction and background

Research as a tradition has strong historic roots in laboratory settings. This long history often leads us to believe that experiments must be conducted in the laboratory. The majority of experimental paradigms have only ever been used in laboratory settings, so the clear path of least resistance is to continue this tradition. However, the path of least resistance is not necessarily the path of best practice. Research is a living thing, and as researchers we should always stand ready to embrace new approaches that prove reliable. Indeed, the specifics of what laboratory experimentation involves have shifted drastically over the years. While a large amount of research is now computer-mediated, the use of computers as an experimental aid was a revolution that only started in the 1970s (Musch & Reips, 2000). The computer revolution provided researchers with far greater adaptivity, interactivity, ease of data storage, and experimental control (Musch & Reips, 2000). This was followed in the mid-1990s by the development of email, message services, and fillable web forms that suddenly allowed experimentation to occur via the web (Musch & Reips, 2000; Reips, 2002a), overcoming many of the issues that constrain lab-based experiments. The development of fillable web forms was especially important, as it allowed participants to provide anonymous responses to researchers’ queries. Over time, more and more experimental avenues opened up. The development of Java, a programming language whose applets allowed small programs (including experiments) to be embedded in web pages, opened the door to more complex experimentation (e.g., Buchanan & Smith, 1999; Musch & Reips, 2000; Smith & Leigh, 1997), while the development of video conference technology allowed for digital interviews and focus groups.
Early web-based testing often suffered from issues such as response time noise (e.g., Eichstaedt, 2001), which has coloured many researchers’ perceptions of the usability of web-based testing. However, recent research has found that web-based testing has largely overcome many of these problems, with modern instruments often producing results in line with those produced in laboratory settings (e.g., Dandurand et al., 2008; Gosling et al., 2004; Kim et al., 2019). These advances are not only in terms of instruments and technology: a growing number of paradigms have been successfully tested in (or even designed for) web-based instruments, making web-based testing increasingly approachable for us as researchers. When deciding whether to use lab- or web-based testing, it is important to remember that lab- and web-based testing differ substantially from each other but are not completely separate. The two most common misconceptions in relation to this topic are either that (1) lab- and web-based experimentation are completely different, with no overlap, or that (2) lab- and web-based experimentation completely overlap, with no variation (Reips, 2002a). The truth is in the middle. There are areas in which the two modalities completely overlap and areas in which they completely differ, with many areas in between that show differing degrees of variation. For example, web-based experimentation must be conducted via computerised means, while lab-based experimentation can use alternative approaches (e.g., paper-based surveys, in-person interviews); but modern experimentation is largely computer-based, so many experiments will be conducted via computer regardless of whether they are lab- or web-based. Of note, a number of different terms are used interchangeably with ‘web-based experiment’: internet experiment, web experiment, world wide web (or WWW) experiment, and online experiment (Reips, 2002a). ‘Web-based’ is used in this chapter as it allows for far easier comparison with ‘lab-based’. Importantly, care must be taken to avoid using ‘online’ (and, similarly, to avoid referring to lab-based experiments as ‘offline’), because of long-standing definitions of what online and offline mean for human experimentation: in experimental linguistics, online measures tap processing as it unfolds (e.g., reading times, eye movements), whereas offline measures capture its outcome after the fact (e.g., judgements or comprehension answers given after reading). As well as minimising the opportunity for confusion, maintaining this separation allows us to better define the experimental paradigm used in a given study, as both online and offline experiments can be run in the lab or via the web. Taken together, developments in web-based approaches to research mean that we can set aside the belief that lab-based testing is inherently more appropriate than web-based testing, and instead focus on determining which approach best fits the specific study we wish to undertake. In order to do so, we must first understand the core concepts underpinning this choice.

DOI: 10.4324/9781003392972-26

23.2

Core concepts

The aim of this chapter is to expand on the practical considerations for lab- and web-based testing, providing readers with a knowledge base that allows them to make an informed decision about which approach is more appropriate for a given experiment. To do so properly, one must first understand a number of core concepts: cognitive load, ecological validity, equipment-related factors, experimental control, generalisability, instrument accuracy and precision, resource constraints, and voluntariness. After they are defined in this section, their relevance for lab- and web-based studies is discussed in the sections below.

23.2.1

Cognitive load

Cognitive load refers to the amount of mental (i.e., cognitive) resources required, out of a limited pool, to fulfil the requirements of mentally demanding tasks (King & Bruner, 2000). This theory holds that the more closely the information we receive matches our existing beliefs and knowledge structures, the fewer cognitive resources are required. This is often most easily observable in participants’ response times, with information that is in line with our existing beliefs requiring less time to process than information that goes against these beliefs.

23.2.2

Ecological validity

Ecological validity refers to the degree to which a participant’s behaviour in an experiment resembles their behaviour in a naturalistic setting (Kim et al., 2019). The closer to reality an experiment can be, the higher the level of ecological validity the experiment is said to have, and the more confident we can be that the results obtained reflect the participant’s real-world behaviours. Reips (2002a) holds that ecological validity increases with increased familiarity and decreased cognitive load. As such, the more familiar the environment a participant is in when undertaking an experiment, and the fewer cognitive resources a specific experiment requires, the higher the ecological validity of the experiment. Further, the more ecologically valid a research environment is, the greater the real-world generalisability of the results (Speed et al., 2018).

23.2.3

Equipment-related factors

Equipment-related factors are important because many aspects of the set-up can greatly influence data recording: the specific instrument used for data collection, hardware timing features, device driver issues and interactions, script errors, operating system variability, interactions with other software, the tools used to construct the paradigm, interactions with other hardware, screen refresh rate, and the configuration of settings and levels (Krantz & Dalal, 2000; Plant, 2009). There are two main equipment-related factors to consider: availability and accuracy. Availability refers to whether any equipment exists that will allow you to undertake an experiment in a given modality, while accuracy refers to the measurement limitations of a given piece of equipment. For example, eye-tracking experiments can be conducted through both lab- and web-based instruments, but the more advanced and more accurate equipment available in the lab makes it far easier to detect very small, fast movements such as micro-saccades, which less accurate web-based experimentation does not yet allow.
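Screen refresh rate is perhaps the easiest of these factors to reason about. A display can only change its image once per refresh cycle, so a requested stimulus duration is effectively rounded to a whole number of frames. The following is a minimal Python sketch of that arithmetic. The function name `achievable_duration_ms` is hypothetical, and the sketch assumes a known, stable refresh rate with no dropped frames. Real set-ups cannot always guarantee this, and web-based ones in particular may not even know the participant's refresh rate.

```python
def achievable_duration_ms(requested_ms: float, refresh_hz: float) -> tuple[int, float]:
    """Return (frames, actual duration in ms) for a stimulus shown on a display
    that can only change its image once per refresh cycle.

    Simplification (assumption): every frame is rendered on time and the
    refresh rate is known exactly.
    """
    frame_ms = 1000.0 / refresh_hz
    # Nearest whole number of frames, but never less than one frame.
    frames = max(1, round(requested_ms / frame_ms))
    return frames, frames * frame_ms


for hz in (60, 144):
    for requested in (30, 50):
        frames, actual = achievable_duration_ms(requested, hz)
        print(f"{hz:>3} Hz: requested {requested} ms -> {frames} frames = {actual:.2f} ms")
```

At 60 Hz each frame lasts about 16.7 ms. A 50 ms request therefore maps cleanly onto three frames, whereas a 30 ms request becomes two frames (about 33.3 ms). This is one reason why reporting the refresh rate alongside stimulus durations is useful.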

23.2.4

Experimental control

Experimental control refers to our ability as experimenters to minimise the impact of extraneous factors, experiences, and environmental variables on the results obtained in a given study (Kidd & Morgan, 2010). The more facets of the research environment are kept identical for all participants, the greater the level of experimental control in that experiment. This is important because higher experimental control makes it easier to examine correlation and causality by removing or mitigating the noise introduced by extraneous factors.
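One way to see why extraneous noise matters is a simple additive model, assumed here for illustration rather than taken from the sources cited. Each observed score Y is a true score T plus extraneous noise E, where E has mean zero and is independent of T and of experimental condition. Under this assumption, the noise inflates the denominator of the standardised effect size for a mean difference Δ between two conditions. The standard normal-approximation sample-size formula then shows how many more participants are needed.

```latex
% Assumed model: Y = T + E, with E independent of T and of condition, mean zero.
\begin{align*}
  \operatorname{Var}(Y) &= \sigma_T^2 + \sigma_E^2 \\
  d_{\text{obs}} &= \frac{\Delta}{\sqrt{\sigma_T^2 + \sigma_E^2}}
                  = d_{\text{true}}\,\frac{\sigma_T}{\sqrt{\sigma_T^2 + \sigma_E^2}},
  \qquad d_{\text{true}} = \frac{\Delta}{\sigma_T} \\
  n_{\text{per group}} &\approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}
  \quad\Longrightarrow\quad
  \frac{n_{\text{obs}}}{n_{\text{true}}}
    = \frac{d_{\text{true}}^2}{d_{\text{obs}}^2}
    = \frac{\sigma_T^2 + \sigma_E^2}{\sigma_T^2}
\end{align*}
```

Suppose extraneous noise is as variable as the true scores (σ_E = σ_T). The observed effect then shrinks to about 0.71 of its true size, and roughly twice as many participants per group are needed for the same power. If the noise differs systematically between conditions, it is no longer noise but a confound, and this formula does not apply.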

23.2.5

Generalisability

Generalisability is used here to refer to the human population, that is, the degree to which a given experimental approach allows you to recruit from broad demographic, geographic, and/or cultural backgrounds. This in turn allows you to generalise results more broadly across a population. When planning intercultural research, it is important to note that each language and/or culture examined presents unique challenges that researchers must address. The exact requirements and procedures may differ greatly between languages and cultures, depending on factors such as specific cultural or linguistic nuances, field site logistics, population factors, language documentation, and research focus, among many others (Speed et al., 2018). Further, as Speed et al. (2018) point out, the majority of linguistic and psycholinguistic research examines European languages and uses WEIRD (Western, educated, industrialised, rich, democratic) monolingual participants. This is at odds with the majority of humans being non-WEIRD multilinguals who are native speakers of non-European languages. As such, for both lab- and web-based experiments, research across diverse populations must follow proper intercultural research processes and be designed in a manner that avoids potential harm to the participating populations.

23.2.6

Instrument accuracy and precision

Accuracy and precision are relevant to all experimental instruments (i.e., the tools through which experiments are run). Every instrument has at least some degree of response noise, and this level differs between instruments (Calcagnotto et al., 2021). Instrument accuracy refers to the closeness of measurements to the ‘true’ value, while precision refers to how close repeated measurements of the same stimuli are to each other (Dunn, 2022). Even if participants were to produce no response errors themselves, all computer-based experiments are susceptible to response noise from hardware and software. This can lead to inaccurate data and, in extremis, inaccurate conclusions, potentially undermining replication by contributing false positive and false negative findings (Plant, 2016). It is, therefore, imperative during instrument selection to choose an instrument whose accuracy and precision are in line with the requirements of your experiment. This holds both within and between modalities: all instruments have differing levels of accuracy and precision, so care must be taken to select the one that best suits the experiment’s needs. Importantly, when all else remains the same, instrument error is relatively systematic. Suppose a given instrument records every response time as occurring 50 ms later than it truly did. This is a constant bias that lowers accuracy but not precision. The mean differences between factor levels would then be identical to those obtained without this timing error, even though the mean value of each factor level would be 50 ms too slow. Finally, different kinds of data are affected by these errors to different degrees. For example, response accuracy (i.e., the degree to which participants input correct responses) is relatively stable regardless of response noise, while response time (i.e., the amount of time it takes for participants to input responses) is highly susceptible to it.
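The difference between a constant offset and random trial-to-trial jitter can be illustrated with a short simulation. Under the definitions above, the offset is an accuracy problem and the jitter is a precision problem. This is a minimal Python sketch, not a model of any particular instrument. The response times, the 50 ms offset and the 30 ms jitter are hypothetical values, and the helper `record` is an illustrative name.

```python
import random
import statistics


def record(true_rts_ms, offset_ms=0.0, jitter_sd_ms=0.0, seed=0):
    """Simulate what an instrument stores for each trial.

    offset_ms    -- constant delay added to every trial (poor accuracy, high precision)
    jitter_sd_ms -- SD of random trial-to-trial timing error (poor precision)
    """
    rng = random.Random(seed)
    return [rt + offset_ms + rng.gauss(0.0, jitter_sd_ms) for rt in true_rts_ms]


# Hypothetical 'true' response times (ms) for two levels of a factor.
level_a = [510.0, 530.0, 495.0, 520.0, 505.0]
level_b = [560.0, 575.0, 550.0, 565.0, 570.0]
true_diff = statistics.mean(level_b) - statistics.mean(level_a)

# A constant 50 ms delay shifts both means but leaves their difference intact.
biased_diff = statistics.mean(record(level_b, offset_ms=50.0)) - statistics.mean(
    record(level_a, offset_ms=50.0)
)

# Random jitter leaves the expected mean unchanged but inflates the spread,
# which makes small differences harder to detect.
many_a = level_a * 200
jittered_a = record(many_a, jitter_sd_ms=30.0, seed=1)

print(f"true difference:   {true_diff:.1f} ms")
print(f"biased difference: {biased_diff:.1f} ms")
print(f"SD without jitter: {statistics.stdev(many_a):.1f} ms")
print(f"SD with jitter:    {statistics.stdev(jittered_a):.1f} ms")
```

The 52 ms difference between levels survives the constant offset unchanged, while the jitter considerably inflates the spread of the recorded times. Response accuracy (which key was pressed) would be unaffected in either case, because only the timestamp is distorted.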

23.2.7

Resource constraints

The term ‘resource constraints’ refers to a broad spectrum of constraints arising from the many different types of cost that every experiment faces: researcher working hours, maintenance costs, space-use costs, opportunity costs, and, of course, monetary costs. Every experiment is constrained by each of these costs to at least some degree. It is therefore very useful for experimenters to establish which resources they have available in each of these domains, so that they can better judge what can be achieved.

23.2.8

Voluntariness

Voluntariness refers to the degree to which participants in a study (A) give informed consent to participate, and (B) complete the entire experiment without withdrawing and without being controlled or influenced by the researchers (or, indeed, by any other individuals) (e.g., Beauchamp & Childress, 2009). Increased voluntariness is beneficial because it increases the likelihood that participants give authentic responses: participants feel more in control of their answers, and therefore more comfortable responding how they truly feel (Reips, 2002a).

23.3

Lab-based experiments

This section presents the benefits and drawbacks of lab-based experimentation, with emphasis on the core concepts outlined in Section 23.2. For ease of reference, these are briefly stated in Table 23.1.

Table 23.1 Benefits and drawbacks of lab-based experimentation

Advantages of lab-based experiments
- Ability to use more accurate and advanced equipment
- Higher experimental control
- Higher number of suitable experimental paradigms
- Higher instrument precision
- Lower paradigm pre-testing requirements
- Ability to give reactive briefings and debriefings

Drawbacks of lab-based experiments
- Tendency towards reduced ecological validity
- Increased experimenter effects
- Tendency towards reduced generalisability
- Reduced voluntariness

23.3.1

Benefits

23.3.1.1

Equipment-related factors

While web-based instruments are slowly expanding their capabilities, some experiments still have to be lab-based. This is often the case when an experiment requires specialised equipment of one of three kinds. The first is hardware (e.g., highly accurate equipment to measure eye movements, heart rate, breathing rate, or galvanic skin response). The second is environmental equipment (e.g., a lab space designed as a comfortable meeting room with multiple cameras and microphones, to allow for clear video and audio of both interviews and focus groups). The third is specific personnel (e.g., only one researcher is trained in a specific method of experimentation and must be physically present during the experiment). While web-based alternatives exist for some equipment, such as using participants’ webcams to record eye movement data (e.g., Semmelmann & Weigelt, 2018), these alternatives often have increased noise as well as reduced accuracy and precision. This matters because high noise levels can obscure small effects and give the illusion of heterogeneous responses. For example, in an experiment examining eye movements between specific parts of words, the increased noise from a webcam might make it impossible to confidently identify even which word a participant was focusing on at a given time. As such, if your experiment requires the use of specialised equipment, lab-based testing is likely to be the better option.
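Gaze noise can make even word-level assignment unreliable. To illustrate why, the sketch below computes the probability that a single gaze sample lands inside the word actually being fixated. It is a deliberately simplified Python model built on stated assumptions: equal-width words, fixation at the word centre, and zero-mean Gaussian horizontal error only. The 60 px word width and the two noise levels are hypothetical, and they are not measured values for any real eye tracker or webcam.

```python
import math


def aoi_hit_probability(word_width_px: float, noise_sd_px: float) -> float:
    """Probability that one gaze sample falls inside the word actually fixated.

    Simplifying assumptions (hypothetical, for illustration only):
    - words are equal-width horizontal areas of interest (AOIs);
    - the reader fixates the centre of a word;
    - horizontal measurement error is Gaussian with mean 0 and SD noise_sd_px.
    """
    if noise_sd_px <= 0:
        return 1.0
    half_width = word_width_px / 2
    # P(|error| < half_width) for a zero-mean normal error.
    return math.erf(half_width / (noise_sd_px * math.sqrt(2)))


# Hypothetical values: a 60 px word recorded with low vs. high gaze noise.
for label, sd in (("low-noise tracker", 10.0), ("high-noise webcam", 60.0)):
    p = aoi_hit_probability(60.0, sd)
    print(f"{label}: {p:.1%} of samples land on the fixated word")
```

Under these assumptions, a noise SD of one sixth of the word width keeps more than 99% of samples on the correct word. A noise SD equal to the word width leaves only about 38% of them there. Vertical error and drift, ignored here, would only make matters worse.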

23.3.1.2

Experimental control

Experimental control is normally much higher in lab-based studies because all participants can undertake the study in exactly the same physical location, under exactly the same environmental conditions, and with exactly the same equipment. It is far harder to ensure that experiments follow these guidelines when people are undertaking them in their own homes, as real-world environments are potentially noisy and filled with distracting stimuli (Speed et al., 2018). This comes with the trade-off that lab-based experiments tend to have lower levels of ecological validity, as the strict controls in the lab are far less likely to mimic naturalistic conditions. If your experiment is very sensitive to external factors, higher levels of experimental control are likely to be greatly beneficial, suggesting that lab-based experimentation should be used.

23.3.1.3

Higher number of suitable experimental paradigms

Experiments have a long history predating the internet, whereas reliable web-based testing is relatively young. As a result, a large number of experimental paradigms have (as yet) only been tested in lab settings. These paradigms represent a wealth of proven experimental approaches that require very little pre-testing. Further, some research paradigms are technically possible using web-based approaches but suffer significant drawbacks from doing so. For example, interviews and focus groups conducted via the web face a number of issues due to latency (discussed below). Finally, lab-based studies are not limited to those for which a computer must be used. This means that if you wish to use an established experimental paradigm that has not been tested for, or indeed cannot be used in, web-based experiments, then lab-based testing is likely to be the better option. That said, more and more experimental paradigms are being tested for use with web-based instruments over time. Certain experimental approaches have also become largely digital, such as corpora, which are increasingly stored digitally to allow for constant global access.

23.3.1.4

Instrument precision

Individual instruments differ in accuracy and precision, to the point that some web-based instruments perform better than some lab-based ones. Even so, lab-based instruments have one solid benefit over web-based instruments. As mentioned above, the precision of a given instrument is relatively stable when all else stays the same. Lab-based computerised experimentation offers greater experimental control: all participants undertake the experiment on the exact same hardware, with only the software required for testing active. This means that response noise from hardware and software remains constant between participants (e.g., Krantz & Dalal, 2000; Plant, 2009). This is not the case for web-based experiments, in which participants use their own hardware and software. The problem is made worse by the limited ability to collect information about that hardware and software. As such, when conducting sensitive research that could be compromised by reduced precision, lab-based testing is likely to be the better option.
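The consequence of shared versus participant-specific hardware can be sketched with a small simulation. This Python example assumes a deliberately simplified model in which each recording device adds one constant delay to every trial. In the lab, all participants share a single delay; on the web, each participant has a different one. The parameter values and the function name `simulate` are hypothetical.

```python
import random
import statistics


def simulate(web, n_participants=200, true_effect_ms=40.0, seed=0):
    """Return (condition-A mean RT, B-minus-A effect) for each participant.

    Hypothetical model: a recording device adds the same constant delay to
    every trial it records. In the lab all participants share one device; on
    the web each participant uses their own, with its own delay.
    """
    baseline_rng = random.Random(seed)    # same simulated people in both runs
    device_rng = random.Random(seed + 1)  # only consulted for web devices
    lab_delay_ms = 20.0                   # hypothetical shared-device delay
    rows = []
    for _ in range(n_participants):
        baseline = baseline_rng.gauss(550.0, 50.0)  # hypothetical true RT (ms)
        delay = device_rng.uniform(0.0, 100.0) if web else lab_delay_ms
        cond_a = baseline + delay
        cond_b = baseline + true_effect_ms + delay
        rows.append((cond_a, cond_b - cond_a))
    return rows


for label, is_web in (("lab", False), ("web", True)):
    rows = simulate(web=is_web)
    between_sd = statistics.stdev([a for a, _ in rows])
    mean_effect = statistics.mean([e for _, e in rows])
    print(f"{label}: between-participant SD = {between_sd:.1f} ms, "
          f"mean within-participant effect = {mean_effect:.1f} ms")
```

Under this model, the within-participant effect is recovered exactly in both settings, because each participant's device delay cancels out of their own difference score. The spread of mean response times across participants, however, is larger for the web sample. Real web-based timing also includes trial-to-trial jitter, which does not cancel. In between-participant designs, device delays also add directly to the error variance.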

23.3.1.5 

Lower paradigm pre-testing requirements

Related to the higher number of suitable experimental paradigms, lab-based experiments require less pre-testing of research paradigms. What pre-testing there is tends to concern the specific material you wish to use, rather than proving that a given paradigm is usable at all within a given medium. As such, if you have limited capacity for pre-testing, lab-based testing may be the better option. That said, the reduction in resource costs associated with web-based testing may be large enough to offset (or even exceed) any cost associated with pre-testing.

23.3.1.6

Reactive briefing and debriefing

One of the core benefits of lab-based testing is the ability to reactively brief and debrief participants. This is when the researcher interacts with the participant in order to answer any questions and solve any problems that arise during the briefing and debriefing stages. This is especially important in experiments where deception has been used, as participants are likely to have questions about the nature of the deception. The need is greater still in cross-linguistic studies, where meaning can be lost through translation (e.g., Chereni et al., 2020). As such, for experiments where it is essential that participants are accurately briefed and debriefed, lab-based experimentation is likely to be beneficial.


23.3.2

Drawbacks

23.3.2.1

Ecological validity

Research conducted in the lab is likely to have reduced ecological validity compared to web-based studies, because the space in which participants undertake the experiment resembles their everyday life less closely. Importantly, while this is generally the case, there are experimental paradigms where lab-based testing might have equal, or even better, ecological validity than web-based testing. Consider experiments in which participants are interviewed individually or collectively (e.g., ethnographic studies, focus group studies, lab-based observational research) and in which a researcher must be present. Such experiments are likely to have higher ecological validity if conducted in person. This is especially true if the lab has a meeting room tailored to make participants feel comfortable or if the data collection occurs in the workplace, school, or home of the participant(s). As mentioned above, the main reason these studies tend to have higher ecological validity in person than via the web is latency. Web-based latency is the time it takes for visual and audio information to be encoded, sent, decoded, and presented to the person or people one is communicating with, and it interferes with turn-taking behaviour (e.g., O’Malley et al., 1996; Seuren et al., 2020). Research has found that turn-taking is a universal organisational principle of human communication in co-present interaction, whether face-to-face [lab-based] or video-mediated [web-based] (e.g., Levinson, 2016). Very quick visual cues tell us when a speaker has finished talking and when another person in the conversation is about to start talking.
During web-based communication, latency can add enough delay that we easily miss visual cues related to others starting or stopping speaking. As a result, individuals talk over each other far more frequently than they do in person (e.g., O’Malley et al., 1996; Seuren et al., 2020). Further, Seuren et al. (2020) found that people accept latency-laden communication as reflecting reality, as if others are genuinely delaying before responding and talking over them, and that this induced feelings of frustration.
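The latency problem can be made concrete with a back-of-the-envelope timing model. In this Python sketch (function names hypothetical), speaker A's end of turn reaches speaker B after one one-way latency. B then pauses before replying, and B's reply reaches A after a second latency. A therefore experiences the pause plus twice the latency. The 200 ms pause is an assumed round figure, of the order reported for conversational gaps in the turn-taking literature. The latency values and A's 500 ms 'resume' threshold are hypothetical.

```python
def perceived_gap_ms(response_gap_ms: float, one_way_latency_ms: float) -> float:
    """Silence speaker A hears between ending a turn and hearing B's reply.

    A's turn-end reaches B after one latency; B waits response_gap_ms before
    replying; B's first words reach A after a second latency.
    """
    return response_gap_ms + 2 * one_way_latency_ms


def overlap_likely(response_gap_ms: float, one_way_latency_ms: float,
                   resume_after_ms: float) -> bool:
    """True if A, hearing only silence, starts talking again before B's reply arrives."""
    return resume_after_ms < perceived_gap_ms(response_gap_ms, one_way_latency_ms)


# Hypothetical values: B replies after 200 ms; A resumes after 500 ms of silence.
for latency in (0, 150, 250):
    gap = perceived_gap_ms(200, latency)
    print(f"one-way latency {latency:>3} ms: A hears {gap:.0f} ms of silence, "
          f"overlap likely: {overlap_likely(200, latency, 500)}")
```

With 250 ms of one-way latency, A hears 700 ms of silence instead of 200 ms. Under these assumptions, A is likely to start talking again just as B's reply arrives, producing the kind of overlap described above.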

23.3.2.2

Experimenter effects

Experimenter effects are experimental biases caused by the presence of the experimenter in the room, so they are, understandably, higher in lab-based studies (e.g., Holbrook et al., 2003). They include, among other things, social desirability bias and observation bias. Social desirability bias is the tendency for some participants to respond in a manner that is socially acceptable rather than reflective of their personal beliefs (e.g., Lavrakas, 2008). This bias is heightened in the presence of others, especially those in positions of authority such as researchers. Observation bias is the tendency for participants to change their behaviour simply because they are being observed. This bias is so strong that even static pictures of eyes can affect people’s behaviour (e.g., Ekström, 2011). As such, it is good practice, where possible, for the researcher to step back once participants have been briefed on the experiment and have given informed consent. The researcher should either remain out of the participant’s line of sight or, preferably, leave the room in which the participant is undertaking the experiment.

23.3.2.3

Generalisability

Lab-based studies tend to have much lower generalisability than web-based studies, but this is a general, not universal, rule: it is much harder to recruit large and diverse samples with lab-based studies, but it is at least technically possible. For example, researchers (e.g., Baker & Hüttner, 2016; Zhang et al., 2022) often undertake research at multiple sites in different geographical locations (e.g., different universities within or across countries, different areas in the same city) as a way to bolster the generalisability of a study. This approach to recruitment is likely to be far more resource-intensive, as it requires the researcher to travel between collection locations with delicate experimental equipment (potentially risking damage to that equipment). A less resource-intensive option is to undertake research at one university but purposefully recruit participants from different departments, faculties, or social groups. This increases generalisability to some degree, albeit not to the same degree as multi-site samples. Finally, combining these two approaches (multiple recruitment sites, each containing multiple participant pools) is likely to improve generalisability even more, but it also requires even more resources.

23.3.2.4

Voluntariness

As with generalisability, lab-based studies tend to have much lower voluntariness than web-based studies (Reips, 2002a). This again is a general, rather than universal, rule. For experiments in which the researcher must consistently interact with the participant, participants might be equally likely to volunteer regardless of whether the experiment occurs in the lab or via the web. This is especially true when web-based testing would require the use of video calling, as many people actively dislike web-based communication because of the latency and non-verbal communication issues discussed above.

23.4

Web-based experiments

This section presents the benefits and drawbacks of web-based experimentation, with emphasis on the core concepts outlined in Section 23.2. For ease of reference, these are briefly stated in Table 23.2. Many of the strengths and weaknesses discussed in this section were identified by Reips (2002a). Developments in the decades since then have both strengthened the advantages and mitigated the drawbacks of web-based testing. As mentioned above, web-based experimentation relies on computerised equipment. In most cases, this will involve full computer set-ups (i.e., desktop and/or laptop computers). Other options (e.g., smartphones, tablets) have hardware and/or software limitations (e.g., lack of a physical keyboard) that often make them inadequate for experimental linguistic research.

Table 23.2 Benefits and drawbacks of web-based experimentation

Advantages of web-based experiments
- Greater ease of accessᵃ
- Tendency towards greater ecological validity
- Ability to detect motivational confounding and number of non-participantsᵃ
- Tendency towards greater generalisability and voluntarinessᵃ
- Increased openness of research processesᵃ
- Increased public control of ethical processesᵃ
- Reduction in resource constraintsᵃ

Drawbacks of web-based experiments
- Less ability to use advanced equipment
- Reduced experimental controlᵃ
- Limited external validityᵃ
- Issue of multiple submissionsᵃ
- Issue of increased participant drop-outᵃ
- Reduction in participant-researcher interactionsᵃ
- Increase in self-selection biasᵃ

Note: ᵃAs identified by Reips (2002a).


23.4.1

Benefits

23.4.1.1

Ease of access

The term ‘ease of access’ refers both to researchers and to participants. For researchers, web-based instruments make it technically easy to access even the largest and most culturally diverse research bases. This is because you can avoid the time and monetary costs associated with travelling to other cities and countries and/or having participants travel to you. Further, web-based instruments allow much easier access to rare and/or specific participant populations. The proviso is that these populations must have (or, at least for the course of the experiment, be provided with) internet access and computerised equipment, and must be computer, tablet, or smartphone literate. For participants, web-based instruments provide ease of access as they allow participants to undertake experiments at a time and place that entirely suits their needs and desires. Together, this means that everyone involved in the research process benefits to at least some degree from the ease of access associated with web-based studies. As such, experiments that benefit from ease of access for participants, researchers, or both are likely to benefit from being designed as web-based studies. However, care should be taken, as the requirement for computer literacy can introduce a form of selection bias, especially for researchers wishing to undertake work in areas with low computer literacy.

23.4.1.2

Ecological validity

As mentioned above, web-based studies tend to have higher levels of ecological validity. This is because the great majority of web-based studies allow participants to undertake the experiment in familiar settings and at a time that suits them. Participants do not have to schedule a time with the researcher to undertake the experiment in an often unfamiliar environment. Undertaking the experiment in a comfortable and familiar environment is held to increase ecological validity by making the experiment feel more naturalistic (e.g., Reips, 2002a). The specific degree to which ecological validity increases depends on the task. For example, some experiments task participants with giving specific responses to specific stimuli (e.g., forced-choice tasks, sentence continuation tasks, Likert-scale tasks). These are likely to have higher ecological validity if conducted via the web, as they can be undertaken alone in comfortable, familiar environments. This avoids experimenter effects (e.g., Holbrook et al., 2003) and often reduces cognitive load by preventing split-attention effects associated with researcher presence (e.g., Nicholson et al., 2005; Sanders et al., 1978). It does not, however, prevent split-attention effects caused by distracting stimuli in participants’ home environments (e.g., participants passively watching TV or listening to music while participating, or other people in a participant’s household requiring the participant’s attention during the experiment). As such, aside from the counterexamples discussed in the previous section, experiments that benefit from obtaining results as close to naturalistic as possible are likely to benefit from being designed as web-based studies.

23.4.1.3

Detection of motivational confounding and number of non-participants

Web-based experimentation allows motivational confounding to be detected by examining the rate at which participants withdraw, often referred to as drop-out. Drop-out refers to the number of participants who withdraw their participation during the course of the study (e.g., Reips, 2002a).

Testing in the lab and testing through the web

Motivational confounding occurs when aspects of an experiment confuse, bore, and/or tire participants, causing them to lose motivation (Reips, 2002b). In lab-based studies, participants feel less able to leave the experiment, which makes such confounding variables much harder to detect. In web-based studies, by contrast, confounding variables are far more likely to lead participants to withdraw (Reips, 2002a, 2002b). Motivational confounding can therefore be examined by determining the exact point at which participants withdrew from an experiment. Although this point can be measured in any study, the higher drop-out rate of web-based studies, combined with reduced or absent experimenter effects and the ‘modular’ nature of many instruments, makes it far easier to determine statistically which element(s) of a given experimental design cause participants to lose the motivation to complete the experiment, by examining the task or sub-task during which participants most frequently withdrew. Further, because experimenter effects minimise drop-out in lab-based studies, lab-based studies risk including results from participants who are far less invested in responding correctly but who feel social pressure to remain. Pre-testing materials with a web-based instrument therefore offers additional benefits to researchers seeking to minimise potential problems, and can be worthwhile even when you plan to conduct the main experiment in the lab.

The number of non-participants can be observed in many web-based instruments through a variety of means. Some web-based platforms, both those that host web-based instruments and those that allow advertisements to be posted, indicate the number of people who click a given referral link.
Further, some platforms can even provide a measure of the number of people who saw the advertisement. For example, recent changes to posts in Facebook groups have added a count of how many people have seen a post, shown at the bottom left of the post (just above the comments). In combination, these measures allow you to compare the number of people who responded to an advertisement in a specific location with the number of people who saw the advertisement there. Importantly, the same approaches can also be taken with lab-based experiments if only web-based advertisements are used for recruitment (otherwise, the viewing rate of any non-web-based advertisements would likely not be calculable), although doing so would be unusual.
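The drop-out analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the module names, the log format (each participant's ID mapped to the last module they finished), and all counts are hypothetical, not output from any particular platform. The second function turns the view and click counts that some platforms report into a simple recruitment funnel.

```python
from collections import Counter

# Hypothetical module order for a modular web-based experiment.
MODULES = ["consent", "demographics", "practice", "main_task", "debrief"]


def dropout_by_module(last_completed):
    """Count how many participants withdrew during each module.

    last_completed maps participant ID -> name of the last module they
    finished (None if they left before finishing the first module).
    Participants who finished the final module are completers and are
    not counted.
    """
    counts = Counter()
    for last in last_completed.values():
        if last == MODULES[-1]:
            continue  # completed the whole experiment
        # They withdrew during the module after their last completed one.
        next_index = 0 if last is None else MODULES.index(last) + 1
        counts[MODULES[next_index]] += 1
    return counts


def funnel(views, clicks, completions):
    """Proportions at each recruitment stage: views -> clicks -> completions."""
    return {
        "click_rate": clicks / views,
        "completion_rate": completions / clicks,
    }


# Hypothetical log for five participants.
log = {"p1": "debrief", "p2": "practice", "p3": "practice", "p4": None, "p5": "main_task"}
print(dropout_by_module(log).most_common())
print(funnel(views=1000, clicks=80, completions=60))
```

With real data, a module that accounts for a disproportionate share of withdrawals is a candidate source of motivational confounding; whether that difference is reliable would still need to be tested statistically.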

23.4.1.4

Generalisability and voluntariness

Web-based experiments tend to have higher levels of generalisability than lab-based studies, as they remove the requirement for participants to meet in a specific location at a specific time, allowing participants located even tens of thousands of kilometres away to take part. This also makes it far easier to conduct research with neglected populations, such as speakers of diverse languages, as part of cross-cultural research (Speed et al., 2018). Further, the ability of participants to undertake experiments from their homes greatly increases both the speed of recruitment and the number of participants. Together this means that (A) you are able to recruit participants from much further afield, (B) at each collection point you are able to recruit far more participants, and (C) participant recruitment occurs much more quickly. Indeed, under the right conditions, web-based experimentation can attract several hundred participants a day. Reips (2002a) states that, for experiments where thousands of participants are required, web-based experimentation is borderline essential. As such, when conducting research across diverse populations, and especially when (A) large participant numbers and/or quick collection are key and (B) the drawbacks discussed below have been seriously considered, web-based testing may be the best option.

Jonathan D. Kim

23.4.1.5

Increased openness of research processes

Openness refers to the degree to which the research process can be examined. Lab-based studies may include elements that are not specified in a methods section, whereas web-based studies, especially if left online after data collection has been completed, can easily be run through by anyone (e.g., Reips, 2000). Indeed, free instruments such as PsyToolkit (Stoet, 2010, 2017) allow experiments to be uploaded to a public library, so that others can examine an experiment not only from the point of view of a participant but also, as researchers, by examining the code underlying it. Web-based research thus makes designing and conducting replication experiments a far more accurate and streamlined process.

23.4.1.6

Increased public control of ethical processes

Reips (2000) states that the public display of web-based experiments allows participants, peers, and other interested individuals (including other scientists) to examine an experiment and communicate any objections via email. If the experiment is made public as part of a pre-registration process, researchers can identify ethical issues with their experimental design before undertaking the experiment, helping to prevent ethical mishaps.

23.4.1.7

Reduction in resource constraints

Web-based experiments reduce resource costs across the board, mitigating many resource constraints. They allow much faster and much wider participant recruitment, reducing the time spent waiting for data to be collected. Researcher hours are further reduced in many studies because researchers do not need to interact directly with participants during the experiment. Maintenance costs are reduced, as participants use their own machines, avoiding wear and tear on expensive lab equipment. Space-use costs are reduced, as no lab space is needed; this allows other research that does need lab space to be carried out concurrently, reducing the opportunity costs associated with choosing one experiment over another. Opportunity costs are further reduced by the ability to run multiple web-based experiments concurrently, although care should be taken not to run too many at once, as data analysis is still often a time-consuming process. The reduction in space-use costs can also reduce social friction within a research group, as those whose research must be lab-based face less competition for lab timeslots. Finally, monetary costs are reduced in a number of ways. For example, if participants respond to an experiment from their homes, there is no need to pay their transport costs to and from your lab.

23.4.2

Drawbacks

23.4.2.1

Equipment-related factors

Equipment-related noise is greatest in web-based experiments because of differences in software and hardware between participants’ computers, and web latency can add further response-time noise if the experiment is constantly communicating with the server on which participants’ responses are being recorded (Høiland-Jørgensen et al., 2016). As mentioned above, high noise levels can obscure small effects and give the illusion of heterogeneous responses, especially in studies examining small effects with fine response-time differences. As such, when observing increased heterogeneity of responses in web-based studies, care must be taken to ensure that this increase reflects an increase in ecological validity rather than in error noise. Thankfully, as outlined above, more recent web-based instruments have reduced this noise to the point where results are in line with those of lab-based studies (e.g., Dandurand et al., 2008; Gosling et al., 2004; Kim et al., 2019). Indeed, this research has shown that small effects in the 50 ms to 100 ms range are detectable with modern instruments, although finer effects (e.g., differences of 2 ms to 10 ms) may still be too small to detect. One way this has been achieved is by loading the entire experiment in the participant’s browser before the experiment begins and sending the full set of responses to the server only at the end (e.g., PsyToolkit). In this way, error noise that is specifically due to web latency is removed.
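To illustrate why timing responses in the participant's browser and uploading them only at the end removes latency noise, the following Python simulation compares response times timed on the client with the same responses timed on arrival at a server. It is a sketch under assumed, hypothetical parameters: response times are drawn from a normal distribution and per-trial network delay from an exponential distribution, which is a simplification of real latency variation.

```python
import random
import statistics


def simulate(n_trials=1000, true_rt_ms=600.0, rt_sd_ms=50.0,
             mean_latency_ms=40.0, seed=1):
    """Return (client-timed RTs, server-timed RTs) for one simulated participant.

    Client-timed RTs are measured in the browser, so they contain only the
    participant's own variability. Server-timed RTs additionally include a
    random network delay on every trial (hypothetical exponential model).
    """
    rng = random.Random(seed)
    client, server = [], []
    for _ in range(n_trials):
        rt = rng.gauss(true_rt_ms, rt_sd_ms)
        delay = rng.expovariate(1.0 / mean_latency_ms)  # mean delay = mean_latency_ms
        client.append(rt)
        server.append(rt + delay)
    return client, server


client, server = simulate()
print(f"client-timed SD: {statistics.stdev(client):.1f} ms")
print(f"server-timed SD: {statistics.stdev(server):.1f} ms")
```

Because the delay is added independently on each trial, it inflates the spread of server-timed response times, which is exactly the kind of noise that can mask a small effect.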

23.4.2.2

Experimental control

As mentioned above, web-based experiments have lower experimental control than lab-based experiments. A few actions can be taken to at least minimise this issue. The first is to use within-subject designs, in which all participants respond to all experimental items. This allows participant to be used as a control variable, minimising variance and noise due to by-participant differences. The second is to record the situational context in which participants undertake the experiment (Speed et al., 2018), although doing so incurs a time cost, as these situational data need to be encoded and examined. Importantly, for some experimental approaches this reduction in control is already mitigated by the experimental design itself; for example, a survey study might have equivalent experimental control whether responses are gathered by walking around a university campus asking people to fill in paper sheets or by having people respond on a website.
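As a minimal sketch of why within-subject designs help, the following Python simulation (all parameter values hypothetical) gives each participant their own baseline speed, as might arise from differences in hardware and setting, plus a common condition effect. Each value stands for one participant's mean in one condition. Comparing each participant with themselves removes the baseline differences, so the within-participant differences vary far less than the raw response times.

```python
import random
import statistics


def simulate_within_subject(n_participants=40, effect_ms=30.0,
                            baseline_sd_ms=120.0, noise_sd_ms=20.0, seed=2):
    """Return per-participant condition means and their differences (B - A)."""
    rng = random.Random(seed)
    cond_a, cond_b, diffs = [], [], []
    for _ in range(n_participants):
        baseline = rng.gauss(700.0, baseline_sd_ms)  # participant-specific speed
        a = baseline + rng.gauss(0.0, noise_sd_ms)
        b = baseline + effect_ms + rng.gauss(0.0, noise_sd_ms)
        cond_a.append(a)
        cond_b.append(b)
        diffs.append(b - a)  # the participant acts as their own control
    return cond_a, cond_b, diffs


a, b, d = simulate_within_subject()
print(f"SD of raw condition-B means: {statistics.stdev(b):.1f} ms")
print(f"SD of within-participant differences: {statistics.stdev(d):.1f} ms")
print(f"mean within-participant effect: {statistics.mean(d):.1f} ms")
```

The same logic underlies including participant as a factor in a statistical model rather than simply subtracting condition means.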

23.4.2.3

External validity

It is possible that, as web-based research is completely dependent on computers and networks, external validity may be reduced for research paradigms that historically have not been electronic (Reips, 2002a). However, a growing body of research suggests that web-based results are in line with lab-based results (Dandurand et al., 2008; Gosling et al., 2004; Kim et al., 2019; Krantz & Dalal, 2000), indicating that this effect, if real, is relatively minor. If you are concerned about the external validity of a given experimental paradigm, you can always run a pilot experiment comparing results obtained in a lab with those obtained via the web, as Kim et al. (2019) did. While historically such testing has compared lab- and web-based implementations of an experiment, it would be possible to run a pilot with a three-condition design, comparing (1) a lab-based instrument tested in the lab on one specific computer, (2) a web-based instrument tested in the lab on the same computer, and (3) the same web-based instrument tested via the web on participants’ home computers. This allows more accurate identification of the cause of any issues, as a web-based instrument tested in the lab on the same hardware as the lab-based instrument should retain most of the benefits of lab-based testing outlined above. You can then determine which differences are due solely to the experimental instruments before comparing differences between the lab-based and web-based approaches.
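The logic of the three-condition pilot can be sketched as a simple decomposition of mean differences. The Python below is a hypothetical illustration: the mean response times are invented placeholders, and in practice each comparison would be tested statistically rather than read off raw means.

```python
def decompose(lab_instrument_lab, web_instrument_lab, web_instrument_home):
    """Split the overall lab-vs-web difference (mean RTs in ms) into two parts.

    instrument: web vs lab instrument, both run on the same lab computer.
    setting:    the same web instrument, lab computer vs participants' own computers.
    total:      condition (3) vs condition (1); equals instrument + setting.
    """
    instrument_effect = web_instrument_lab - lab_instrument_lab
    setting_effect = web_instrument_home - web_instrument_lab
    total = web_instrument_home - lab_instrument_lab
    return {"instrument": instrument_effect, "setting": setting_effect, "total": total}


# Hypothetical condition means for conditions (1), (2), and (3).
print(decompose(lab_instrument_lab=612.0, web_instrument_lab=618.0, web_instrument_home=640.0))
# → {'instrument': 6.0, 'setting': 22.0, 'total': 28.0}
```

In this invented case, most of the lab-versus-web difference would be attributed to the testing setting rather than to the instrument itself.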


23.4.2.4

Multiple submissions

This refers to the possibility of multiple submissions from the same participant, which would confound your results. Research has found this to be very rare (Reips, 2002a), but the risk is serious enough that it should be accounted for. There are several methods to address this issue: collecting personal information, checking the internal, date, and time consistency of answers, subsampling, participant pools, the provision of passwords, blocking duplicate IP addresses, and, more recently, browser-based cookies that prevent participants from repeating the experiment without first clearing their cookies (e.g., PsyToolkit). It should be noted that collecting personal information can be ethically problematic, as personal information often allows participants to be identified; if someone unauthorised accesses your data, there is a real possibility that your participants’ anonymity would be lost. Participant pools and passwords still leave at least some opportunity for identification, as pool memberships and personal communications containing passwords could also be found. Finally, checking the consistency of answers, subsampling, blocking duplicate IP addresses, and using cookies do not require potentially identifiable information to be collected. With all this in mind, adopting one or more strategies from this final group will often, although not always, be the most ethically appropriate way to mitigate multiple submissions.
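A minimal sketch of two of the non-identifying checks above (duplicate IP addresses and internal consistency of answers), assuming each submission record holds hypothetical fields `id`, `ip`, `finished` (an ISO 8601 timestamp), `age`, and `birth_year`. Submissions are flagged for review rather than silently deleted, since several legitimate participants can share one IP address (for example, in a household or on a university network). The IP addresses below come from the ranges reserved for documentation.

```python
from datetime import datetime


def flag_submissions(submissions):
    """Return (submission_id, reasons) pairs for submissions needing review.

    submissions: list of dicts with the hypothetical keys 'id', 'ip',
    'finished', 'age', and 'birth_year', in the order they were received.
    """
    seen_ips = set()
    flagged = []
    for sub in submissions:
        reasons = []
        # Duplicate IP check: keep the first submission, flag later ones.
        if sub["ip"] in seen_ips:
            reasons.append("duplicate IP")
        seen_ips.add(sub["ip"])
        # Internal consistency check: reported age should match birth year,
        # allowing one year of slack for birthdays not yet reached.
        year = datetime.fromisoformat(sub["finished"]).year
        implied_age = year - sub["birth_year"]
        if sub["age"] not in (implied_age - 1, implied_age):
            reasons.append("age/birth-year mismatch")
        if reasons:
            flagged.append((sub["id"], reasons))
    return flagged


subs = [
    {"id": 1, "ip": "203.0.113.5", "finished": "2023-03-01T10:15:00", "age": 24, "birth_year": 1999},
    {"id": 2, "ip": "203.0.113.5", "finished": "2023-03-01T10:40:00", "age": 24, "birth_year": 1999},
    {"id": 3, "ip": "198.51.100.7", "finished": "2023-03-02T09:00:00", "age": 30, "birth_year": 1980},
]
print(flag_submissions(subs))
# → [(2, ['duplicate IP']), (3, ['age/birth-year mismatch'])]
```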

23.4.2.5 

Participant drop-out

As mentioned above, drop-out is higher for web-based than for lab-based research, as participants perceive themselves as having more control over the situation. Drop-out can be minimised in three ways (Reips, 2002a). The first is to provide immediate feedback to participants as part of the experiment. This feedback need not be ‘live’: many instruments allow feedback to be programmed in advance, such as information on how far through the experiment someone is, or how many items they answered correctly in a given task. The second is to provide financial incentives for participants to complete the study. The third is personalisation, allowing participants to tailor their experience to better suit their needs. For example, if text is shown on screen during a survey task and doing so does not harm your experimental design, participants could be allowed to change the size of the text so that those with poor eyesight can read it clearly.
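The first of these strategies can be as simple as a progress message shown between items. The Python below is a hypothetical sketch of the text such a message might contain (item position, percentage complete, and, for tasks with correct answers, a running score); how it is actually displayed depends on the instrument used.

```python
def progress_message(item_index, n_items, n_correct=None):
    """Build a feedback string after item_index (0-based) has been answered."""
    done = item_index + 1
    percent = round(100 * done / n_items)
    message = f"Item {done} of {n_items} completed ({percent}%)."
    if n_correct is not None:
        # Only meaningful for tasks where answers can be scored as correct.
        message += f" You have answered {n_correct} of {done} correctly so far."
    return message


print(progress_message(9, 40))
print(progress_message(19, 40, n_correct=17))
```

Score feedback should only be shown where telling participants how they are doing cannot bias their later responses.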

23.4.2.6

Reduction in participant-researcher interactions

The reduction in participant-researcher interactions is a problem on multiple fronts. Firstly, regardless of the nature of the experiment, it prevents participants from receiving immediate feedback from the researcher if they do something wrong (Reips, 2002a). For experiments where a researcher is digitally present, participants may feel more disconnected from the researcher due to audio-visual lag and reduced non-verbal cues, which may make them less comfortable discussing any concerns or questions they have before, during, or after the experiment. For experiments where no researcher is present, there are further issues: there is no opportunity for a reactive briefing or debriefing, with participants instead needing to contact the researchers with any queries, and there is no opportunity to give immediate feedback (for example, when a participant is not following the experimental instructions correctly). For the first of these cases (a digitally present researcher), one option might be to allow more time than would be used in laboratory settings for socialisation and for answering questions relating to the briefing and debriefing. For the second (no researcher present), one option is to provide an easy-to-use contact option (e.g., a chat-bot with the option to reach live chat, or a consistently monitored email inbox). Another is to reduce the need for participants to contact you before, during, or after the experiment by pre-testing your materials and giving pre-test participants the opportunity to comment on your participant instructions, ethical information, debriefing information, and any parts of the experimental setup and/or interface that they found confusing, disturbing, and/or distracting, as well as any elements that they found helpful, positive, and/or enjoyable.

23.4.2.7

Self-selection bias

Self-selection bias arises whenever participants who are too similar to each other self-select into the participant pool (e.g., Reips, 2002a). This would happen, for example, if you were conducting an experiment on the effect of dialect differences across a country but only received responses from participants who speak a single dialect. This can be an issue for lab-based studies as well. There are several steps to take when addressing self-selection bias. Firstly, ensure that your advertisements are designed to reach your target populations. Continuing with the dialect example, if the advertisement itself were written in a specific dialect, this might discourage speakers of other dialects from responding. Secondly, use the multiple site entry technique (Reips, 2002a). This technique involves placing advertisements on many different sites (e.g., if recruiting digitally, in discussion groups, websites, social media posts, and web forums) that have a high proportion of people in your target demographic. For example, when looking at diverse populations within or between countries, city-specific buy-and-sell groups and student-focused groups might provide good avenues for these tailored advertisements, as many (though not all) members will be located in those cities. As an important part of this process, when advertising on the internet, identifying information should be placed in the URLs published on each site; this information, together with the referrer information in the HTTP protocol, can then be used to identify each participant’s source (Schmidt, 2000). The data can later be examined for differences in demographic data, central results, data quality, and degree of appeal as a function of site of origin (appeal can be measured via drop-out, although this requires either collecting IP addresses or asking participants early in the experiment where they saw the link). Following this, an estimate of the biasing potential of self-selection can be calculated (Reips, 2002a). If site-specific self-selection is not an issue, the results from each site should be highly similar.
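A minimal sketch of the multiple site entry technique, assuming a hypothetical base URL and a hypothetical `source` query parameter as the identifying information in each published link. The Python builds one link per recruitment site, recovers the site from the landing URL with the standard `urllib.parse` module, and compares completion rates by site of origin. Reading the HTTP referrer header, as Schmidt (2000) describes, would be a complementary source of the same information.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://experiments.example.org/dialect-study"  # hypothetical


def tagged_link(site):
    """Link to publish on one recruitment site, tagged with the site's name."""
    return f"{BASE_URL}?{urlencode({'source': site})}"


def site_of_origin(landing_url):
    """Recover the recruitment site from the URL a participant arrived at."""
    values = parse_qs(urlparse(landing_url).query).get("source")
    return values[0] if values else "unknown"


def completion_by_site(sessions):
    """sessions: iterable of (landing_url, completed) pairs -> completion rate per site."""
    started = defaultdict(int)
    finished = defaultdict(int)
    for url, completed in sessions:
        site = site_of_origin(url)
        started[site] += 1
        finished[site] += int(completed)
    return {site: finished[site] / started[site] for site in started}


# Hypothetical sessions from two recruitment sites.
sessions = [
    (tagged_link("city_forum"), True),
    (tagged_link("city_forum"), False),
    (tagged_link("student_group"), True),
    (tagged_link("student_group"), True),
]
print(tagged_link("city_forum"))
print(completion_by_site(sessions))
```

The same grouping can be applied to demographic variables and central results, so that sites whose participants differ markedly stand out.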

23.5

Conclusions

In conclusion, this chapter aimed to discuss the core experimental considerations for undertaking experimentation both in laboratory spaces and via the web. Its goal was to introduce the reader to the nuances, comparative benefits, and drawbacks of lab- and web-based testing so that, for each experiment they undertake, they can identify when they must use a lab-based or web-based approach and, when more flexibility is allowed, how to select the approach that best fits their needs. To make it easier to reflect on which approach is best for your study, Table 23.3 presents the factors that might lead you to select each modality. As no two experiments are exactly the same, it is important to consider the best approach for each new experiment you wish to undertake: both in general, to decide whether to use a lab- or web-based approach, and more specifically, to design experiments that minimise the negatives and maximise the positives associated with the approach you decide upon (Table 23.3).

Table 23.3 Factors guiding selection of a lab-based or web-based experimental approach

Lab-based
- Experimental control more important than ecological validity
- Need to use more accurate and advanced equipment
- Need for an experimental paradigm not usable in a web-based study
- Need for very high levels of precision
- Reduced ability to pre-test
- Need for ability to reactively brief and debrief

Web-based
- Ecological validity more important than experimental control
- Need or desire to examine motivational confounding/non-participation
- Need for higher levels of generalisability and voluntariness
- Desire to contribute to the openness of research processes
- Desire to contribute to the public control of ethical processes
- Limited resources available that would benefit from reduced experimental costs

Further reading

Birnbaum, M. H. (2000). Psychological experiments on the internet. Academic Press.
de Groot, A. M. B., & Hagoort, P. (2017). Research methods in psycholinguistics and the neurobiology of language: A practical guide. Wiley & Sons, Inc.

Related topics

New directions in statistical analysis for experimental linguistics; historical perspectives on the use of experimental methods in linguistics; contrasting online and offline measures: examples from experimental research on linguistic relativity; controlling social factors in experimental linguistics

References

Baker, W., & Hüttner, J. (2016). English and more: A multisite study of roles and conceptualisations of language in English medium multilingual universities from Europe to Asia. Journal of Multilingual and Multicultural Development, 38(6), 501–516.
Beauchamp, T. L., & Childress, J. F. (2009). Principles of biomedical ethics (6th ed.). Oxford University Press.
Buchanan, T., & Smith, J. L. (1999). Using the internet for psychological research: Personality testing on the world wide web. British Journal of Psychology, 90, 125–144.
Calcagnotto, L. A., Huskey, R., & Kosicki, G. M. (2021). Accuracy and precision of measurement: Tools for validating reaction time stimuli. Computational Communication Research, 3, 1–20.
Chereni, S., Sliuzas, R. V., & Flacke, J. (2020). An extended briefing and debriefing technique to enhance mixed-method data quality in cross-national/language research. International Journal of Social Research Methodology, 23(6), 661–675.
Dandurand, F., Shultz, T. R., & Onishi, K. H. (2008). Comparing online and web methods in a problem-solving experiment. Behavior Research Methods, 40(2), 428–434.
Dunn, P. K. (2022). 5.2 Precision and accuracy. In P. K. Dunn (Ed.), Scientific research and methodology: An introduction to quantitative research in science and health. https://bookdown.org/pkaldunn/SRM-Textbook
Eichstaedt, J. (2001). An inaccurate-timing filter for reaction time measurement by JAVA applets implementing internet-based experiments. Behavior Research Methods, Instruments, and Computers, 33, 179–186.
Ekström, M. (2011). Do watching eyes affect charitable giving? Evidence from a field experiment. Experimental Economics, 15, 530–546.
Gosling, S. D., Vazire, S., Srivastava, S., & John, O. P. (2004). Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. The American Psychologist, 59(2), 93–104.
Green, M. (2007). Trust and social interaction on the internet. In A. Joinson, K. McKenna, T. Postmes, & U.-D. Reips (Eds.), The Oxford handbook of internet psychology (pp. 43–52). Oxford University Press.
Høiland-Jørgensen, T., Ahlgren, B., Hurtig, P., & Brunstrom, A. (2016). Measuring latency variation in the internet. CoNEXT ’16: Proceedings from the 12th international conference on emerging networking experiments and technologies, pp. 473–480.
Holbrook, A. L., Green, M. C., & Krosnick, J. A. (2003). Telephone versus face-to-face interviewing of national probability samples with long questionnaires: Comparisons of respondent satisficing and social desirability response bias. Public Opinion Quarterly, 67, 79–125.
Kidd, K., & Morgan, G. A. (2010). Experimental controls. In I. B. Weiner & W. E. Craighead (Eds.), The Corsini encyclopedia of psychology. Wiley Online Library.
Kim, J. D., Gabriel, U., & Gygax, P. (2019). Testing the effectiveness of the internet-based instrument PsyToolkit: A comparison between web-based (PsyToolkit) and lab-based (E-Prime 3.0) measurements of response choice and response time in a complex psycholinguistic task. PLoS One, 14(9), e0221802.
King, M. F., & Bruner, G. C. (2000). Social desirability bias: A neglected aspect of validity testing. Psychology & Marketing, 17(2), 79–103.
Krantz, J. H., & Dalal, R. S. (2000). Validity of web-based psychological research. In M. H. Birnbaum (Ed.), Psychological experiments on the internet (pp. 35–60). Academic Press.
Lavrakas, P. J. (2008). Social desirability. In P. J. Lavrakas (Ed.), Encyclopedia of research methods (Vols. 1-0). Sage Publications, Inc.
Levinson, S. C. (2016). Turn-taking in human communication – origins and implications for language processing. Trends in Cognitive Science, 20(1), 6–14.
Musch, J., & Reips, U.-D. (2000). A brief history of web experimenting. In M. H. Birnbaum (Ed.), Psychological experiments on the internet (pp. 61–88). Academic Press.
Nicholson, D. B., Parboteeah, D. V., Nicholson, J. A., & Valacich, J. S. (2005). Using distraction-conflict theory to measure the effects of distractions on individual task performance in a wireless mobile environment. Proceedings of the 38th annual Hawaii international conference on system sciences, USA, pp. 1–9.
O’Malley, C., Langton, S., Anderson, A., Doherty-Sneddon, G., & Bruce, V. (1996). Comparison of face-to-face and video-mediated communication. Interacting with Computers, 8(2), 177–192.
Plant, R. R. (2009). Millisecond precision psychological research in a world of commodity computers: New hardware, new problems? Behavior Research Methods, 41(3), 291–303.
Plant, R. R. (2016). A reminder on millisecond timing accuracy and potential replication failure in computer-based psychology experiments: An open letter. Behavior Research Methods, 48(1), 408–411.
Reips, U.-D. (2000). The Web experiment method: Advantages, disadvantages, and solutions. In M. H. Birnbaum (Ed.), Psychological experiments on the internet (pp. 89–114). Academic Press.
Reips, U.-D. (2002a). Standards for Internet-based experimenting. Experimental Psychology, 49, 243–256.
Reips, U.-D. (2002b). Internet-based psychological experimenting: Five dos and five don’ts. Social Science Computer Review, 20(3), 241–249.
Sanders, G. S., Baron, R. S., & Moore, D. L. (1978). Distraction and social comparison as mediators of social facilitation effects. Journal of Experimental Social Psychology, 14(3), 291–303.
Schmidt, W. C. (2000). The server-side of psychology web experiments. In M. H. Birnbaum (Ed.), Psychological experiments on the internet (pp. 285–310). Academic Press.
Semmelmann, K., & Weigelt, S. (2018). Online webcam-based eye tracking in cognitive science: A first look. Behavior Research Methods, 50, 451–465.
Seuren, L. M., Wherton, J., Greenhalgh, T., & Shaw, S. E. (2020). Whose turn is it anyway? Latency and the organization of turn-taking in video-mediated interaction. Journal of Pragmatics, 172, 63–78.
Smith, M., & Leigh, B. (1997). Virtual subjects: Using the internet as an alternative source of subjects and research environment. Behavior Research Methods, Instruments, and Computers, 29, 496–505.
Speed, L. J., Wnuk, E., & Majid, A. (2018). Studying psycholinguistics out of the lab. In A. M. B. de Groot & P. Hagoort (Eds.), Research methods in psycholinguistics and the neurobiology of language: A practical guide (pp. 190–207). John Wiley & Sons, Inc.
Stoet, G. (2010). PsyToolkit: A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096–1104.
Stoet, G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24–31.
Zhang, H., Seilhamer, M. F., & Cheung, Y. L. (2022). Identity construction on shop signs in Singapore’s Chinatown: A study of linguistic choices by Chinese Singaporeans and new Chinese immigrants. International Multilingual Research Journal, 17(1), 1–18.


PART III

Focus on specific populations

24 EXPERIMENTAL METHODS TO STUDY CHILD LANGUAGE

Titia Benders, Nan Xu Rattanasone, Rosalind Thornton, Iris Mulders and Loes Koring

24.1

Introduction

Many experimental studies, including those using paradigms described in this volume, require participants to sit through more trials and perform more complex tasks than are viable for children. However, if experiments are designed with child participants in mind, children can participate in experiments and even enjoy them. This chapter presents considerations for designing experimental language studies that provide valid insights into children’s developing linguistic representations and processing. To keep the chapter focused, we present two cases: (1) production experiments to probe children’s developing phonological and phonetic representations and processes, and (2) comprehension experiments to probe children’s developing syntactic and semantic competence. For both cases, we discuss how to choose between several viable paradigms and provide advice on designing stimuli and procedures. These two examples illustrate how researchers in some linguistic domains choose paradigms and design experiments, balancing their theoretical questions with children’s capabilities. The chapter concludes with a discussion of the similarities and differences between the two domains. Our hope is that readers will be able to extrapolate the provided considerations to their own studies, including studies in other linguistic domains.

24.2

Phonological and phonetics production studies with children

Language production experiments to probe the phonological and phonetic representations and processes of young children are generally motivated by one of the following two broad research questions:

1 What are children’s developing representations of phonological elements and structures?
2 What is children’s developing productive phonetic (i.e., articulatory or acoustic) realisation of these elements and structures?


DOI: 10.4324/9781003392972-28

Titia Benders et al.

In asking these questions, researchers are typically interested in developmental trajectories or differences across populations, and they may assess the impact of a range of linguistic factors (e.g., segmental, prosodic, or word context). When are such questions best answered by conducting an experimental study? Discoveries can also be made by investigating spontaneous speech, for example speech already collected in corpora such as PhonBank (Rose & MacWhinney, 2014), the phonology section of CHILDES (MacWhinney, 2000). One reason to conduct an experimental study is that children may not spontaneously produce enough tokens to answer a research question. This may occur if the goal is to investigate emerging abilities or a rarely produced target that is of theoretical interest. Experimental studies also solve several other challenges in the analysis of spontaneous child speech. First, by eliciting specific targets, the researcher avoids ambiguity about what the target is. Second, by eliciting targets in selected contexts, the researcher ensures that phonetic analyses are not compromised by the variable linguistic contexts in which spontaneously produced targets occur (e.g., a given target word may be produced with and without focus, utterance-medially and utterance-finally, and in a range of segmental contexts). Finally, by making acoustic (or articulatory) recordings appropriate to the research question, the researcher ensures that acoustic-phonetic analyses are not compromised by reverberation or noise (e.g., from toys or caregivers’ speech), which often occur in recordings of spontaneous speech. This section discusses how to answer phonological and phonetic research questions using production experiments, that is, by eliciting specific target utterances from children by asking them either to imitate a model (elicited imitation) or to produce the target in a constraining context that makes it the most likely response (elicited production).
Rather than replicating detailed instructions on elicited imitation (Lust & Blume, 2016) or elicited production (Lust & Blume, 2016; Thornton, 1996), we will discuss the choice between elicited imitation and production as well as design features of the stimuli with a focus on phonology and phonetics.

24.2.1 Elicited imitation, elicited production, and choosing between them

Once the decision has been made to run an experimental study, the next choice is between two available paradigms: elicited imitation (or repetition) and elicited production. The rationale of both is that the child’s output provides a window into their phonological and phonetic system. For example, when the child produces a target that deviates from the adult form, this indicates non-adult-like representations or processing (including planning and articulation). However, if we analyse both tasks according to the stages of speech production as defined by Levelt (1989), there are also differences. The choice between the paradigms can be motivated by these task differences insofar as they are relevant to the research question, by the age of the participants (as discussed in this section), and by the constraints on the stimulus design (as discussed in the next section). Elicited imitation tasks present a spoken or signed stimulus (depending on the language modality), and the child is asked to repeat it. Elicited imitation tasks provide the concepts and words, but the child participant still needs to (phonologically) encode the words, generate a phonetic plan, and articulate. This suggests that elicited imitation taps into children’s phonological and phonetic abilities. And indeed, the non-adult-like forms that children produce in imitation show patterns that converge with those observed in elicited production and spontaneous speech. For example, English-learning children are more likely to omit coda consonants after long than after short vowels in spontaneous speech, elicited production, and elicited imitation (Demuth et al., 2006; Kehoe & Stoel-Gammon, 2001; Miles et al., 2016).
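The difference between the two paradigms can be pictured as two entry points into the same sequence of production stages: elicited production requires the child to go through every stage, whereas elicited imitation of a single word supplies the concept and the word. The sketch below illustrates this under stated assumptions: the stage labels in `PRODUCTION_STAGES` are our own shorthand for the stages attributed to Levelt (1989), and the entry stage for imitation is a simplification for single words (longer utterances may also require grammatical reconstruction, which the sketch ignores).

```python
# Simplified stages of speech production (labels are our own shorthand).
PRODUCTION_STAGES = [
    "conceptualisation",
    "lexical access",
    "grammatical encoding",
    "phonological encoding",
    "phonetic planning",
    "articulation",
]

# Stage at which each task hands over to the child. This is an illustrative
# assumption: imitation supplies concept and word form; production supplies
# only a prompt (e.g., a picture to name).
TASK_ENTRY_STAGE = {
    "elicited production": "conceptualisation",
    "elicited imitation": "phonological encoding",
}


def stages_required(task):
    """Stages the child must carry out in the given task."""
    start = PRODUCTION_STAGES.index(TASK_ENTRY_STAGE[task])
    return PRODUCTION_STAGES[start:]


def stages_bypassed(task):
    """Stages the task supplies on the child's behalf."""
    start = PRODUCTION_STAGES.index(TASK_ENTRY_STAGE[task])
    return PRODUCTION_STAGES[:start]


for task in TASK_ENTRY_STAGE:
    print(f"{task}: child performs {', '.join(stages_required(task))}")
```

Under these assumptions, imitation still leaves phonological encoding, phonetic planning, and articulation to the child, which is why it can reveal phonological and phonetic patterns at all.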

Experimental methods to study child language

Elicited production tasks crucially do not model the target but elicit it, for example, using a picture that needs to be named or an utterance that needs to be completed. This procedure requires children to go through all stages of speech production: conceptualisation of the message (in response to the prompt), access to the word representations, grammatical and phonological encoding, generation of a phonetic plan, and articulation. The child’s output thus provides a window into the entire speech production process. Whether to use elicited imitation or production for a given study must be decided based on the research question and children’s stage of language acquisition. One often-asked question is whether elicited imitation, which provides children with a model of the word form, can indeed tap into children’s own phonological and phonetic system. Readers familiar with guidelines for using imitation to study syntactic development may recall that this method is only considered valid if stimulus utterances are too long for children to hold in working memory and repeat word-by-word, thus requiring a reconstruction through children’s own syntactic system (Eisenbeiss, 2010; Lust et al., 1996). Similarly, if a study aims to reveal children’s difficulties at, for example, the phonology-morphology interface, the task of repeating short utterances can become too simple for children over three years of age to reveal effects (Xu Rattanasone & Demuth, 2022). Some older research into children’s phonological production similarly suggests that children display more advanced phonological production in elicited imitation than in elicited production tasks (e.g., Johnson & Somers, 1978). In particular, imitation tasks may lead researchers to overestimate children’s production of forms that they have almost mastered (Goldstein et al., 2004).
While elicited imitation may thus be suitable for tapping into the differences between absent, emerging, and more established forms, elicited production may be needed for making fine-grained distinctions between partially and fully mastered forms. Elicited production is also a must for answering questions about productivity. A productive phonological process in Mandarin, for example, is tone sandhi: when two dipping tones (Tone 3) follow each other, the first changes to a rising tone (Tone 2). By three years of age, Mandarin-speaking children not only produce tone sandhi in known words, e.g., producing /yu3 san3/ ‘umbrella’ as [yu2 san3] (Rattanasone et al., 2018), but also productively apply tone sandhi to novel disyllabic words, e.g., producing the novel item /ma3 ku3/ ‘horse drum’ as [ma2 ku3] (Tang et al., 2019). As questions around productivity can only be answered by asking children to generate and encode items they have never heard before, elicited imitation is simply not an option. Many other phonological and phonetic issues, however, have been successfully addressed using elicited imitation of individual words. Elicited imitation of non-words has been used extensively to assess the impact of phonological factors on children’s speech production (for a review, see Coady & Evans, 2008), finding, for example, that children are more likely to produce coda /d/ when repeating high- (rather than low-) phonotactic probability non-words (Zamuner et al., 2004). Elicited imitation of both real words and non-words has also been used to investigate children’s phonetic realisations, finding, for example, that children display non-adult-like articulations of coda /l/ (Lin & Demuth, 2015) and a higher degree of coarticulation than adults (Noiray et al., 2018). These examples illustrate that even a one-word elicited imitation task does not (entirely) bypass the child’s phonological and phonetic system.
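The Tone 3 sandhi rule described above is simple enough to state as a function, which can make explicit what a productive response to a novel item looks like. The sketch below is illustrative only: it assumes Pinyin syllables written with a trailing tone digit (e.g., "yu3"), handles disyllabic items only (sandhi in longer Tone 3 sequences depends on prosodic structure), and the scoring helper `applied_sandhi` is a hypothetical addition, not a procedure from the studies cited.

```python
def apply_t3_sandhi(syllables):
    """Apply the disyllabic Tone 3 sandhi rule (T3 T3 -> T2 T3).

    Syllables are toneless Pinyin plus a tone digit, e.g. "yu3".
    Only two-syllable items are handled: sandhi in longer Tone 3 sequences
    depends on prosodic structure and is outside this sketch.
    """
    if len(syllables) != 2:
        raise ValueError("this sketch handles disyllabic items only")
    first, second = syllables
    if first.endswith("3") and second.endswith("3"):
        first = first[:-1] + "2"  # the first dipping tone becomes rising
    return [first, second]


def applied_sandhi(target, child_form):
    """Hypothetical scoring helper: True if the child's transcribed tones
    match the output of the sandhi rule for the target."""
    return child_form == apply_t3_sandhi(target)


print(apply_t3_sandhi(["yu3", "san3"]))  # ['yu2', 'san3']  'umbrella'
print(apply_t3_sandhi(["ma3", "ku3"]))   # ['ma2', 'ku3']   novel 'horse drum'
print(applied_sandhi(["ma3", "ku3"], ["ma3", "ku3"]))  # False: no sandhi
```

For a novel item, only a child who has generated the underlying /ma3 ku3/ form and applied the rule can produce [ma2 ku3]; in an imitation task the model would already contain the sandhi form.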
Finally, elicited imitation is also often the better choice for studies with participants under the age of three: it is ‘readily accessible and natural to children even as young as 1 to 2 years of age’ (Lust et al., 1996). While elicited production is considered viable, with some effort, with children from 2.5 years old (Thornton, 1996), children under the age of three are, in our experience, successful mainly at simple one-word picture-naming tasks. Elicited imitation is thus essential for studying phonological and phonetic processes in the production of longer utterances by children under three years of age.

24.2.2 Designing stimuli for speech production tasks

The next step towards running an experimental study into young children’s phonology or phonetics is designing the stimuli. Here we outline some of the challenges that are likely to arise from imposing the phonological requirements that also apply to adult work in phonology and phonetics (see, e.g., Chapter 2 of this volume) on stimuli that must be compatible with children’s relatively small vocabularies (see also Edwards & Beckman, 2008). While we offer a range of potential solutions to these challenges, a researcher should anticipate several iterations of their stimulus set and always vet stimuli with children from the target age. Most studies will require target words that meet phonological specifications, which often makes it difficult to find suitable real words. For example, to study the production of /g/ by Dutch-German bilingual children, Stoehr et al. (2022) needed target words with /g/ and /k/ in the onset (to answer the research question) that were monosyllabic or disyllabic with stress on the first syllable (to control for the effect of stress on stop production). Further requirements can include the absence of late-acquired segments or phonotactic patterns elsewhere in the word, to avoid developmental constraints affecting the children’s productions (Smit, 1993). A non-phonological requirement for elicited production with pictures is that the label needs to have a picturable referent. In children’s relatively small vocabularies, only a few words tend to meet all these constraints. It can thus be challenging to find enough items for making reliable and generalisable observations, or to balance items across conditions on their phonological properties, frequency, or age of acquisition. For example, Stoehr et al. (2022) could find only one suitable target with a /g/ onset in Dutch, where /g/ is a loan segment and occurs in relatively few words.
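Screening a candidate list against several simultaneous constraints is easy to automate once each candidate word has been coded for the relevant properties. The sketch below is a hypothetical illustration of that screening step, not the procedure used by Stoehr et al. (2022): the `Candidate` fields, the English example words, and the contents of `LATE_ACQUIRED` are all placeholders that a real study would replace with coded lexical data and normative acquisition data for the target language and age.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str
    onset: str              # word-initial segment
    syllables: int
    initial_stress: bool    # stress on the first syllable
    segments: frozenset     # all segments in the word
    picturable: bool


# Placeholder set of late-acquired segments; a real study would draw these
# from normative data for the language and age group being tested.
LATE_ACQUIRED = frozenset({"r", "θ", "ʒ"})


def meets_constraints(c, onsets=("g", "k")):
    """Apply the kinds of constraints described in the text."""
    right_onset = c.onset in onsets
    right_shape = c.syllables == 1 or (c.syllables == 2 and c.initial_stress)
    # Late-acquired segments are excluded elsewhere in the word (not the onset).
    no_late_elsewhere = not ((c.segments - {c.onset}) & LATE_ACQUIRED)
    return right_onset and right_shape and no_late_elsewhere and c.picturable


# Hypothetical English candidates, for illustration only.
candidates = [
    Candidate("cat", "k", 1, True, frozenset({"k", "æ", "t"}), True),
    Candidate("goat", "g", 1, True, frozenset({"g", "oʊ", "t"}), True),
    Candidate("guitar", "g", 2, False, frozenset({"g", "ɪ", "t", "ɑ", "r"}), True),
    Candidate("grass", "g", 1, True, frozenset({"g", "r", "æ", "s"}), True),
    Candidate("kind", "k", 1, True, frozenset({"k", "aɪ", "n", "d"}), False),
]

selected = [c.word for c in candidates if meets_constraints(c)]
print(selected)  # ['cat', 'goat']
```

Even in this toy list, three of five candidates fail at least one constraint, which mirrors the point that few words in a small vocabulary survive all of them.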
If children do not know enough target words, or do not know them well, a regular elicited production procedure may be impossible. One solution to a lack of sufficiently familiar target words may be to familiarise participants with the target words prior to the production task. For example, caregivers can be sent a list of pictures and associated labels to read with their child, or the researcher can play a picture-naming game with the child just before the experiment (e.g., Tang et al., 2019). If sufficient target words simply do not exist, the researcher may need to resort to using novel words. This was Stoehr et al.’s solution to the limited number of words with a /g/ onset in Dutch. They started their session by telling participants a story about three characters (Gabi, Gero, and Gizmo) and then elicited these names together with the familiar target words during the subsequent elicited production task. However, children’s memory limits the number of novel words that can be introduced and subsequently elicited. A final solution may thus be to adopt an elicited imitation task, as this substantially relaxes the constraints on both the picturability and the familiarity of target words. If a researcher chooses to use novel words, especially in combination with elicited imitation, several additional constraints should be considered. Elicited imitation of novel words may be affected by the frequency of the segments and phonotactic patterns in the word (Coady & Aslin, 2004; Edwards et al., 2004; Zamuner et al., 2004). If these factors are not of interest, the novel words must be designed to have high-frequency segments and phonotactics. A perceptual process that may occur in novel word imitation is that children assimilate the novel word to a similar-sounding familiar word (Swingley, 2016) and produce the latter. This can be (somewhat) avoided by selecting novel target words that do not have (high-frequency) neighbours in a child’s lexicon.



However, this constraint must be balanced against the need for novel words with high-frequency segments and phonotactics. Once target words have been selected, the research question may require that the words are embedded in specific phrases. For example, to investigate the effect of footedness on article omissions, nouns with articles need to be elicited following monosyllabic and disyllabic verbs (contrasting footed and unfooted articles, e.g., ‘he kicks the pig’ versus ‘he catches the pig’) (Gerken, 1996). Phrases may also be necessary to facilitate reliable phonetic measurement. For example, the (acoustic-phonetic) closure duration of word-onset stops can only be measured if the target word is embedded in a phrase, such as ‘See this [target noun]’ or ‘these mice [target verb]’ (Millasseau et al., 2021). In the design of such phrases, care should be taken that target utterances do not become too long for children to re-plan using their own linguistic system. On the other hand, if the goal is to study whole-word omissions, as in the example on article omissions, the utterances also should not be so short that children can hold an entire utterance in working memory and rote-imitate it word-by-word (Eisenbeiss, 2010; Lust et al., 1996). Selecting the appropriate stimulus length may start with going slightly beyond the approximate mean length of utterance (MLU) of children in the target age (Eisenbeiss, 2010) but will ultimately be decided based on pilot studies with children from the target population (Lust et al., 1996).
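Mean length of utterance is a simple average, so a first estimate of stimulus length can be computed directly from transcribed child speech. The sketch below makes two labelled assumptions: it counts MLU in words rather than morphemes (morpheme counts need morphological coding), and it takes "slightly beyond" to mean the smallest whole number of words exceeding the MLU. The sample utterances are invented for illustration, and, as the text stresses, the final length should be decided by piloting.

```python
def mlu_in_words(utterances):
    """Mean length of utterance in words (a simplification: MLU is often
    counted in morphemes, which requires morphological coding)."""
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("no utterances to measure")
    return sum(lengths) / len(lengths)


def starting_stimulus_length(mlu):
    """Hypothetical starting point: the smallest whole number of words
    exceeding the MLU, to be revised after piloting."""
    return int(mlu) + 1


# Hypothetical transcribed utterances from children in the target age group.
sample = ["kick ball", "he kicks the pig", "doggie", "where the pig"]
mlu = mlu_in_words(sample)
print(mlu, starting_stimulus_length(mlu))  # 2.5 3
```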

24.3 Syntax and semantics: Investigating child grammar

Researchers interested in children’s syntactic and semantic competence generally ask the following kinds of questions:
1 What sentences are acceptable in child grammar: a subset of adult acceptable sentences, a superset, or the same set?
2 What interpretation(s) do children assign to sentences they hear: a subset of adult interpretations, a superset, or the same set?
3 How do children arrive at these meaning representations?
Answering these questions requires detailed insight into children’s sentence comprehension, as provided by experimental studies. While spontaneous and elicited productions reveal which sentence strings are acceptable to the developing child, they do not provide information about which sentences the child considers unacceptable. Results from production studies can thus only partly answer question 1. Moreover, they do not reveal what sentences mean to the child (question 2) or how the child’s meaning representations came about (question 3). Researchers interested in children’s syntactic and semantic competence must, therefore, assess children’s comprehension using experimental studies. For instance, a researcher interested in role assignment in passive sentences cannot rely on the child producing sentences like ‘The girl is being hugged by the boy.’ They must design an experiment to figure out whether and how the child knows who is doing the hugging. This section discusses two types of experiments one could conduct to gain insight into children’s syntactic and semantic competence, namely the Truth Value Judgment Task (TVJT) and the Visual World Paradigm (VWP). After introducing both paradigms and the insights they provide, we discuss how to choose between them. This is followed by considerations for designing stimuli and procedures for each paradigm.


24.3.1 Eliciting truth value judgments, tracking eye movements, and choosing between them

Even though a variety of acceptability judgment tasks can be used with adult participants to probe question 1 (see Chapters 7 and 8 in this volume), it has proven much more difficult to obtain reliable acceptability judgments from children under the age of five or six (e.g., Thornton, 2021). Children under this age do not seem to possess the meta-linguistic awareness necessary to make acceptability judgments reliably. More robust data can be obtained regarding children’s meaning representations (question 2) by examining children’s judgments of the truth or falsity of sentences. While a variety of methods are available to probe children’s interpretation of sentences, including act-out tasks, picture-selection tasks, and do-as-I-say tasks (see Ambridge & Rowland, 2013 for an overview), the gold standard for investigating children’s interpretations remains the TVJT. The TVJT was invented by Crain and McKee (1985) and presented in detail, including refinements, by Crain and Thornton (1998). Despite technological advances that have increased the number of available experimental methodologies, it remains to this day one of the most reliable sources of data on syntactic and semantic acquisition. What a TVJT does not reveal, however, is how the child arrived at their final meaning representation (question 3). That is, a TVJT cannot provide insight into what information informed sentence processing, at which point, and whether alternatives were considered. To answer such questions, a researcher may turn to an online method such as the eye-tracking VWP, which provides information about the emerging syntactic and semantic representation during sentence processing. Visual World experiments have become increasingly popular since the work of Tanenhaus and colleagues in the 1990s (Tanenhaus et al., 1995). We will first discuss both paradigms in more detail and then turn to choosing between them for a specific research question.

24.3.1.1 The truth value judgment task (TVJT)

The TVJT tests children’s comprehension of sentences by assessing the truth values children assign to those sentences. The TVJT excels at probing whether children rule out interpretations such as those ruled out by grammatical constraints. Suppose a researcher wants to know whether children accept a sentence like ‘Every mermaid covered her with sand’ on the interpretation where each mermaid covers herself with sand, in violation of a grammatical constraint on the interpretation of pronouns (e.g., her) (e.g., Thornton & Wexler, 1999). Alternatively, do children reject that constraint-violating interpretation and get only the adult interpretation on which every mermaid covered some other female (say, Snow White) with sand? To distinguish the two interpretations, the interpretation that violates the constraint is made true in the story shown to children, while the adult interpretation is made false:

Test Sentence: ‘Every mermaid covered her with sand’
Interpretation A, True in the story: Every mermaid covers herself with sand
Interpretation B, False in the story: Every mermaid covers Snow White with sand

These interpretations are woven into a story acted out to children with toys and props. The acted-out story is watched by the child participant together with a puppet (e.g., Kermit the Frog) who delivers the test sentence. The child’s task is to tell the puppet if his statement describes what happened in the story. A hypothetical story might start with a group of four mermaids and Snow White going to the beach for a picnic. After lunch they all play on the beach and Snow White asks all the mermaids to cover her with sand. All mermaids agree at first. Then one mermaid changes her mind because she is hot and it is cool under the sand, so she just covers herself. The second and third mermaids also apologise to Snow White and explain they are getting too hot and need to cool down by covering up with sand. Finally, the fourth and last mermaid feels sorry for Snow White and says she can help. This fourth mermaid covers Snow White up to the neck with sand and then covers herself with sand to get cool. Following the story, Kermit the Frog tries to recount the events of the story by saying ‘In that story there was Snow White with some mermaids. And I know what happened. Every mermaid covered her with sand.’ A child who gets the constraint-violating interpretation A, that every mermaid covered herself with sand, will tell Kermit that he was right, as every mermaid did indeed cover herself with sand in the story. A child who has the adult grammar will tell Kermit he is wrong. In this case, the child is asked to explain to Kermit what really happened. If the child explains that only one mermaid covered Snow White with sand, this informs the researcher that the child rejected the test sentence for the right reason. If the child judges the puppet’s statement to be truthful, the sentence-meaning pairing presented is part of the child’s grammar; if the child judges the puppet’s statement as false, the sentence-meaning pairing is not available. Here it is important to point out that children, like adults, tend to be generous to their interlocutors and will always attempt to select a meaning for the sentence that makes the puppet’s statement true.
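One way to check that a story really separates the two interpretations is to encode its outcome as a small model and evaluate each interpretation against it. The sketch below is a hypothetical illustration: it assumes the story can be reduced to a set of (agent, patient) "covered with sand" events and uses invented identifiers for the four mermaids. It only confirms that, in the story described above, Interpretation A comes out true and Interpretation B false.

```python
MERMAIDS = ["mermaid1", "mermaid2", "mermaid3", "mermaid4"]

# Final outcome of the hypothetical story as (agent, patient) pairs meaning
# "agent covered patient with sand".
STORY_EVENTS = {
    ("mermaid1", "mermaid1"),
    ("mermaid2", "mermaid2"),
    ("mermaid3", "mermaid3"),
    ("mermaid4", "snow_white"),
    ("mermaid4", "mermaid4"),
}


def every_mermaid_covered_herself(events):
    """Interpretation A: the constraint-violating, reflexive reading of 'her'."""
    return all((m, m) in events for m in MERMAIDS)


def every_mermaid_covered_snow_white(events):
    """Interpretation B: the adult reading, with 'her' = Snow White."""
    return all((m, "snow_white") in events for m in MERMAIDS)


print(every_mermaid_covered_herself(STORY_EVENTS))     # True
print(every_mermaid_covered_snow_white(STORY_EVENTS))  # False
```

If an edit to the story made both functions return the same value, the trial would no longer distinguish the two interpretations.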
If a sentence is ambiguous to the child, i.e., compatible with more than one meaning representation, the child is expected to accept the sentence on both (or all) interpretations. Suppose in our example that the child can get both interpretations A and B for the test sentence. A TRUE answer reveals that the child can generate the constraint-violating interpretation A, but does not exclude the possibility that the adult interpretation might also be available1. If children reject Kermit’s statement, thus giving a FALSE answer, the researcher can be confident that the meaning as presented in the story is not a meaning the child can generate for the test sentence. Rejections thus present the strongest evidence for the boundaries of children’s grammar, providing insight into question 2. The strong evidence provided by rejections relates to a crucial design feature of the TVJT: the condition of plausible dissent. The story should provide the child with a reason to reject the test sentence, by incorporating both a possible and an actual outcome of the story’s events. The possible outcome corresponds to the statement presented by the test sentence (in our example, every mermaid covering Snow White with sand), and this outcome should seem likely at some point in the story (in our example, all mermaids initially agreed to cover Snow White with sand). The story unfolds otherwise, however, such that the actual outcome is ultimately different (every mermaid covered herself with sand). Children from age three can successfully participate in such tasks (e.g., Thornton, 2021). They are typically engaged in the stories acted out in front of them and enjoy their task of correcting a silly puppet. This makes the role of the puppet an important one: children are more inclined to correct a puppet than an experimenter (especially if the puppet is staged as inattentive or a bit silly), and the puppet keeps the child engaged and focused on the stories. 
Such factors typically result in low drop-out rates, although these may increase for more complex structures. Another advantage of this task is that, if set up correctly, it yields a very solid and reliable data set that shows not only which meaning representations for a particular sentence can be accessed but, crucially, also which meanings cannot be accessed.
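The inferences licensed by acceptances and rejections can be summarised as a small coding scheme for test trials in which only the constraint-violating reading (A) is true in the story. The sketch below is illustrative, not a published coding protocol: acceptance shows that A can be generated without excluding the adult reading, and a rejection backed by a fitting explanation counts as evidence that A is unavailable. How to treat a rejection without a fitting explanation is our own hypothetical policy.

```python
def code_tvjt_trial(said_true, explanation_matches_story=None):
    """Code one test trial in which only reading A is true in the story.

    said_true: the child accepted the puppet's statement.
    explanation_matches_story: whether the child's correction (e.g., 'only
    one mermaid covered Snow White') matches what happened.
    """
    if said_true:
        # Acceptance: A can be generated; the adult reading is not excluded.
        return "A available"
    if explanation_matches_story:
        # Rejection for the right reason: strongest evidence that A is unavailable.
        return "A unavailable"
    # Hypothetical policy: a rejection without a fitting explanation is set
    # aside rather than counted as evidence either way.
    return "uninterpretable"


responses = [(True, None), (False, True), (False, True), (False, False)]
print([code_tvjt_trial(*r) for r in responses])
```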


24.3.1.2 The visual world paradigm

While a TVJT provides reliable information about the eventual meaning representation(s) children assign to sentences, it does not provide information on how this meaning representation came about, or on its time course (question 3). A technique that can provide such information is monitoring participants’ eye movements on a visual display while they are listening to sentences (VWP). In a VW experiment, participants can either simply watch a visual display while listening to sentences (a passive viewing task), or they may perform an additional task, for instance, selecting a target image corresponding to a spoken word. It has been shown that (adult) participants automatically look at pictures that are semantically related to a spoken word (starting with Cooper, 1974). Yee and Sedivy (2006) had participants watch a display with four objects: a target, a competitor, and two unrelated objects. For example, the target may have been logs and the competitor a key. The latter is semantically related to lock, which was not displayed but is a phonological competitor of the target logs. Upon hearing logs, participants look more to the competitor key than to unrelated images, without explicitly being instructed to do so. That is, eye movements are indirectly modulated by (spoken) language. This holds for adults, but also for children from a young age (Trueswell, 2008). This means that we can infer from participants’ eye movements how attention shifts during sentence processing. Therefore, information about when the eyes move towards (a part of) a picture can provide information about the unfolding meaning representation. This is, however, by no means a direct relation. A major benefit of the VWP is that it presents a natural task which is not demanding for the participant. All the participant is asked to do in a passive viewing task is watch a visual display and listen to sentences. It is, therefore, a technique that is particularly suitable for testing young children.
Even infants can be tested using this technique (e.g., Fernald et al., 2008).2 Drop-out rates are typically higher than with a TVJT, however, at between 15 and 30%. Reasons for this include practicalities around eye-tracking: not everyone’s eyes are equally easy to track, and it may be difficult to ensure children sit still enough to track their eyes accurately. Another factor may be the absence of a puppet to keep the child engaged, which may mean children get distracted more easily in a VW experiment, potentially leading to data loss if they look away from the screen.
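A drop-out rate of 15 to 30% has a direct consequence for recruitment: to end up with a given number of usable data sets, more children must be tested. The arithmetic is sketched below, assuming a hypothetical target of 24 usable children and treating the drop-out rate as an expected proportion; in practice a safety margin on top of this expected value may be prudent.

```python
import math


def children_to_recruit(usable_needed, dropout_rate):
    """Smallest number of children to test so that, at the given drop-out
    rate, the expected number of usable data sets reaches usable_needed."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(usable_needed / (1 - dropout_rate))


# Hypothetical target of 24 usable children at both ends of the 15-30% range.
for rate in (0.15, 0.30):
    print(rate, children_to_recruit(24, rate))
```

At the upper end of the range, roughly a third more children need to be recruited than at zero drop-out.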

24.3.1.3 When to choose which one?

A few basic factors might help decide whether to choose the TVJT or VWP, including the target age group and available equipment. While a TVJT can only be successfully administered from about age three, viewing tasks have been administered successfully to study the syntax-semantics interface even with 25-month-olds (e.g., Naigles, 1990). Another consideration is the accessibility of equipment. Acting out stories for a TVJT only requires some toys, whereas high-end eye-trackers with good temporal resolution and good support typically cost more than €20,000.3 But suppose you want to test four- and five-year-olds and have access to an eye-tracker: which method should you opt for? This crucially depends on the nature of the research question. Even if it might be tempting to use a ‘newer’ method such as eye-tracking, this method provides crucially different information than a TVJT, and, given its indirect nature, its results will be more difficult to interpret, while setup and data analysis are much more time-consuming. A TVJT provides a direct measure of children’s resulting meaning representations, something eye movement data by themselves cannot provide.4 Crucially, a TVJT can offer a complete overview of the truth conditions children assign to sentences, as it shows not only under which conditions children accept a sentence, but also under which conditions children judge a sentence to be false. A VW experiment typically does not provide information about what meaning representations children reject; we can at most infer a preference for one representation over another from one picture attracting more looks than another. At the same time, eye movements provide information about how children build their meaning representations, something a TVJT cannot offer. This includes information about structural attachment and reference decisions made along the way, as well as what information informs these decisions and when they are made. Eye-tracking has, therefore, already been used successfully to study the resolution of syntactic and semantic ambiguities (e.g., Lohiniva & Panizza, 2016; Trueswell et al., 1999), reference resolution (e.g., Arnold et al., 2007; Runner & Head, 2014), verbal selectional restrictions (Mani & Huettig, 2012), and filler-gap dependencies (e.g., Atkinson et al., 2018; Koring et al., 2018).
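Inferring a preference from "one picture attracting more looks than another" usually starts by computing, for each time bin after a critical word, the proportion of gaze samples on each area of interest. The sketch below shows one common way to do this, not necessarily the analysis used in the studies cited: it assumes gaze samples pooled over trials as (time in ms relative to word onset, area-of-interest label) pairs, uses 100 ms bins, and excludes off-screen or track-loss samples (coded `None`) from the denominator, which is a design choice rather than a fixed rule. The sample data are invented.

```python
from collections import defaultdict


def proportion_looks(samples, bin_ms=100):
    """Proportion of on-screen gaze samples per area of interest per time bin.

    samples: iterable of (time_ms, aoi) pairs, with time relative to the onset
    of the critical word and aoi a label such as 'target' or 'competitor',
    or None for off-screen samples or track loss (excluded here).
    Returns {bin_start_ms: {aoi: proportion}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for t, aoi in samples:
        if aoi is None:
            continue
        bin_start = (t // bin_ms) * bin_ms
        counts[bin_start][aoi] += 1
    result = {}
    for b in sorted(counts):
        total = sum(counts[b].values())
        result[b] = {aoi: n / total for aoi, n in counts[b].items()}
    return result


# Hypothetical gaze samples for illustration.
samples = [(0, "target"), (50, "competitor"), (120, "competitor"),
           (150, "competitor"), (180, None), (210, "target"), (260, "target")]
for bin_start, props in proportion_looks(samples).items():
    print(bin_start, props)
```

A higher proportion of looks to one picture than another in a given window is the kind of evidence for a preference that the text describes; it does not show that the dispreferred representation was rejected.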

24.3.2 Designing stimuli and procedures for a TVJT

A researcher who has decided to run a TVJT needs to design stimuli (stories) and possibly adapt the procedure. As discussed earlier, a first critical feature of the stories is that they must meet the criterion of plausible dissent to elicit rejections, or FALSE responses, from children. In addition, to balance TRUE/FALSE responses in the experiment, filler items unrelated to the target sentences should be included in which the puppet’s statement is unambiguously TRUE. An important feature of stimulus presentation in the TVJT is acting out the stories with actual toys and props. This allows the experimenter to show rather than describe every event. It is important to ensure that what is acted out is a close visual record of the events in the stories, particularly with pre-schoolers. Children are much less likely than adults to accept partially acted-out events as representations of the full event. This design feature is especially relevant given the move towards presenting stories as animations or pictures, for example, for online testing. The researcher needs to ensure that children accept the animations or pictures as representations of the intended events. A major benefit of acting out events (or using animations) is that the characters can do the talking instead of the experimenter narrating the story. Events acted out in real time have proven easy for children to remember. Furthermore, an acted-out rather than narrated story avoids the need to use the sentence structure or interpretation under investigation, and so avoids modelling sentences or priming the child for a particular structure.
In our mermaid example, showing each mermaid covering herself with sand establishes Interpretation A and avoids the experimenter having to say, ‘every mermaid covered herself with sand.’ Similarly, showing that the fourth mermaid covered Snow White with sand avoids using the sentence ‘this mermaid covered her/Snow White with sand.’ A key feature of stimuli for the TVJT, or any experimental task into children’s syntactic and semantic competence, is that the felicity conditions related to the use of the test sentences are met. As an example, restrictive relative clauses (e.g., the girl who is sitting under the tree) are only felicitous if there is a set (in our case, of girls) to restrict from. Failure to meet this condition in the experimental setup has been shown to induce errors in children’s responses (e.g., Hamburger & Crain, 1982). Children are much less able to accommodate infelicity than adults are, and so failure to meet felicity conditions may mask children’s competence (e.g., Hamburger & Crain, 1982; Meroni & Crain, 2011; Thornton, 2017). The requirement to meet felicity conditions may affect not only the stories and stimuli but also procedural details of the TVJT. Specifically, the original TVJT has been modified to present a felicitous context when the test sentences include words that indicate some level of uncertainty. As an example, let us imagine that we are interested in the relative scope of logical operators such as ‘somebody’ and ‘and’ in (2).

(2) Somebody brought cake and lemonade to the party.

A word like ‘somebody’ is not typically used when we know who brought cake and lemonade to the party. Instead, we use ‘somebody’ to indicate that there is uncertainty about the identity of this person. It is, therefore, infelicitous to use ‘somebody’ as an after-the-fact description of a story in which we know who brought cake and lemonade. One way to solve this issue is to use the prediction mode rather than the description mode of the TVJT (cf. Boster & Crain, 1993).5 In the prediction mode of the TVJT, a story only partially develops and is then paused, at which point Kermit makes a prediction about what is going to happen. If we are interested in the scope of the existential quantifier, we could for instance set up a story about party preparations and have Kermit produce sentence (3).

(3) I don’t know what everyone will bring to the party, but I’m going to guess that somebody will bring cake and lemonade.

After Kermit’s prediction, the story unfolds and may reveal that two different individuals bring cake and lemonade, respectively. This does not correspond to the preferred adult interpretation of sentence (3), in which one and the same individual brings both cake and lemonade (where ‘and’ takes narrow scope). Adults would, therefore, presumably judge Kermit’s prediction to be incorrect. The question is whether children reject Kermit’s prediction as well or whether they might permit ‘and’ to take wide scope, resulting in the interpretation presented in the story. In the prediction mode of the TVJT, Kermit repeats his guess after the story has finished, to make sure the child remembers it, and asks the child whether he was right, as in (4).

(4) I said that somebody would bring cake and lemonade to the party. Was I right?
The prediction mode has successfully been used in a variety of studies measuring children’s interpretation of words that are used in contexts of uncertainty such as or (e.g., Boster & Crain, 1993; Chierchia et al., 2001).
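The two scope readings of the party example have different truth conditions, and the story outcome is informative only because it makes them diverge. The sketch below states both readings over a story outcome recorded as the set of people who brought each item. The names and the second, "control" outcome are hypothetical additions for illustration; only the first outcome corresponds to the story described above.

```python
def narrow_and(bringers):
    """'somebody' > 'and': one and the same person brought both items."""
    return bool(bringers["cake"] & bringers["lemonade"])


def wide_and(bringers):
    """'and' > 'somebody': somebody brought cake and somebody brought lemonade."""
    return bool(bringers["cake"]) and bool(bringers["lemonade"])


# Hypothetical story outcomes: who brought which item.
story_outcome = {"cake": {"Anna"}, "lemonade": {"Ben"}}     # two different people
control_outcome = {"cake": {"Anna"}, "lemonade": {"Anna"}}  # the same person

for name, outcome in [("story", story_outcome), ("control", control_outcome)]:
    print(name, narrow_and(outcome), wide_and(outcome))
# story False True
# control True True
```

In the story outcome, the narrow-scope reading is false while the wide-scope reading is true, so a child who accepts Kermit’s prediction must be allowing ‘and’ to take wide scope; in the control outcome, acceptance would not distinguish the readings.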

24.3.3

Designing stimuli and procedures for a visual world experiment

A researcher who has decided to run a visual world experiment faces a laborious task. They need to control for properties of the input sentences (and the words in them), visual properties of the display, and the relation between sentence and display. General recommendations for experiments with adults apply to experiments with children as well and have been described extensively elsewhere (e.g., Carter & Luke, 2020; Holmqvist et al., 2011; chapters 23 and 24 of this volume). The focus here is on design features that are particularly important when designing experiments for pre-schoolers. As indicated earlier, any experimental task probing children’s syntactic and semantic competence should meet felicity conditions. These apply not only to the sentences and displays presented but also to the task itself. For instance, passive viewing seems to give rise to more variable data with pre-schoolers (and older children) (Sedivy, 2010; Trueswell, 2008). More robust data can be obtained by including a task the participant has to perform in relation to the visual display (e.g.,

Experimental methods to study child language

identifying an object, or perhaps a [Truth Value] judgment of the spoken sentence given the visual display) (Sedivy, 2010; Trueswell, 2008). One difference between adults and children that specifically affects eye-tracking tasks concerns disengaging attention from a visual object. Like adults, children use various sources of information to incrementally build a meaning representation, which drives their eye movements (e.g., Omaki & Lidz, 2015). Unlike adults, however, children seem to disengage much less readily once they are engaged in looking at an object (Huang & Snedeker, 2011; Koring, 2013; Ross et al., 1994). As an example, Huang and Snedeker showed that when five-year-olds are presented with the spoken word logs, they will, like adults, move their eyes to a semantic associate (e.g., key) of a phonological competitor of logs (e.g., lock) more than to unrelated distractor images. In contrast to adults, children keep looking at the key well after the spoken word has been disambiguated (Huang & Snedeker, 2011). If disengagement is a crucial aspect of the task, for instance when a researcher is interested in measuring reactivation, a solution could be to explicitly disengage children’s attention from a visual object. This can be achieved by briefly presenting a fixation point during the test sentence, i.e., a point, typically in the middle of the screen, that participants are instructed to look at whenever it appears. This triggers a saccade away from what the participant was looking at (Koring et al., 2018).

24.4

General recommendations for practice

This chapter set out to present considerations for designing experimental studies that provide valid insights into children’s developing linguistic representations and processing. This was done by discussing (1) production methods to study children’s developing phonological and phonetic representations and processes and (2) comprehension methods to study children’s syntactic and semantic competence. The focus was on procedures that can be used with children between two and five years of age. Across these divergent types of experiments, several common trends can be identified. First, the choice of experimental method will often be informed by the research question combined with the age of the children about whom the question is asked. This implies that some research questions currently cannot be answered for children of all ages. Elicited production (rather than imitation) is necessary to investigate productive phonological processes, and the TVJT is the most reliable method to probe the truth conditions children assign to sentences they hear. Yet both procedures can only be meaningfully conducted with children aged three years and older. Further methodological developments are thus needed to map the early stages of these developing linguistic abilities. Second, the theoretically most meaningful data do not automatically come from the most complex method. In the domain of phonology and phonetics, we observed that elicited imitation tasks can be designed with fewer constraints on the stimuli and offer a procedure that is feasible for younger children. Yet, the data from an elicited imitation task can be just as informative as elicited production for answering many (although not all) research questions.
In the realm of syntax and semantics, we contrasted the TVJT, which can be conducted with a few toys and no technical equipment, with the VWP, which requires carefully controlled audio and visual stimuli as well as an eye-tracker. A two-picture version of the VWP that does not require millisecond precision could, however, be conducted with just a video camera followed by careful manual coding. The VWP provides millisecond-by-millisecond information on how language guides the child’s visual attention. However, these rich data are less straightforward to interpret and do not reveal which meaning representations are not available to children. Thus, the developmental researcher in

Titia Benders et al.

any domain is advised to consider whether the more complex of the available experimental procedures is indeed necessary to answer their research question. Third, children’s language abilities (understood broadly) need to be considered in every aspect of experiment preparation. The exact considerations are partly determined by the object of investigation. When studying children’s phonology and phonetics, stimuli need to be very carefully controlled at every phonological level. However, there is no need to provide children with an extensive justification as to why they should repeat utterances. Tapping into children’s syntactic and semantic competence, on the other hand, requires a task that is carefully embedded in context. In this case, it is less relevant whether the lexical items consist of early-acquired segments. It is our hope that the considerations laid out in this chapter will help researchers identify which aspects of experiment design are more and less important for making their own discoveries about child language. On a final note, we have not provided advice on working with young children in a research setting. Even so, the success of any experiment with children relies on taking the time to build good rapport, giving simple but clear instructions, and providing encouragement throughout. We encourage prospective child researchers to take ample time to learn these and other skills in a hands-on fashion.

Notes

1 Moreover, whenever we are confused in normal conversation, or have not understood our interlocutor well, we tend to agree with our interlocutor rather than disagree. This is another factor that makes the child’s rejection of a test sentence a stronger piece of evidence (Crain & Thornton, 1998).
2 The Preferential Looking Paradigm is in essence the same as the Visual World Paradigm (Omaki & Lidz, 2015).
3 Measuring where children look at which time point does not require an eye-tracker; one can also simply use a camera to record a child’s gaze (e.g., Snedeker & Huang, 2015; Trueswell, 2008). When high temporal and/or spatial resolution is needed, however, an eye-tracker is required.
4 Of course, a viewing task can be combined with some measure of the resulting meaning interpretation, as in Arnold et al. (2007).
5 Not all inferences survive in prediction mode, however, which is something the researcher should consider.

Further reading

Blom, E., & Unsworth, S. (2010). Experimental methods in language acquisition research. John Benjamins Publishing Company.
Sekerina, I. A., Fernández, E. M., & Clahsen, H. (Eds.). (2008). Developmental psycholinguistics: On-line methods in children’s language processing. John Benjamins Publishing Company.

Related topics

Experimental phonetics and phonology, experimental syntax, experimental semantics, contrasting online and offline measures in experimental linguistics, analysing the time course of language comprehension, analysing spoken language comprehension with eye-tracking

References

Ambridge, B., & Rowland, C. F. (2013). Experimental methods in studying child language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 4(2), 149–168.
Arnold, J. E., Brown-Schmidt, S., & Trueswell, J. (2007). Children’s use of gender and order-of-mention during pronoun comprehension. Language and Cognitive Processes, 22(4), 527–565.

Atkinson, E., Wagers, M. W., Lidz, J., Phillips, C., & Omaki, A. (2018). Developing incrementality in filler-gap dependency processing. Cognition, 179, 132–149.
Boster, C. T., & Crain, S. (1993). On children’s understanding of every and or. In Conference proceedings of early cognition and the transition to language. Center for Cognitive Science, University of Texas, Austin, TX, April 23–25.
Carter, B. T., & Luke, S. G. (2020). Best practices in eye tracking research. International Journal of Psychophysiology, 155, 49–62.
Chierchia, G., Crain, S., Guasti, M. T., Gualmini, A., & Meroni, L. (2001). The acquisition of disjunction: Evidence for a grammatical view of scalar implicatures. In A. H.-J. Do, L. Domínguez & A. Johansen (Eds.), Proceedings of the 25th Boston University conference on child language development (pp. 157–168). Cascadilla Press, Somerville, MA.
Coady, J. A., & Aslin, R. N. (2004). Young children’s sensitivity to probabilistic phonotactics in the developing lexicon. Journal of Experimental Child Psychology, 89(3), 183–213.
Coady, J. A., & Evans, J. L. (2008). Uses and interpretations of non-word repetition tasks in children with and without specific language impairments (SLI). International Journal of Language and Communication Disorders, 43(1), 1–40.
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language. Cognitive Psychology, 6, 84–107.
Crain, S., & McKee, C. (1985). Acquisition of structural restrictions on anaphora. In J. M. Berman & J.-W. Choe (Eds.), Proceedings of the 16th North-East Linguistic Society (pp. 94–110). GLSA, Amherst, MA.
Crain, S., & Thornton, R. (1998). Investigations in universal grammar: A guide to experiments on the acquisition of syntax and semantics. MIT Press.
Demuth, K., Culbertson, J., & Alter, J. (2006). Word-minimality, epenthesis and coda licensing in the early acquisition of English. Language and Speech, 49(2), 137–174.
Edwards, J., & Beckman, M. E. (2008). Methodological questions in studying consonant acquisition. Clinical Linguistics and Phonetics, 22(12), 937–956.
Edwards, J., Beckman, M. E., & Munson, B. (2004). The interaction between vocabulary size and phonotactic probability effects on children’s production accuracy and fluency in nonword repetition. Journal of Speech, Language, and Hearing Research, 47(2), 421–436.
Eisenbeiss, S. (2010). Production methods in language acquisition research. In E. Blom & S. Unsworth (Eds.), Experimental methods in language acquisition research (pp. 11–34). John Benjamins Publishing Company.
Fernald, A., Zangl, R., Portillo, A. L., & Marchman, V. A. (2008). Looking while listening: Using eye movements to monitor spoken language comprehension by infants and young children. In I. Sekerina, E. Fernandez & H. Clahsen (Eds.), Developmental psycholinguistics: On-line methods in children’s language processing (pp. 97–135). John Benjamins Publishing Company.
Gerken, L. (1996). Prosodic structure in young children’s language production. Language, 72(4), 683–712.
Goldstein, B., Fabiano, L., & Iglesias, A. (2004). Spontaneous and imitated productions in Spanish-speaking children with phonological disorders. Language, Speech, and Hearing Services in Schools, 35(1), 5–15.
Hamburger, H., & Crain, S. (1982). Relative acquisition. In S. Kuczaj (Ed.), Language development 1 (pp. 245–274). Lawrence Erlbaum Associates.
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & de Weijer, J. (2011). Eye tracking: A comprehensive guide to methods and measures. OUP.
Huang, Y. T., & Snedeker, J. (2011). Cascading activation across levels of representation in children’s lexical processing. Journal of Child Language, 38(3), 644–661.
Johnson, S., & Somers, H. (1978). Spontaneous and imitated responses in articulation testing. British Journal of Disorders of Communication, 13(2), 107–116.
Kehoe, M. M., & Stoel-Gammon, C. (2001). Development of syllable structure in English-speaking children with particular reference to rhymes. Journal of Child Language, 28(2), 393–432.
Koring, L. (2013). Seemingly similar: Subjects and displacement in grammar, processing, and acquisition. Netherlands Graduate School of Linguistics.
Koring, L., Mak, P., Mulders, I., & Reuland, E. (2018). Processing intransitive verbs: How do children differ from adults? Language Learning and Development, 14(1), 72–94.
Levelt, W. (1989). Speaking: From intention to articulation. The MIT Press.
Lin, S., & Demuth, K. (2015). Children’s acquisition of English onset and coda /l/: Articulatory evidence. Journal of Speech, Language, and Hearing Research, 58, 13–27.

Lohiniva, K., & Panizza, D. (2016). When pragmatics helps syntax: An eye tracking study on scope ambiguity resolution in 4- to 5-year-old children. In BUCLD 40: Proceedings of the 40th annual university conference on language development (pp. 216–228). Cascadilla Press.
Lust, B., & Blume, M. (2016). Experimental tasks for generating language production data. In Research methods in language acquisition (pp. 119–136). De Gruyter Mouton.
Lust, B., Flynn, S., & Foley, C. (1996). What children know about what they say: Elicited imitation as a research method for assessing children’s syntax. MIT Press.
MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk. Transcription format and programs. Psychology Press.
Mani, N., & Huettig, F. (2012). Prediction during language processing is a piece of cake—But only for skilled producers. Journal of Experimental Psychology: Human Perception and Performance, 38(4), 347–843.
Meroni, L., & Crain, S. (2011). Children’s use of context in ambiguity resolution. In E. Gibson & N. Pearlmutter (Eds.), The processing and acquisition of reference (pp. 43–64). The MIT Press.
Miles, K., Yuen, I., Cox, F., & Demuth, K. (2016). The prosodic licensing of coda consonants in early speech: Interactions with vowel length. Journal of Child Language, 43(2), 265–283.
Millasseau, J., Bruggeman, L., Yuen, I., & Demuth, K. (2021). Temporal cues to onset voicing contrasts in Australian English-speaking children. The Journal of the Acoustical Society of America, 149, 348–356.
Naigles, L. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17(2), 357–374.
Noiray, A., Abakarova, D., Rubertus, E., Krüger, S., & Tiede, M. (2018). How do children organize their speech in the first years of life? Insight from ultrasound imaging. Journal of Speech, Language, and Hearing Research, 61(6), 1355–1368.
Omaki, A., & Lidz, J. (2015). Linking parser development to acquisition of syntactic knowledge. Language Acquisition, 22(2), 158–192.
Rattanasone, N. X., Tang, P., Yuen, I., Gao, L., & Demuth, K. (2018). Five-year-olds’ acoustic realization of Mandarin tone sandhi and lexical tones in context are not yet fully adult-like. Frontiers in Psychology, 9, 817.
Rose, Y., & MacWhinney, B. (2014). The PhonBank Project: Data and software-assisted methods for the study of phonology and phonological development. In J. Durand, U. Gut & G. Kristoffersen (Eds.), The Oxford handbook of corpus phonology (pp. 308–401). Oxford University Press.
Ross, R. G., Radant, A. D., Young, D. A., & Hommer, D. W. (1994). Saccadic eye movements in normal children from 8 to 15 years of age: A developmental study of visuospatial attention. Journal of Autism and Developmental Disorders, 24, 413–431.
Runner, J. T., & Head, K. D. (2014). What can visual world eye-tracking tell us about the binding theory? Empirical Issues in Syntax and Semantics, 10, 269–286.
Sedivy, J. C. (2010). Using eyetracking in language acquisition research. In E. Blom & S. Unsworth (Eds.), Experimental methods in language acquisition research (pp. 115–138). John Benjamins Publishing Company.
Smit, A. B. (1993). Phonologic error distributions in the Iowa-Nebraska Articulation Norms Project. Journal of Speech, Language, and Hearing Research, 36(3), 533–547.
Snedeker, J., & Huang, Y. T. (2015). Sentence processing. In E. L. Bavin & L. R. Naigles (Eds.), The handbook of child language (pp. 409–437). Cambridge University Press.
Stoehr, A., Benders, T., van Hell, J. G., & Fikkert, P. (2022). Feature generalization in Dutch–German bilingual and monolingual children’s speech production. First Language, 42(1), 101–123.
Swingley, D. (2016). Two-year-olds interpret novel phonological neighbors as familiar words. Developmental Psychology, 52(7), 1011–1023.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634.
Tang, P., Yuen, I., Rattanasone, N. X., Gao, L., & Demuth, K. (2019). The acquisition of phonological alternations: The case of the Mandarin tone sandhi process. Applied Psycholinguistics, 40(6), 1495–1526.
Thornton, R. (1996). Elicited production. In D. McDaniel, C. McKee & H. Smith Cairns (Eds.), Methods for assessing children’s syntax (pp. 77–102). MIT Press.
Thornton, R. (2017). The truth value judgment task. In M. Nakayama, Y. Su & A. Huang (Eds.), Studies in Japanese and Chinese language acquisition: In honor of Stephen Crain (pp. 13–39). John Benjamins Publishing Company.
Thornton, R. (2021). Judgments of acceptability, truth, and felicity in child language. In G. Goodall (Ed.), The Cambridge handbook of experimental syntax (pp. 394–420). Cambridge University Press.
Thornton, R., & Wexler, K. (1999). Principle B, VP ellipsis, and interpretation in child grammar. MIT Press.

Trueswell, J. C. (2008). Using eye movements as a developmental measure within psycholinguistics. In I. A. Sekerina, E. M. Fernández & H. Clahsen (Eds.), Developmental psycholinguistics: On-line methods in children’s language processing (pp. 73–96). John Benjamins Publishing Company.
Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition, 73, 89–134.
Xu Rattanasone, N., & Demuth, K. (2022). Produced, but not ‘productive’: Mandarin-speaking pre-schoolers’ challenges acquiring L2 English plural morphology. Journal of Child Language, 50(3), 581–609.
Yee, E., & Sedivy, J. C. (2006). Eye movements to pictures reveal transient semantic activation during spoken word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(1), 1–14.
Zamuner, T. S., Gerken, L., & Hammond, M. (2004). Phonotactic probabilities in young children’s speech production. Journal of Child Language, 31(3), 515–536.

25 EXPERIMENTAL METHODS TO STUDY ATYPICAL LANGUAGE DEVELOPMENT

Phaedra Royle, Émilie Courteau and Marie Pourquié

25.1

Introduction

25.1.1 Definition of DLD

The diagnostic criteria for language disorder in the DSM-V (American Psychiatric Association, 2013) include early onset of symptoms and persistent difficulties in language acquisition caused by comprehension or production deficits. These deficits are characterized by reduced vocabulary, limited sentence structures, and discourse impairments. They are not the result of sensory or motor impairments or of global delay, and they result in functional limitations in many areas, including social participation and academic achievement. The developmental language disorder (DLD) label proposed by Bishop and colleagues (2017) aligns with the DSM-V definition while specifying that the language difficulties should result in functional impairments. Both agree that language disorders diagnosed at the age of four or five years usually persist into adulthood. The DSM-V specifies that the profile of language strengths and weaknesses is likely to change over the course of a child’s development. The label “specific language impairment” (SLI), which became widely used in the 1980s (Reilly et al., 2014), was recently replaced by DLD (Bishop et al., 2017). SLI referred to children with language disorders whose cognitive abilities were within normal limits and for whom there was no discernible reason for the language disorder. However, the causes of language disorders are multifactorial (Bishop et al., 2017), and nonverbal skills within normal limits are therefore no longer a diagnostic criterion for DLD. Volkers (2018) noted that some consider SLI to be a subcategory of DLD, comprising children without nonverbal impairments. Others have argued that the main difference between the two labels lies in the extent to “which identification depends upon functional impacts” (McGregor et al., 2020, p. 38).

25.1.2

Disorders with atypical language development

If the language deficits occur together with a known biomedical condition and are thus part of a more complex pattern of impairment, this condition is called a differentiating condition. In this case, the label “language disorder associated with X” is used, where X is the known biomedical

DOI: 10.4324/9781003392972-29

Experimental methods to study atypical language development

condition (Bishop et al., 2017). These differentiating conditions include intellectual disability and autism spectrum disorder (ASD). While the primary impairments expected in ASD are social deficits, this disorder is heterogeneous and is associated with a wide range of cognitive and language disabilities (Georgiou & Spanoudis, 2021). As a result, performance on language tasks may sometimes be similar in participants with DLD and those with ASD (ibid.). Developmental dyslexia (DD), i.e., a disorder that impairs the automatization of reading and writing (Ziegler et al., 2008), is often comorbid with DLD. While DD includes many subtypes (ibid.), phonological deficits are often part of the disorder and can affect oral language skills. As the causal relationship between DD and DLD is unclear, DD is considered a potential co-occurring condition (Bishop et al., 2017).

25.2

Historical perspectives

The first published works on developmental language impairment date from the 1800s, beginning with Gall (1822, cited in Leonard, 2014). Initial reports were provided by neuroscientists or psychologists presenting case reports of children with seemingly normal cognitive abilities and concurrent language learning deficits. In the mid-1900s, research on DLD focused on defining the impairment and establishing (a) the existence of a language learning deficit in the absence of cognitive, neurological, or environmental causes and (b) the etiology of language learning impairments (Ingram, 1959). It is now believed that around 7% of the general population presents with a language production or comprehension deficit (Tomblin et al., 1997). However, the etiology of DLD remains debated, as various genes have been suggested as root causes of the impairment. Furthermore, different genetic mutations could result in similar linguistic manifestations, and, conversely, a mutation in the same gene could have different consequences depending on the individual (Bishop & Snowling, 2004; Bishop et al., 2006). Linguistic and acquisition research on DLD began in earnest in the 1980s, with a specific focus on structures that proved to be difficult for children with the disorder. Generally, children with DLD have been found to show delayed word learning and phonological development, as well as impoverished syntactic structures and morphosyntax, compared to their typically developing peers (Leonard, 2014). Older children with DLD usually have relatively good lexical-semantic abilities compared to their morphosyntactic abilities, and their phonological difficulties resolve, at least in part, before they become teens (Courteau et al., 2023). As they mature, morphosyntactic abilities remain impaired, and pragmatic difficulties can emerge (Fujiki & Brinton, 2014). Research has nonetheless largely focused on morphosyntactic abilities, as difficulties in this domain are prevalent in DLD across languages and ages.
These include difficulties producing tense marking; number agreement on verbs, nouns and determiners; gender agreement on determiners, clitics and adjectives; case marking; and so on. Manifestations of these difficulties vary from language to language. The initial focus on monolingual English-speaking children in the bulk of DLD research has resulted in an Anglocentric and monolingual theoretical approach to DLD. We address multilingualism next, as one of the critical issues (Section 25.3.1), and return to cross-linguistic issues when discussing current contributions (Section 25.4.2).

25.3

Critical issues and topics

25.3.1

Multilingualism

Multilingualism, that is, the regular use of several languages, should be considered not the exception but the rule: it is estimated that, worldwide, multilinguals represent at least 50% of

Phaedra Royle et al.

the population (Grosjean, 2021). The number of people with DLD who are multilingual is increasing. However, it is well established that multilingualism does not cause such disorders (Paradis et al., 2008). The challenge for research on multilingual children lies in disentangling typical from atypical language processing, as multilingual children without DLD may also present with linguistic weaknesses such as lexical access difficulties, agreement errors, and reduced syntactic complexity. It is thus crucial to have access to reliable tools in multiple languages to identify and study DLD in multilingual children. Research comparing multilingual children with and without DLD is also highly relevant to avoid under- and over-identification of DLD in clinical settings. Interestingly, studies show that longer exposure to a second language (L2) in school predicts better performance for typically developing (TD) bilinguals but not for bilinguals with DLD (Altman et al., 2016; Blom & Paradis, 2015). Comprehension tasks have been shown to reliably distinguish children with and without DLD in multilingual settings (Elin Thordardottir & Brandeker, 2013). However, comprehension tasks can be prone to Type 1 errors (see Section 25.5.5 on comprehension tasks and Section 25.6.1 on Type 1 errors). Several tools have recently been developed to characterize DLD markers in multilingual settings. The Language Impairment Testing in Multilingual Settings battery (LITMUS; Armon-Lotem et al., 2015) includes several tasks known to identify DLD that have been designed for multiple and diverse languages, including non-Indo-European ones: sentence repetition, multilingual assessment in narratives, cross-linguistic lexical tasks, nonword repetition, and a parental bilingual questionnaire. These allow researchers not only to enhance cross-language comparisons but also to assess multilinguals in each of their languages (see https://www.bi-sli.org/litmus-tools).
The sentence repetition task is a good tool to disentangle DLD from the grammatical weaknesses that characterize multilingual speakers (see Section 25.5.3). Nonword repetition is equally important, as it usually yields similar performance in multilingual and monolingual speakers without language impairment, as long as the stimuli do not involve language-specific phonemes. Furthermore, this task identifies both monolingual and multilingual children with DLD: research reveals no effect of bilingualism, and the task differentiates between bilingual children with and without DLD. Research also points to the importance of phonological complexity for language processing in children with DLD (de Almeida et al., 2018).

25.3.2

Cognitive assessment: Language and beyond

Historically, definitions of SLI, for both research and clinical purposes, were predicated on observable differences between language abilities and nonverbal cognitive abilities: nonverbal scores were expected to be within normal limits. This approach, however, was problematic. First, some putatively nonverbal cognitive assessment tasks are more verbal than others (Durant et al., 2019), and they can promote implicit verbal routines (Botting et al., 2013). Second, depending on the task used, children could be included in or excluded from the SLI group (Miller & Gilbert, 2008), since they could be reclassified as having low IQ, an exclusionary criterion for SLI. This has an impact not only on language rehabilitation and health services (Reilly et al., 2014) but also on how representative a body of research on that population may be, because nonverbal cognitive abilities can be low on average or even below the normal range, depending on how they are measured. Following CATALISE (Bishop et al., 2017), cognitive assessments are no longer used to classify children as having a language impairment, as long as the language deficit is not associated with a known biomedical condition (e.g., ASD or intellectual disability). They can, however, offer insight into how linguistic and cognitive abilities interact. For example, declarative verbal memory impairments are specifically linked to working memory deficits in a subgroup of children with DLD (Lum et al., 2015). Bilingual children with DLD may also show uneven nonverbal cognitive profiles across tasks: no deficits are observed on pattern recognition (a cube design task), but deficits are found on symbolic memory (reproducing pictures in different colours in a specific order). This supports the notion that verbal and nonverbal abilities are difficult to correlate in this population (Durant et al., 2019).

25.3.3

Control group matching

25.3.3.1 Age and language matching

Research in DLD often relies on group comparisons, most often with children who have TD but occasionally with other groups, such as those with DD (Rispens & Been, 2007) or those with ASD (Tuller et al., 2017). Even with TD comparison groups, however, there are often two matches: one on age and another on some language measure. Language matching follows this logic. Because children with DLD present with a language development delay, comparing them to age-matched individuals will always result in group differences. Comparing them to language-matched peers can highlight differences in linguistic abilities that go beyond language delay, i.e., acquisition patterns that deviate from typical language development. Language matching can be done on various measures. Receptive vocabulary tests are often used as a proxy for linguistic development, but this is sub-optimal. Sentence repetition tasks are robust indicators of language impairment (Courteau et al., 2023; Elin Thordardottir & Brandeker, 2013) and are often used to confirm language impairment or to match groups. Fuller measures such as mean length of utterance (MLU), a quasi-syntactic measure of development, are sometimes used. However, with these richer measures, matching often becomes difficult, as control participants can be quite young and may not yet be able to respond to task demands (Royle & Elin Thordardottir, 2008). Note that even when comparing children with DLD to age-matched peers, one can observe interesting qualitative differences between groups. This is especially salient with error patterns.
TD children produce errors commensurate with the grammar (e.g., overregularization in English, Ullman & Gopnik, 1999; Paradis et al., 2008) or with automatization (e.g., attraction effects in French, Franck et al., 2004), while participants with DLD will elicit atypical error patterns (e.g., overuse of the non-default feminine gender in French, Royle & Reising, 2019, or non-application of morpho-phonological processes, Royle & Stine, 2013).
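To make the language-matching logic concrete, here is a minimal Python sketch of greedy one-to-one matching on a single language score, assuming each child has one numeric score (for instance, a sentence repetition total). The `Child` record, the function name and all scores and ages are hypothetical illustrations, not data or procedures from the studies cited. The toy data mimic the problem raised in Section 25.3.3.2: adolescents with DLD end up paired with much younger controls.

```python
from dataclasses import dataclass


@dataclass
class Child:
    pid: str               # hypothetical participant ID
    age_years: float
    language_score: float  # hypothetical score, e.g., a sentence repetition total


def match_on_language(dld_group, td_pool):
    """Greedy 1:1 matching without replacement: each child with DLD is paired
    with the still-available TD child whose language score is closest.
    Greedy matching is simple but not guaranteed to minimize total distance."""
    if len(td_pool) < len(dld_group):
        raise ValueError("TD pool must be at least as large as the DLD group")
    available = list(td_pool)
    pairs = []
    for child in sorted(dld_group, key=lambda c: c.language_score):
        best = min(available, key=lambda t: abs(t.language_score - child.language_score))
        available.remove(best)
        pairs.append((child, best))
    return pairs


# Invented toy data: two 14-year-olds with DLD and a mixed-age TD pool.
dld = [Child("D1", 14.2, 31), Child("D2", 13.8, 35)]
td = [Child("T1", 7.5, 30), Child("T2", 8.1, 36),
      Child("T3", 14.0, 58), Child("T4", 13.9, 61)]

for d, t in match_on_language(dld, td):
    # The age gap shows why language-only matching is risky for ERP studies.
    print(f"{d.pid} (age {d.age_years}) -> {t.pid} (age {t.age_years})")
```

Both adolescents are matched to seven- or eight-year-olds, which is why studies often keep an age-matched group alongside the language-matched one.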

25.3.3.2 Group matching in neurolinguistic experiments

Language matching is especially problematic when comparing neurolinguistic processing between groups, since brain changes such as myelination are known to continue through childhood and into young adulthood (Segalowitz et al., 2010). The available data on brain maturation show important changes in neurotypical children’s event-related brain potentials (ERPs) through grade school and beyond. It is, therefore, unwise to use only language-matched controls in ERP experiments, as differences observed between impaired and unimpaired groups could simply be linked to maturation effects on brain organization and specialization. Recently, we observed that if we matched the participants in one of our ERP studies on sentence repetition, our DLD group of 14-year-olds would be compared to seven- or eight-year-olds, which is sub-optimal for a neuroimaging study.

Phaedra Royle et al.

25.4 Current contributions and research

25.4.1 Neuroimaging

This points, however, to an exciting new avenue of research: neuroimaging of language processing in DLD. Using electroencephalography (EEG), one can obtain millisecond-by-millisecond recordings of online processing as language unfolds. From the EEG, one extracts ERPs to establish whether participants are sensitive to grammatical errors (Cantiani et al., 2015) or incongruencies (Courteau et al., 2023). However, few languages or structures have been studied using this method, and still too few studies have focused on morphosyntactic and syntactic processing in typical language development (see Royle & Courteau, 2014, for a review). Open questions about the cognitive underpinnings of DLD that can be addressed using this method range from timing or auditory processing deficits (Kail, 1994; Tallal et al., 1981) to dissociations between domains within language and between language processing models (e.g., Hickok & Poeppel, 2007; Ullman & Pierpont, 2005). An important caveat is that neurolinguistic studies of language processing require multidisciplinary teams. Issues common in classic psycholinguistic research (stimuli inappropriate for participants with DLD, questions inappropriate for linguistic research, experimental designs that do not directly address the question asked) are also present in ERP and other neuroimaging research. Furthermore, difficulties understanding brain imaging methods and data analyses can also result in uninterpretable data. In ERP research, ungrammatical sentences are often used to tap into language processing. However, unlike in most behavioural psycholinguistic experiments (eye-tracking excepted), ungrammaticality effects are measured during stimulus presentation, not simply at the end of a sentence.
In ERP studies, effects are expected to be observed directly on the error when the ERP is analysed: if the sentence is not in fact ungrammatical at that point, analyses are difficult to interpret (see Royle & Courteau, 2014, for examples). Another issue is presentation modality. Because children with DLD can also present with reading or writing impairments, and written presentation limits how young the tested children can be, auditory stimuli should be preferred. However, auditory sentences contain subtle cues (sentence-initial vowel lengthening, intonation accent on errors, or even abrupt changes in the intonation phrase due to splicing). These cues are known to affect ERP patterns and might even directly cause some components (Steinhauer & Drury, 2012). Another advantage of using ERPs to establish a defining characteristic of DLD is that we can compare neurocognitive patterns in children with DLD to those of other groups with neurodevelopmental disorders. For example, while ERPs in response to lexical-semantic mismatches appear typical in children and adolescents with DLD, the evidence for children with ASD is mixed (e.g., Manfredi et al., 2020).
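The core of ERP extraction can be illustrated with a short NumPy sketch. Continuous EEG is cut into epochs time-locked to the critical word (the point at which the sentence becomes ungrammatical). Each epoch is baseline-corrected on its pre-stimulus interval, and the epochs are averaged per condition. The ungrammatical-minus-grammatical difference wave is then summarized as a mean amplitude in a time window. Everything here is an assumption made for illustration: a single simulated channel, a 250 Hz sampling rate, a 300–500 ms window and an injected negativity. Real pipelines, such as those built with MNE-Python or EEGLAB, add filtering, artifact rejection and multiple electrodes.

```python
import numpy as np


def erp_average(eeg, onsets, sfreq, tmin=-0.2, tmax=1.0):
    """Average epochs of one EEG channel time-locked to `onsets` (sample indices).
    Each epoch is baseline-corrected with its mean over the pre-stimulus interval."""
    if tmin >= 0:
        raise ValueError("tmin must be negative to provide a baseline")
    start, stop = int(round(tmin * sfreq)), int(round(tmax * sfreq))
    epochs = np.stack([eeg[o + start:o + stop] for o in onsets])
    baseline = epochs[:, :-start].mean(axis=1, keepdims=True)  # pre-stimulus samples
    return (epochs - baseline).mean(axis=0)


def mean_amplitude(wave, times, t0, t1):
    """Mean of `wave` over the window t0 <= t < t1 (in seconds)."""
    mask = (times >= t0) & (times < t1)
    return wave[mask].mean()


# Simulated data (hypothetical): white noise in microvolts, 120 trials,
# alternating grammatical / ungrammatical critical words every 1.6 s.
rng = np.random.default_rng(0)
sfreq = 250.0
eeg = rng.normal(0.0, 10.0, size=120 * 400 + 500)
onsets = np.arange(100, 120 * 400, 400)
grammatical, ungrammatical = onsets[0::2], onsets[1::2]
for o in ungrammatical:  # inject a -5 microvolt deflection 300-500 ms after the error
    eeg[o + int(0.3 * sfreq):o + int(0.5 * sfreq)] -= 5.0

times = np.arange(int(round(-0.2 * sfreq)), int(round(1.0 * sfreq))) / sfreq
difference = erp_average(eeg, ungrammatical, sfreq) - erp_average(eeg, grammatical, sfreq)
effect = mean_amplitude(difference, times, 0.3, 0.5)
print(f"Mean difference 300-500 ms: {effect:.2f} microvolts")
```

Because the epochs are time-locked to the error itself, the effect only emerges if the stimulus really is ungrammatical at that sample. This is the interpretive constraint noted above.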

25.4.2 Cross-linguistic studies of DLD

In DLD, the main combinatorial linguistic components, such as phonology, morphosyntax, and syntax, are problematic. Since these components manifest differently from language to language, one might expect DLD symptomatology to be highly variable across languages. Indeed, DLD manifestations are constrained by language-specific parameters. Symptoms related to language families have been reported, such as difficulties in processing object pronoun clitics in Romance languages, difficulties in processing verb-second (V2) movement in Germanic languages, and underuse of aspect markers in Chinese languages (Leonard, 2013).

Experimental methods to study atypical language development

questions and prompts). Spontaneous speech samples can be compared to samples from typically developing children, already collected and available in repositories such as CHILDES and TalkBank (https://childes.talkbank.org, MacWhinney, 2000; Rose & MacWhinney, 2014). Spontaneous speech can, however, have drawbacks. Data can be time-consuming to transcribe and analyse, although there are programs for semi-automatic morphological coding in CHAT/CLAN (MacWhinney, 2000). Spontaneous speech data can also underestimate or overestimate linguistic abilities. For example, it has been shown that complex syntactic structures are better studied in elicitation tasks than in spontaneous speech (Steel et al., 2013), and that fewer morphosyntactic agreement errors can be found in spontaneous speech than in elicitation (Royle & Reising, 2019). Furthermore, the communicative context may not demand or encourage the targeted structures.
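MLU (see Section 25.3.3.1) is usually computed from spontaneous speech samples like these, so a minimal sketch of the calculation may help. MLU in morphemes is the total number of morphemes divided by the number of utterances. The transcript notation below is a hypothetical simplification, with spaces between words and hyphens before bound morphemes. It is not CHAT format, and the sketch ignores the utterance-selection and counting rules used by published protocols and by tools such as CLAN.

```python
def mlu_in_morphemes(utterances):
    """Mean length of utterance in morphemes.

    Hypothetical notation: words are separated by spaces and bound morphemes
    are attached with '-', e.g. "dog-s" counts as two morphemes."""
    lengths = []
    for utterance in utterances:
        words = utterance.split()
        if not words:  # skip empty lines
            continue
        lengths.append(sum(len(word.split("-")) for word in words))
    if not lengths:
        raise ValueError("no utterances to score")
    return sum(lengths) / len(lengths)


sample = ["the dog-s run", "he jump-ed", "more cookie"]
print(mlu_in_morphemes(sample))  # 4 + 3 + 2 morphemes over 3 utterances -> 3.0
```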

25.5.2 Elicitation tasks

A solution is to use elicitation tasks. These probe linguistic structures that are potential domains of weakness in children with language impairment, making it possible to assess mastery levels and upper limits in children. Elicitation can take many forms. Participants can name pictures, describe events, complete sentences, and respond to questions that are structured to elicit targeted structures (e.g., “What did Kermit do yesterday?” to elicit the past tense, or “This is a wug, these are two __” to elicit plural agreement). Story retelling resembles spontaneous speech but is more constrained and is usually categorized as an elicitation task, as there is a significant amount of priming for all aspects of language, and constraints on what the child is expected to say are much stronger than in spontaneous speech. Word-naming or word-elicitation tasks are often used in research and clinical settings. In addition to evaluating the breadth or depth of lexical knowledge, these tasks are often used to match participant groups or to categorize children as language disordered or not, and are sometimes used as proxies for global linguistic knowledge. Breadth of lexical knowledge is assessed using, e.g., picture-naming tasks; short videos or animations are occasionally used for verb naming. Depth of knowledge can be evaluated using oral categorization tasks (e.g., “Which words go together?”) or card sorting. Further semantic and grammatical information (e.g., part of speech: verb or noun) can be gathered by asking children to define words or to use them in a sentence. Elicitation tasks do not, however, provide a global picture of language development: obtaining a full portrait of a child’s development through elicitation alone would take too much time. This matters because children with language impairment might have relative strengths and weaknesses that specific tasks do not reveal.
This issue applies to all the methods we present in the following sections.

25.5.3 Sentence repetition

A specific subcategory of elicitation tasks is sentence repetition. Leclercq et al. (2014) suggest that the ability to repeat sentences accurately is subserved by two factors: a (morpho-)syntactic factor and a lexical one. They found that both factors contributed almost equally to scores on a sentence recall task: 52.56% of the variance was explained by morphosyntax, and 43.92% by the lexicon. Sentence repetition tasks have been shown to discriminate between children and teenagers with TD and those with DLD in many languages, including English (Conti-Ramsden et al., 2001) and French (Elin Thordardottir et al., 2011; Leclercq et al., 2014). This task can also discriminate between different types of atypical language development (e.g., DLD and ASD; Sukenik & Friedmann, 2018; but see Silleresi et al., 2018, for conflicting results). In a study on Palestinian Arabic, Taha et al. (2021) observed that most grammatical errors made by children with DLD resemble those made by TD children, but are more frequent. Also, despite large similarities in error types between the two groups, some atypical errors were produced exclusively by the DLD group: verb omission; substitution of the singular verb for the plural form (e.g., [ʃirbib], drink-PAST-3MS ‘he drank’, for [ʃirbu], drink-PAST-3PL ‘they drank’); omission of the passive prefix in-, which changes the sentence to active voice; and, finally, fragmented syntax due to multiple omissions. Such observations are highly important as they provide qualitative benchmarks for the study of DLD.
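Claims that a task “discriminates” between groups are commonly quantified as sensitivity and specificity at a given cut-off score. Sensitivity is the proportion of children with DLD who are correctly flagged, and specificity is the proportion of TD children who are correctly not flagged. The sketch below assumes that lower sentence repetition scores indicate poorer performance and that a child is flagged when their score falls below the cut-off. The scores and the cut-off are invented for illustration and do not come from the studies cited.

```python
def sensitivity_specificity(dld_scores, td_scores, cutoff):
    """Flag a child as likely DLD when score < cutoff (lower = poorer repetition)."""
    true_positives = sum(score < cutoff for score in dld_scores)   # DLD correctly flagged
    true_negatives = sum(score >= cutoff for score in td_scores)   # TD correctly not flagged
    return true_positives / len(dld_scores), true_negatives / len(td_scores)


# Invented sentence repetition scores for illustration only.
dld = [12, 15, 18, 19, 25]
td = [19, 24, 27, 29, 30]
sens, spec = sensitivity_specificity(dld, td, cutoff=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.80 and 0.80
```

Raising the cut-off flags more children with DLD (higher sensitivity), but it also flags more TD children (lower specificity).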

25.5.4 Grammaticality judgment

The grammaticality judgment task is another device used to probe language abilities. Although some might think that grammaticality judgments are hard to elicit in young children, this is not always the case. For example, Crain and Thornton (2000) have shown that truth-value judgment tasks (TVJT) can tap into grammatical knowledge by evaluating what meaning a child assigns to a given sentence. In this approach, the child hears a sentence produced by a puppet. For example, if Kermit says, “Only Peter Rabbit will eat a carrot or a pepper”, and Peter Rabbit eats a carrot, an English-speaking child aged 3;6 will accept this sentence as true. The child will reject it as false if Cookie Monster eats a carrot, showing understanding of both the disjunctive reading of “or” and the scope of “only” (Crain & Thornton, 2006). Advantages of this approach are that the child does not feel tested, ambiguous sentences can be probed, and many types of structures and levels of complexity can be studied. Although most studies have used TVJTs for semantics and syntax, they can be used for other linguistic domains, such as phoneme perception (Rvachew et al., 2017). Other approaches, such as the alien-learner paradigm, have been used to probe sentence or agreement processing in typically developing and language-impaired children. The child can “feed” the alien with “food” when it produces a grammatical sentence, providing positive reinforcement and avoiding negative responses (Labelle & Valois, 2003). Another way to obtain positive responses to ungrammatical sentences is to ask whether the sentence sounds “weird” (Courteau et al., 2013). In this approach, it is also possible to probe children with follow-up questions such as “Why?” or “How would you say it?”, thus making explicit why a child responded in a certain way (or, if the child cannot explain why, they might be able to model the correct sentence).
However, one must pay attention to task design, as grammaticality judgments are highly prone to Type 1 errors, that is, measuring something other than what was supposed to be measured and wrongly concluding that the initial hypothesis is true. For example, comprehension difficulties might lead children to judge a sentence as wrong for the wrong reasons. In one case, the sentence “The cat eats mouse” was judged to be wrong by a child with DLD because of its semantic content: she responded that “Cats eat little balls”, i.e., cat food (Rose & Royle, 1999). Type 1 errors are also the bane of comprehension tasks, to which we now turn.

25.5.5 Language comprehension versus production tasks

The use of both production and comprehension tasks to study and assess DLD is highly relevant, as it may help us better understand the underlying linguistic deficits and differentiate between the language profiles of bilingual children and children with DLD. Indeed, while the former show difficulties in language production despite good comprehension and grammatical judgment, the latter exhibit production difficulties together with impaired comprehension and grammatical judgment. Chondrogianni and colleagues (2015) claim that L2 learners’ problems with grammatical morphology are output-related and do not reflect impaired underlying grammatical representations. Production difficulties in bilinguals could be caused by lexical access and retrieval difficulties (Bialystok et al., 2008), prosodic differences between languages (Goad & White, 2006), lack of automaticity, or a combination of these. This creates an expressive-receptive “gap”, that is, an asymmetry between lower production and higher comprehension skills in bilingual children. This gap varies according to the amount of exposure, irrespective of the language family: lexical gaps have been reported in many studies (Gibson et al., 2014). A “grammatical gap”, i.e., better performance in morphosyntactic comprehension than in production, has also been reported in bilingual children (Anderson et al., 2019; Pourquié et al., 2019), and bilinguals perform on par with monolinguals in comprehension but not in production tasks (Pratt et al., 2021). This asymmetry provides a good testing ground for disentangling DLD from weaker grammatical production in bilinguals, because comprehension seems to be more problematic in children with DLD than in typically developing bilinguals. The importance of evaluating comprehension skills in children with DLD cannot be overstated: online comprehension studies can provide us with a window into the underlying representations and processing routines of language learners (Chondrogianni et al., 2015).
One can assess comprehension using “online” methods such as eye-tracking and ERPs, or with offline tasks such as sentence comprehension, sentence-picture matching, or TVJTs. Whatever the technique, it is essential to use adequate stimuli. Because comprehension involves cognitive skills that go beyond linguistic processing, such as vision and audition, targeting linguistic comprehension requires controlling the linguistic features of stimuli, for instance sentence complexity. More importantly, comprehension tasks targeting specific features must avoid providing linguistic cues other than the ones being tested. For instance, the fLEX sentence comprehension task assesses both lexical and morphosyntactic verb processing while avoiding external cues that could be provided by subject pronouns or by the phonological liaison between subject and verb that exists in French (Pourquié, 2017). It can also happen that a task developed to test comprehension does not in fact test the feature it was designed to probe. For example, Roulet-Amiot and Jakubowicz (2006) probed sensitivity to gender agreement by asking children to make semantic categorizations (e.g., “something you can wear”) and presenting them with nouns accompanied by appropriately or inappropriately gendered determiners or adjectives. Children with DLD showed difficulties on the task but were not affected by gender errors. The authors concluded that children with DLD did not have any difficulties processing gender. However, one could argue that the task could easily be carried out without agreement checking, and that processing and checking gender features would in fact have slowed responses down. It is thus hard to interpret results from this task as evidence for (or against) a gender-processing deficit in French-speaking children with DLD. Such situations can give rise to Type 1 errors, to which we return in Section 25.6.1.

25.5.6 Language assessment within subdomains

One must not forget that language has multiple subdomains, i.e., it is multidimensional (Lonigan & Milburn, 2017). Subdomains are often evaluated separately, to the extent that this is possible. Tomblin and Zhang (2006) give an example of how language is assessed through commercially available batteries: domains such as grammar and vocabulary are evaluated by different tasks in the receptive and expressive modalities, on the assumption that each subdomain can be impaired or preserved independently in any individual (e.g., preserved receptive syntax versus impaired expressive vocabulary). Recently, challenging the assumption of language’s multidimensionality, language acquisition has been studied through linguistic assessment tasks combined with confirmatory factor analyses, which allow researchers to confirm whether the studied constructs are distinct and to validate whether they have empirical foundations (ibid.). In children with and without DLD, there is evidence that language’s multidimensionality increases with age. Tomblin and Zhang’s (2006) longitudinal study of 1,929 children with and without DLD showed that it is valid to consider vocabulary (assessed with word-level tasks) and grammar (reflected by sentence-level tasks) as two separate dimensions starting in second grade. Lonigan and Milburn (2017) also found that vocabulary and syntax were two dimensions starting in preschool, but that they nonetheless shared a large amount of variance. Interestingly, both studies failed to find evidence supporting the idea that language comprehension and production skills are different dimensions. While (morpho)syntactic deficits have been widely investigated in DLD with a variety of experimental methods, less attention has been paid to lexico-semantic deficits. For instance, although Conti-Ramsden et al. (2001) administered vocabulary tasks to their participants, results on these were not included in their diagnostic accuracy analyses. However, studies show impairments on lexical tasks. McGregor et al. (2013) assessed children and teenagers with DLD on their vocabulary breadth and depth. Children with DLD showed deficits on both measures in all age groups. Impairments were also found on receptive vocabulary.
Using a picture-word matching test in a longitudinal study from ages 2;6 to 21 years, Rice and Hoffman (2015) found poorer performance for participants with DLD than for TL participants throughout the study. In a recent study, lexico-semantic relationship tests had outstanding diagnostic accuracy in discriminating between French-speaking teenagers with and without DLD (Courteau et al., 2023). Impairments in phonological working memory, in the sense of “a limited capacity system allowing the temporary storage and manipulation of information” as defined by Baddeley (2000, p. 418), have been observed in teenagers with DLD. Using forward and backward digit span tasks, Arslan et al. (2020) found impaired phonological working memory skills in French-speaking children and teenagers with DLD compared to age-matched controls. Interestingly, there was no difference between the teenage groups on visuospatial working memory skills, but the younger DLD group performed significantly worse than their age-matched TL peers on one visuospatial test, suggesting that visuospatial skills can normalize with age.
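Diagnostic accuracy across all possible cut-offs is often summarized by the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen TD child outscores a randomly chosen child with DLD, with ties counting one half. The text does not state which index Courteau et al. (2023) used, so this is only a generic sketch. Under one common rule of thumb, AUC values of .90 or above are described as outstanding. The scores below are invented.

```python
from itertools import product


def auc(dld_scores, td_scores):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney form):
    share of (DLD, TD) pairs in which the TD child scores higher; ties count 0.5."""
    wins = 0.0
    for d, t in product(dld_scores, td_scores):
        if t > d:
            wins += 1.0
        elif t == d:
            wins += 0.5
    return wins / (len(dld_scores) * len(td_scores))


# Invented lexico-semantic test scores for illustration only.
dld = [12, 15, 18, 19, 25]
td = [19, 24, 27, 29, 30]
print(auc(dld, td))  # 22.5 of 25 pairs favour the TD child -> 0.9
```

The pairwise loop is quadratic in group size, which is harmless for typical clinical samples. Rank-based formulas give the same value more efficiently.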

25.6 Recommendations for future studies

25.6.1 Avoid developing experiments that result in false positive results

When designing linguistic experiments, one must make sure that the experiment actually tests the hypothesis. The results obtained could give the erroneous impression that the hypothesis has been confirmed when in fact the null hypothesis is true. Crain and Thornton (2000) recommend stacking the cards in experiments against the null hypothesis to avoid children’s responses being right for the wrong reason. In some cases, it might be impossible to test a given feature without “stacking the deck” because of the linguistic properties under investigation. In this case, extreme caution should be taken in data interpretation. One solution to this problem is to develop more than one type of experiment addressing the question at hand. Another solution is to use an interdisciplinary approach to address the multiple parameters (linguistic, psycholinguistic, sociological, and clinical) that may come into play.

25.6.2 Use multiple sources of information

We have reviewed several methods typically used to study language acquisition and processing in participants with DLD, along with their advantages and disadvantages. A common-sense approach to circumventing many issues is to use multiple tasks, for example, combining comprehension and production tasks, or using spontaneous speech and elicitation probes, to obtain richer information on participants’ linguistic abilities. This approach allows for more nuanced interpretation and a better understanding of participants’ linguistic deficits and strengths. The choice of tasks is obviously constrained by research questions and hypotheses, but also by time: younger children typically need breaks every 30 minutes and might not be willing to stay in the lab for more than two hours. A specific, if time-consuming, task that can provide complementary information to more classic behavioural tasks is neuroimaging. Neurocognitive investigations using ERPs have the distinct advantage of allowing us to compare language processing abilities in people with DLD across lexical-semantic and morphosyntactic processing. When processing lexico-semantic information, children and teenagers with DLD consistently exhibit the expected N400 component, which is typically linked to lexico-semantic processing (see Royle & Courteau, 2014, for a review, and Courteau et al., 2023). However, their ERPs for morphosyntactic processing tend to differ from those of their peers (ibid.), suggesting impairments in this domain. To date, very few aspects and languages have been studied using this technique. For example, at the word level, lexico-semantic incongruency paradigms dominate the field, often in the form of word-picture matching paradigms. Furthermore, ERPs have not yet been used to investigate linguistic maturation in DLD, as has been done in second language learning research (see Steinhauer, 2014, for a review).

25.6.3 Explore the language learning continuum: Go beyond the school years

In fact, there are relatively few language development studies of people with DLD after grade school. There is a sizable amount of work on educational and social outcomes in teens and adults with DLD (e.g., Durkin & Conti-Ramsden, 2010; Conti-Ramsden et al., 2018), but much less on linguistic attainment, and most, if not all, of this work focuses on monolingual English speakers. One could ask: what is the target grammar for a person with DLD? This is not a trite question, as it is intimately linked to theories and to our understanding of DLD. However, one challenge in interpreting existing results is that, in most studies, child and adolescent control groups do not present the expected adult patterns associated with mature morphosyntactic processing. This is especially important for ERP research where, again, data on native adult speakers are not available outside a few, mostly Indo-European, languages. Future studies should test adults with DLD to determine whether typical morphosyntactic processing patterns are present, thus providing evidence as to whether persistent impairment is a defining feature of DLD.

25.6.4 Align research with needs

As we just mentioned, some research on DLD focuses on outcomes and attainment. This type of research is important not only for educators and policymakers but also for people with DLD, who do not always feel that research reflects their needs. A growing trend in medicine is the “patient partner” concept, in which patients actively participate in experiment design rather than serving only as “subjects”. In anthropological linguistics and language revitalization work, similar partnerships are becoming the norm, and in some cases obligatory. With this in mind, researchers in the domain of DLD should expect to integrate the needs of the DLD community and to include persons with DLD as partners in their future research. A recent paper outlines research that persons with DLD would like to see carried out (Kulkarni et al., 2022), with the caveat that respondents in this study were from Great Britain and might not represent the full array of needs of all people with DLD. Some of these needs can and should be addressed by language specialists. For example, respondents’ suggestions highlight needs for teacher training and for interventions focusing on speech, language, and communication-related goals, as well as receptive language skills. Psycholinguists should invest in this type of research to avoid the Type 1 errors discussed above.

25.7 Future directions

Breaking out of the tradition of working on monolingual speakers of mostly Indo-European languages will not only allow us to better understand DLD but also to account for the variety of linguistic experiences prevalent in our multilingual and increasingly immigrant-rich cultures. Furthermore, as language is learned under diverse conditions, one can question whether monolinguals provide the appropriate benchmark for all learners, with or without DLD. One approach that considers diversity in learning conditions uses multi-group comparisons with groups other than typically developing monolinguals, including diverse learners and learning contexts, and places immigration, integration, and adaptation front and centre. As mentioned, despite more than 40 years of research, few studies focus on language attainment in adults with DLD. Future studies should explore this area. Domains that could be evaluated include pragmatics, reading and writing, and language use in work settings, in addition to higher-level syntax and logic, domains that are rarely investigated in young children but that are important for adults. This work will help us better understand language learning as a lifelong process in DLD, and will also allow for better linguistic and social integration of persons with DLD. As researchers, we often justify our studies simply by stating that science is intrinsically interesting, valuable, and important (and cite the habitual example of how GPS would not exist without Einstein’s ideas). But an important issue regarding research on DLD is that it is an almost invisible domain of inquiry compared with other developmental disorders such as ADHD or ASD (Bishop, 2010).
As Bishop notes, “when prevalence is taken into account, the number of publications on rare conditions is greatly in excess of that for common conditions”, and this is linked to less funding being awarded to less severe conditions, among which DLD is counted, even though the social impacts of DLD are significant. Furthermore, research disciplines (medicine, genetics, psychology, linguistics, and speech-language pathology) have different funding opportunities that greatly affect the amount of research in their respective domains. It is thus not an easy task for researchers interested in DLD to convince decision makers to fund their research. We suggest two approaches that might help resolve this issue. First, working in interdisciplinary teams to obtain funding for research that is grounded in clear linguistic descriptions and informed testing of language abilities, while expanding linguistics’ imprint on health and other sciences interested in DLD (e.g., genetics and neuroimaging). Second, working with the DLD community to promote research that not only helps us better understand what DLD is, but also has a positive impact on the lives of people with DLD.


Further reading
Crain, S., & Thornton, R. (2000). Investigations in universal grammar: A guide to experiments on the acquisition of syntax and semantics. MIT Press.
Leonard, L. B. (2014). Children with specific language impairment (2nd ed.). MIT Press.
Lonigan, C. J., & Milburn, T. F. (2017). Identifying the dimensionality of oral language skills of children with typical development in preschool through fifth grade. Journal of Speech, Language, and Hearing Research, 60(8), 2185–2198.
Royle, P., & Courteau, E. (2014). Language processing in children with specific language impairment: A review of event-related potential studies. In L. T. Klein & V. Amato (Eds.), Language processing: New research (pp. 33–64). Nova Science Publishers.

Related topics Eliciting spontaneous linguistic productions; new directions in statistical analysis for experimental linguistics; experimental methods to study child language; experimental methods to study disorders of language production in adults

References
Altman, C., Armon-Lotem, S., Fichman, S., & Walters, J. (2016). Macrostructure, microstructure, and mental state terms in the narratives of English-Hebrew bilingual preschool children with and without specific language impairment. Applied Psycholinguistics, 37(1), 165–193.
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders DSM-5 (5th ed.). American Psychiatric Publishing.
Anderson, R. M., Giezen, M., & Pourquié, M. (2019). Basque-Spanish bilingual children’s expressive and receptive grammatical abilities. Linguistic Approaches to Bilingualism, 9(4–5), 687–709.
Armon-Lotem, S., de Jong, J., & Meir, N. (2015). Assessing multilingual children: Disentangling bilingualism from language impairment. Multilingual Matters.
Arslan, S., Broc, L., Olive, T., & Mathy, F. (2020). Reduced deficits observed in children and adolescents with developmental language disorder using proper nonverbalizable span tasks. Research in Developmental Disabilities, 96, 103522.
Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423.
Bialystok, E., Craik, F. I. M., & Luk, G. (2008). Lexical access in bilinguals: Effects of vocabulary size and executive control. Journal of Neurolinguistics, 21, 522–538.
Bishop, D. V. M. (2010). Which neurodevelopmental disorders get researched and why? PLoS ONE, 5(11), e15112.
Bishop, D. V., Adams, C., & Norbury, C. F. (2006). Distinct genetic influences on grammar and phonological short-term memory deficits: Evidence from 6-year-old twins. Genes, Brain and Behavior, 5, 158–169.
Bishop, D. V., & Snowling, M. J. (2004). Developmental dyslexia and specific language impairment: Same or different? Psychological Bulletin, 130(6), 858–886.
Bishop, D. V., Snowling, M. J., Thompson, P. A., Greenhalgh, T., & Catalise-2 Consortium (2017). Phase 2 of CATALISE: A multinational and multidisciplinary Delphi consensus study of problems with language development: Terminology. Journal of Child Psychology and Psychiatry, 58(10), 1068–1080.
Blom, E., & Paradis, J. (2015). Sources of individual differences in the acquisition of tense inflection by English second language learners with and without specific language impairment. Applied Psycholinguistics, 36, 953–976.
Botting, N., Psarou, P., Caplin, T., & Nevin, L. (2013). Short-term memory skills in children with specific language impairment. Topics in Language Disorders, 33(4), 313–327.
Cantiani, C., Lorusso, M. L., Perego, P., Molteni, M., & Guasti, M. T. (2015). Developmental dyslexia with and without language impairment: ERPs reveal qualitative differences in morphosyntactic processing. Developmental Neuropsychology, 40(5), 291–312.
Chondrogianni, V., Marinis, T., Edwards, S., & Blom, E. (2015). Production and on-line comprehension of definite articles and clitic pronouns by Greek sequential bilingual children and monolingual children with specific language impairment. Applied Psycholinguistics, 36(5), 1155–1191.
Conti-Ramsden, G., Botting, N., & Faragher, B. (2001). Psycholinguistic markers for specific language impairment (SLI). The Journal of Child Psychology and Psychiatry and Allied Disciplines, 42(6), 741–748.
Conti-Ramsden, G., Durkin, K., Toseeb, U., Botting, N., & Pickles, A. (2018). Education and employment outcomes of young adults with a history of developmental language disorder. International Journal of Language & Communication Disorders, 53(2), 237–255.
Courteau, É., Loignon, G., Steinhauer, K., & Royle, P. (2023). Identifying linguistic markers of French-speaking teenagers with developmental language disorder: Which tasks matter? Journal of Speech, Language, and Hearing Research, 66(1), 221–238.
Courteau, E., Royle, P., Gascon, A., Marquis, A., Drury, J. E., & Steinhauer, K. (2013). Gender concord and semantic processing in French children, an auditory ERP study. In S. Baiz, N. Goldman & R. Hawkes (Eds.), BUCLD 37 proceedings. Cascadilla.
Courteau, É. (2023). Lexico-semantic and morphosyntactic processing in French-speaking adolescents with and without developmental language disorder [PhD Thesis]. Université de Montréal.
Crain, S., & Thornton, R. (2000). Investigations in universal grammar: A guide to experiments on the acquisition of syntax and semantics. MIT Press.
Crain, S., & Thornton, R. (2006). Acquisition of syntax and semantics. In M. Traxler & M. Gernsbacher (Eds.), Handbook of psycholinguistics. Elsevier.
de Almeida, L., Ferré, S., Barthez, M.-A., & dos Santos, C. (2019). What do monolingual and bilingual children with and without SLI produce when phonology is too complex? First Language, 39(2), 158–176.
Dromi, E., Leonard, L. B., & Shteiman, M. (1993). The grammatical morphology of Hebrew-speaking children with specific language impairment: Some competing hypotheses. Journal of Speech and Hearing Research, 36, 760–771.
Durant, K., Peña, E., Peña, A., Bedore, L. M., & Muñoz, M. R. (2019). Not all nonverbal tasks are equally nonverbal: Comparing two tasks in bilingual kindergartners with and without developmental language disorder. Journal of Speech, Language & Hearing Research, 62(9), 3462–3469.
Durkin, K., & Conti-Ramsden, G. (2010). Young people with specific language impairment: A review of social and emotional functioning in adolescence. Child Language Teaching and Therapy, 26(2), 105–121.
Elin Thordardottir (2008). Language-specific effects of task demands on the manifestation of specific language impairment: A comparison of English and Icelandic. Journal of Speech, Language, and Hearing Research, 51, 922–937.
Elin Thordardottir (2016). Grammatical morphology is not a sensitive marker of language impairment in Icelandic in children aged 4–14 years. Journal of Communication Disorders, 62, 82–100.
Elin Thordardottir, & Brandeker, M. (2013). The effect of bilingual exposure versus language impairment on nonword repetition and sentence imitation scores. Journal of Communication Disorders, 46(1), 1–16.
Elin Thordardottir, Kehayia, E., Mazer, B., Lessard, N., Majnemer, A., Sutton, A., Trudeau, N., & Chilingaryan, G. (2011). Sensitivity and specificity of French language and processing measures for the identification of primary language impairment at age 5. Journal of Speech, Language, and Hearing Research, 54, 580–597.
Franck, J., Cronel-Ohayon, S., Chillier, L., Frauenfelder, U. H., Hamann, C., Rizzi, L., & Zesiger, P. (2004).
Normal and pathological development of subject-verb agreement in speech production: A study on French children. Journal of Neurolinguistics, 17(2–3), ­­ ​­ 147–180. ­ ​­ Fujiki, M., & Brinton, B. (2014). Social communication assessment and intervention for children with language impairment. In D. A. Hwa-Froelich (Ed.), Social communication development and disorders. Psychology press. Georgiou, N., & Spanoudis, G. (2021). Developmental language disorder and autism: Commonalities and differences on language. Brain Sciences, 11(5), ­ 589. Goad, H., & White, L. (2006). Ultimate attainment in interlanguage grammars: A prosodic approach. Second Language Research, 22, 243–268. ­ ​­ Grosjean, F. (2021). Life as a bilingual: Knowing and using two or more languages. Cambridge University Press. Hickok, G., & Poeppel, D. (2007). The cortical organisation of speech processing. Nature Reviews Neuroscience, 8, 393–402. ­ ​­

403

Phaedra Royle et al. Ingram, T. T. S. (1959). Specific developmental disorders of speech in childhood. Brain, 82(3), ­ ­450–454. ​­ Kail, R. (1994). A method for studying the generalized slowing hypothesis in children with specific language impairment. Journal of Speech and Hearing Disorders, 37, ­418–421. ​­ Kulkarni, A. A., Chadd, K. E., Lambert, S. B., Earl, G., Longhurst, L. M., McKean, C., Hulme, C., McGregor, K. K., Cunniff, A., Pagnamenta, E., Joffe, V., Ebbels, S. E., Bangera, S., Wallinger, J., & Norbury, C. F. (2022). Editorial Perspective: Speaking up for developmental language disorder – the top 10 priorities for research. Journal of Child Psychology and Psychiatry, 63(8), ­ ­957–960. ​­ Kunnari, S., Savinainen-Makkonen, T., Leonard, L. B., Makinen, L., Tolonen, A.-K., Luotonen, M., & Leinonen, E. K. (2011). Children with specific language impairment in Finnish, the use of tense and ­ ­999–1027. ​­ agreement inflections. Journal of Child Language, 38(5), Gibson, T. A., Peña, E. D., & Bedore, L. M. (2014). The relation between language experience and receptive– expressive semantic gaps in bilingual children. International Journal of Bilingual Education and Bilingualism, 17, ­90–110. ​­ Haznedar, B., & Schwartz, B. D. (1997). Are there optional infinitives in child L2 acquisition? In E. Hughes, M. Hughes, & A. Greenhill (Eds.), BUCLD 21 proceedings. Cascadilla Press. Labelle, M., & Valois, D. (2003). Floated quantifiers, quantifiers at a distance, and logical form constructions in the acquisition of L1 French. In B. Barbara, A. Brown, & F. Conlin (Eds.), BUCLD 27 proceedings. Cascadilla Press. Leclercq, A.-L., Quémart, P., Magis, D., & Maillart, C. (2014). The sentence repetition task: A powerful diagnostic tool for French children with specific language impairment. Research in Developmental Disabilities, 35(12), ­ ­3423–3430. ​­ Leonard, L. (2013). SLI across languages. Child Development Perspectives, 8(1), ­ ­1–5. ​­ Leonard, L. (2014). 
Children with specific language impairment (2nd ed.). MIT Press. Leonard, L., Bortolini, U., Caselli, M. C., McGregor, K. K., & Sabbadini, L. (1992). Morphological deficits in children with specific language impairment: The status of features in the underlying grammar. Language Acquisition, 2(2), ­ ­151–179. ​­ Lonigan, C. J., & Milburn, T. F. (2017). Identifying the dimensionality of oral language skills of children with typical development in preschool through fifth grade. Journal of Speech, Language, and Hearing Research, 60(8), ­ ­2185–2198. ​­ Lum, J. A. G., Ullman, M. T., & Conti-Ramsden, G. (2015). Verbal declarative memory impairments in ​­ specific language impairment are related to working memory deficits. Brain and Language, 142, ­76–85. MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed.). Lawrence Erlbaum Associates. Manfredi, M., Cohn, N., Sanchez Mello, P., Fernandez, E., & Boggio, P. S. (2020). Visual and verbal narrative comprehension in children and adolescents with autism spectrum disorders: An ERP study. Journal of Autism and Developmental Disorders, 50(8), ­ ­2658–2672. ​­ McGregor, K. K., Goffman, L., Van Horne, A. O., Hogan, T. P., & Finestack, L. H. (2020). Developmental language disorder: Applications for advocacy, research, and clinical service. Perspectives of the ASHA Special Interest Groups, 5(1).38–46. ­ ­ ​­ McGregor, K. K., Oleson, J., Bahnsen, A., & Duff, D. (2013). Children with developmental language impairment have vocabulary deficits characterized by limited breadth and depth. International Journal of Language & Communication Disorders, 48(3), ­ ­307–319. ​­ Miller, C. A., & Gilbert, E. (2008). Comparison of performance on two nonverbal intelligence tests by adolescents with and without language impairment. Journal of Communication Disorders, 41, ­358–371. ​­ Paradis, J., Rice, M. L., Crago, M., & Marquis, J. (2008). 
The acquisition of tense in English: Distinguishing child second language from first language and specific language impairment. Applied Psycholinguistics, 29(4), ­ ­689–722. ​­ Pourquié, M. (2017). Erreurs produites par des enfants bilingues à développement typique et atypique du langage : qu’est-ce qui les distingue. In C. Bogliotti, F. Isel, & A. Lacheret-Dujour (Eds.), Atypies langagières de l’enfance à l’âge adulte. Apport de la psycholinguistique et des neurosciences cognitives. De Boeck. Pourquié, M., Lacroix, H., & Kartushina, N. (2019). Investigating vulnerabilities in grammatical processing of bilinguals: Insights from Basque-Spanish adults and children. Linguistic Approaches to Bilingualism, 9(4–5), ­­ ​­ ­600–627. ​­ Pratt, A. S., Peña, E. D., & M. B. L. (2021). Sentence repetition with bilinguals with and without DLD: Differential effects of memory, vocabulary, and exposure. Bilingualism: Language and Cognition, 24(2), ­ ­305–318. ​­

404

Experimental methods to study atypical language development Reilly, S., Tomblin, B., Law, J., McKean, C., Mensah, F. K., Morgan, A., Goldfeld, S., Nicholson, J. M., & Wake, M. (2014). Specific language impairment: A convenient label for whom? International Journal of Language & Communication Disorders, 49(4), ­ ­416–451. ​­ Rice, M. L., & Hoffman, L. (2015). Predicting vocabulary growth in children with and without specific language impairment: A longitudinal study from 2;6 to 21 years of age. Journal of Speech, Language, and Hearing Research, 58(2), ­ ­345–359. ​­ Rispens, J., & Been, P. (2007). Subject-verb agreement and phonological processing in developmental dyslexia and specific language impairment (SLI): A closer look. International Journal of Language & Com­ ­293–305. ​­ munication Disorders, 42(3), Rose, Y., & MacWhinney, B. (2014). The PhonBank Project: Data and software-assisted methods for the study of phonology and phonological development. In J. Durand, U. Gut & G. Kristofferse, (Eds.), The Oxford handbook of corpus phonology. Oxford University Press. Rose, Y., & Royle, P. (1999). Uninflected structure in familial language impairment: Evidence from French. ­­ ​­ ­70–90. ​­ Folia Phoniatrica et Logopaedica, 51(1–2), Roulet-Amiot, L., & Jakubowicz, C. (2006). Production and perception of gender agreement in French SLI. Advances in Speech Language Pathology, 8(4), ­ 335–346. ­ ​­ Royle, P., & Courteau, E. (2014). Language processing in children with specific language impairment: A review of event-related potential studies. In L. T. Klein & V. Amato (Eds.), Language processing: New research. Nova Science Publishers. Royle, P., & Elin Thordardottir (2008). Elicitation of the passe compose in French preschoolers with and ­ ­1–22. ​­ without specific language impairment. Applied Psycholinguistics, 29(3), Royle, P., & Reising, L. (2019). Elicited and spontaneous determiner phrase production in French speaking children with developmental language disorder. 
Canadian Journal of Speech Language Pathology and Audiology, 43(3), ­ 167–187. ­ ​­ Royle, P., & Stine, I. (2013). The French noun phrase in preschool children with SLI: Morphosyntactic and ­ ­945–970. ​­ error analyses. Journal of Child Language, 40(5), Rvachew, S., Royle, P., Gonnerman, L., Stanké, B., Marquis, A., & Herbay, A. (2017). Development of a tool to screen risk of literacy delays in French-speaking children: PHOPHLO. Canadian Journal of SpeechLanguage Pathology and Audiology, 41(3), ­ 321–340. ­ ​­ Segalowitz, S. J., Santesso, D. L., & Jetha, M. K. (2010). Electrophysiological changes during adolescence: A review. Brain and Cognition, 72(1), ­ ­86–100. ​­ Silleresi, S., Tuller, L., Delage, H., Durrlemann, S., Bonnet-Brilhault, F., Malvy, J., & Prévost, P. (2018). Sentence repetition and language impairment in French-speaking children with ASD. In A. Gavarró (Ed.), On the acquisition of the syntax of romance. John Benjamins. Steinhauer, K. (2014). Event-related potentials (ERPs) in second language research: A brief introduction to the technique, a selected review, and an invitation to reconsider critical periods in L2. Applied Linguistics, 4(35), ­ 393–417. ­ ​­ Steinhauer, K., & Drury, J. E. (2012). On the early left-anterior negativity (ELAN) in syntax studies. Brain ­ ­135–162. ​­ and Language, 120(2), Steel, G., Rose, M., Eadie, P., & Thornton, R. (2013). Assessment of complement clauses: A comparison between elicitation tasks and language sample data. International Journal of Speech-Language Pathology, 15(3), ­ 286–295. ­ ​­ Sukenik, N., & Friedmann, N. (2018). ASD is not DLI: Individuals with autism and individuals with syntactic DLI show similar performance level in syntactic tasks, but different error patterns. Frontiers in Psychology, Language Sciences, 9, 279. Taha, J., Stojanovik, V., & Pagnamenta, E. (2021). Sentence repetition as a clinical marker of developmental language disorder: Evidence from Arabic. 
Journal of Speech, Language & Hearing Research, 64, 4876–4899. ­ ​­ Tallal, P., Stark, R., Kallamn, C., & Mellits, D. (1981). Developmental dysphasia: Relation between acoustic processing deficits and verbal processing. Neuropsychologia, 18(3), ­ ­273–284. ​­ Tomblin, J. B., Records, N. L., Buckwalter, P., Zhang, X., Smith, E., & O’Brien, M. (1997). Prevalence of specific language impairment in kindergarten children. Journal of Speech Language & Hearing Research, 40(6), ­ ­1245–1260. ​­ Tomblin, J. B., & Zhang, X. (2006). The dimensionality of language ability in school-age children. Journal of Speech, Language, and Hearing Research, 49(6), ­ 1193–1208. ­ ​­

405

Phaedra Royle et al. Tuller, L., Ferré, S., Prévost, P., Barthez, M.-A., Malvy, J., & Bonnet-Brilhault, F. (2017). The effect of computational complexity on the acquisition of French by children with ASD. In L. R. Naigles (Ed.), Innovative investigations of language in autism spectrum disorder. American Psychological Association Walter de Gruyter GmbH. Ullman, M. T., & Gopnik, M. (1999). Inflectional morphology in a family with inherited specific language ­ ­51–117. ​­ impairment. Applied Psycholinguistics, 20(1), Ullman, M. T., & Pierpont, E. I. (2005). Specific language impairment is not specific to language: The proce­ ­399–433. ​­ dural deficit hypothesis. Cortex, 41(3), ­ ­ ­ ­ Volkers, N. (2018). Diverging views on language disorders. The ASHA Leader, 23(12) https://doi.org/10.1044/ leader.FTR1.23122018.44. Ziegler, J. C., Castel, C., Pech-Georgel, C., George, F., Alario, F.-X., & Perry, C. (2008). Developmental dyslexia and the dual route model of reading: Simulating individual differences and subtypes. Cognition, ­ 151–178. ­ ​­ 107(1),

406

26 EXPERIMENTAL METHODS TO STUDY DISORDERS OF LANGUAGE PRODUCTION IN ADULTS

Andrea Marini

26.1 Introduction

Language is a complex skill characterized by the interplay between linguistic knowledge and various cognitive skills such as attention, memory, and executive functions. It is implemented in an extensive neural network (e.g., Indefrey, 2012) and realized through micro- and macrolinguistic processes (Marini et al., 2005). Microlinguistic processes allow for the organization of phonemes into morphological sequences and words (i.e., lexical processing) and for their further combination into sentences (i.e., grammatical processing). Macrolinguistic processes allow for the generation/identification of communicative intentions as well as for the contextualization of the meanings transmitted through microlinguistic processes (i.e., pragmatic processing). Macrolinguistic processes also include discourse processing, the ability to organize the propositions conveyed by sentences through cohesive and coherent ties that contribute to the generation of a story’s mental model (i.e., a mental construction of a narrative; Johnson-Laird, 1983) or scenario (i.e., through a process of scene construction; Buckner & Carroll, 2007). Strokes, brain tumours, traumatic brain injuries, and psychiatric and neurodegenerative diseases can affect language processing in many ways. This chapter provides an overview of some of the methods used to assess and study disorders of language production in adults. After a historical sketch of the problem and an introduction to some of the most recent issues, it focuses on the tasks used to assess language production in patients, showing the usefulness of a multilevel procedure of discourse analysis. The chapter ends with considerations about the neural underpinnings of these skills and their potential applications in rehabilitation.

26.2 Historical perspectives

The interest in the description and interpretation of language disturbances is ancient (Whitaker, 1998). The earliest documents are Babylonian cuneiform tablets dating back to the second millennium BC, which describe people with neurological and psychiatric disorders whose conditions were sometimes interpreted biologically, sometimes mystically (Reynolds & Wilson, 2014). Of note, the Edwin Smith Surgical Papyrus, dating back to 1700 BC, reports on the neurological outcomes of 27 patients who had suffered head trauma (Breasted, 1930). Some had lost their memory, others the ability to walk. One of these patients (Case 22) was described as speechless: “silent in sadness, without speaking, like one suffering from ‘feebleness’ … because of something that has entered from outside” (p. 296). Today we would call this picture non-fluent aphasia with depressive symptomatology. Anecdotal descriptions of single patients, based solely on the observation of their behaviour, continued to accumulate over the centuries. Filtered through the speculations of Greek and Roman physicians, naturalists, and philosophers, these observations were occasionally gathered and published in books (e.g., Schenck von Grafenberg, 1584; Wepfer, 1658). Nonetheless, the first models of language processing and of its implementation in the brain were formulated only in the 19th century (e.g., Hood, 1824; Lichtheim, 1885; Wernicke, 1874). Notably, these seminal observations were mostly based on the qualitative observation of the patients’ speech output. For example, Alexander Hood (1824), simply by observing the way he spoke, described the case of a 48-year-old man who, after a stroke, had experienced a marked loss of expressive language, which was limited to just a few words. In the discussion, Hood compared this case with those of three other patients and postulated three mental components for expressive language: one controlling the articulatory muscles essential for producing speech, one devoted to lexical planning, and one in charge of the organization of word-related memories. It was only in the 20th century that such qualitative observations began to be complemented, and eventually replaced, by quantitative assessments. This shift was facilitated by the joint development of disciplines such as general linguistics, neuropsychology, psycholinguistics, and neurolinguistics, and it led to the formulation of new models of language implementation in the brain (e.g., Geschwind, 1970), which were later replaced by current associationist approaches (e.g., Catani et al., 2012). From then on, the first tasks for studying language production skills in adult patients were devised and standardized, making it possible to better define the characteristics of language impairments and to guide rehabilitation procedures.

DOI: 10.4324/9781003392972-30

26.3 Critical issues and topics

Language assessment is a delicate procedure. Two issues represent a major challenge for both researchers and clinicians and will be considered here. The first concerns the need to select tests that appropriately tap the targeted linguistic skills. The second concerns the correct interpretation of the findings from such tests.

26.3.1 The development of batteries of tests assessing language production in patients with acquired brain disorders

To obtain reliable data on the residual or impaired language abilities of patients with language disorders, clinicians usually administer standardized test batteries. For a long time, the assessment of patients with linguistic and/or communicative impairments focused mainly on microlinguistic skills, so linguistic deficits were routinely examined at the single-word or sentence level. This led to the development of tests designed to assess microlinguistic skills in patients with acquired brain lesions in the acute (e.g., Mississippi Aphasia Screening Test by Nakase-Thompson et al., 2005) or chronic (e.g., Aachen Aphasia Test, AAT, by Huber et al., 1984; Western Aphasia Battery-Revised by Kertesz & Raven, based on Kertesz, 1982) stage of the disease. Although some batteries (e.g., the AAT) also allow for a rough qualitative analysis of discourse production, the macrolinguistic dimension of processing was long neglected. In the late 1970s and early 1980s, however, some investigations showed that extending the study of language disorders to macrolinguistic processing (i.e., pragmatic, discourse, and conversational skills) would prove highly useful in clinical practice (e.g., Yorkston & Beukelman, 1980; Armstrong, 2000). Along these lines, batteries of tests assessing pragmatic and communicative skills have been developed more recently. The Assessment of Pragmatic Abilities and Cognitive Substrates (Arcara & Bambini, 2016) and the Assessment Battery for Communication (Sacco et al., 2013) are two interesting examples of this kind.

26.3.2 An overview of the language production system

Converging experimental evidence suggests that language production is a complex process characterized by phases of discourse planning and organization, conceptual preparation, lexical selection and access, grammatical processing, and articulation. Importantly, every stage of processing is characterized by continuous interaction between linguistic abilities and cognitive skills. In the phase of discourse planning and organization, the speaker needs to generate a communicative intention and organize the contents that (s)he wants to communicate to the interlocutor. The corresponding mental model or scenario provides a foundation for developing the story structure (Gernsbacher, 1990). In this early stage, sustained attention keeps the cognitive resources focused on discourse planning, selective attention prevents distraction by irrelevant stimuli, and divided attention distributes the available cognitive resources across the different stages of processing. Phonological working memory keeps active the propositions that will be produced later. Long-term declarative semantic and episodic memories allow for the retrieval of information about the organization of the discourse to be produced (e.g., its story grammar, with elements such as introduction, scenario, characters, time frame, and episodes; Haberlandt et al., 1980) and of the relevant scripts (Schank & Abelson, 1977). Executive functions (Mozeiko et al., 2011) allow speakers to update incoming information with what has previously been communicated through phonological working memory, shift between strategies to select new episodes or new informative words, inhibit the production of irrelevant sentences (e.g., off-topic comments and derailments), monitor the ongoing communicative process, and plan the efficient organization of the message.
Furthermore, the speaker also needs to attend to the interlocutors’ expectations by generating a Theory of (their) Mind (TOM) while considering both the linguistic context (i.e., what has already been said) and the extralinguistic context (i.e., the place and time in which the conversation takes place). After generating the story structure, the speaker needs to organize it into sequences that form its macrostructure. This will eventually be broken down into single propositions forming the microstructure of the discourse under construction (e.g., Kintsch & van Dijk, 1978). These propositions will be generated through a process of conceptualization and eventually produced through stages of lexical selection, access, and articulation (e.g., Indefrey & Levelt, 2000). The stage of conceptualization allows for the activation in long-term semantic memory of the target lexical concept (i.e., a concept “for which there is a lexical item in the mental lexicon”; Levelt, 2001, p. 13464) that corresponds to the communicative intention generated by the speaker. Conceptual preparation requires the ability to keep the cognitive resources focused on the selection of the target lexical item (sustained attention), to distribute the available cognitive resources among the different processes taking place at the same time (divided attention), and to ignore distracting stimuli (selective attention). The activated lexical concept triggers a process of lexical selection in which the semantic information contained in the lexical concept spreads to lemmas in a component of long-term semantic memory known as the mental lexicon. During lexical selection, phonological working memory is required to keep the selected lexical concept active while executive functions monitor the selection process and inhibit the activation of lexical competitors that might otherwise lead to semantic errors (e.g., a semantic paraphasia such as cat instead of dog). All the information (i.e., semantic, morphosyntactic, morphological, phonological, and phonetic) associated with the selected lexical item is stored in the mental lexicon. To produce the target sentence, the speaker needs to gain access to this information. In the process of lexical access, the first information available is the item’s grammatical category (e.g., noun, verb) and the morphosyntactic valences that are necessary for grammatical encoding (i.e., the generation of the sentence). The selected lexical item, together with this morphosyntactic information, is called the lemma. Once the lemma has been activated, its morphosyntactic information likely interacts with the merge function (i.e., a basic recursive combinatorial operation that allows for the computation of hierarchical structures such as phrases and sentences; Chomsky, 1995) to trigger sentence generation and the placement of the selected lemma in the correct position in the sentence. During these early stages of lexical access, executive functions are needed to plan the sentence, monitor its generation, and inhibit the potential activation of wrong morphosyntactic information. Furthermore, attention is required to keep the cognitive resources focused on the construction of the target sentence (sustained attention), avoid distraction (selective attention), and distribute the available resources across the different phases under elaboration (divided attention).
During the stage of sentence generation, phonological working memory allows speakers to keep all lexical information active until needed, whereas the episodic buffer keeps track of what has already been introduced in previous sentences in order to establish efficient linguistic and conceptual ties among them. Long-term declarative and non-declarative memories are involved in these processes: semantic memory is the repository where the morphosyntactic information related to the selected lemma is stored, and procedural memory is required to perform highly automatized routine processes, like merge, that are not under the direct control of the speaker. A phase of morphological encoding provides the morphological information associated with the activated word. Activation then spreads to the word’s phonological code (phonological encoding), followed by stages of syllabification and phonetic encoding that convert the retrieved phonemes into abstract articulatory representations (i.e., the articulatory score). Cognitive skills such as focused, divided, and selective attention, executive functions, working memory, and long-term declarative memory continue to play important roles during these last phases of lexical access. The articulatory scores are eventually produced through a stage of articulation, in which motor planning and control are needed to utter the selected words, working memory keeps all lexical information active until needed, and long-term non-declarative procedural memory allows for the automatized movements of the articulators that are not under the conscious control of the speaker.
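The stage-by-stage account above can be summarised as an ordered data structure. The sketch below is only an illustrative reading of this section, assuming Python 3.9+; the stage labels, the grouping of skills, and the helper names (`PRODUCTION_STAGES`, `stages_spanned`, `skills_involved`) are hypothetical simplifications, not a published coding scheme. It makes one point of the text concrete: a task whose output depends on several consecutive stages recruits the union of their cognitive skills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of language production and the cognitive skills the text links to it."""
    name: str
    skills: frozenset[str]


def _stage(name: str, *skills: str) -> Stage:
    return Stage(name, frozenset(skills))


# Hypothetical, simplified ordering of the stages described in Section 26.3.2.
PRODUCTION_STAGES: tuple[Stage, ...] = (
    _stage("discourse planning", "attention", "phonological working memory",
           "declarative memory", "executive functions", "theory of mind"),
    _stage("conceptual preparation", "attention", "semantic memory"),
    _stage("lexical selection", "phonological working memory", "executive functions"),
    _stage("grammatical encoding", "attention", "executive functions",
           "phonological working memory", "episodic buffer",
           "semantic memory", "procedural memory"),
    _stage("morphological encoding", "attention", "executive functions",
           "working memory", "declarative memory"),
    _stage("phonological encoding", "attention", "executive functions",
           "working memory", "declarative memory"),
    _stage("syllabification", "attention", "executive functions",
           "working memory", "declarative memory"),
    _stage("phonetic encoding", "attention", "executive functions",
           "working memory", "declarative memory"),
    _stage("articulation", "motor planning and control", "working memory",
           "procedural memory"),
)

STAGE_NAMES = [stage.name for stage in PRODUCTION_STAGES]


def stages_spanned(first: str, last: str) -> list[str]:
    """Return the consecutive stages from `first` to `last`, inclusive."""
    start, end = STAGE_NAMES.index(first), STAGE_NAMES.index(last)
    if start > end:
        raise ValueError(f"{first!r} comes after {last!r} in the model")
    return STAGE_NAMES[start:end + 1]


def skills_involved(first: str, last: str) -> set[str]:
    """Union of the skills recruited by every stage between `first` and `last`."""
    wanted = set(stages_spanned(first, last))
    return {skill for stage in PRODUCTION_STAGES if stage.name in wanted
            for skill in stage.skills}


print(stages_spanned("phonological encoding", "articulation"))
# ['phonological encoding', 'syllabification', 'phonetic encoding', 'articulation']
```

Representing the model as an ordered sequence, rather than as isolated labels, is a deliberate choice here: it mirrors the claim that a single score from a task spanning several stages cannot by itself tell us which stage failed.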

26.4 Current contributions and research

Over the past few years, increasing attention has been devoted to the validity, specificity, and reliability of language assessment tests. For example, Rohde et al. (2018) showed that none of the 56 tests included in their review reported diagnostic data for differentiating aphasic from non-aphasic stroke patients, which limits their applicability. Similarly, in a scoping review on diagnostic criteria for Developmental Language Disorders, Sansavini et al. (2021) suggested the need to jointly use traditional standardized measures and innovative psycholinguistic measures, such as the analysis of discourse production, to increase diagnostic accuracy. As for language production, standardized tests usually include naming tasks, semantic and phonological fluency tasks, and sentence generation/completion tasks. Overall, such tests are quite useful in capturing very specific aspects of language production in patients. Nonetheless, they also have important limitations.
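Diagnostic data of the kind Rohde et al. (2018) looked for typically include sensitivity (the proportion of people with the disorder whom the test flags) and specificity (the proportion of people without the disorder whom the test correctly clears), each computed against a reference diagnosis. The sketch below assumes Python 3 and a simple yes/no decision per patient; the function name and the example decisions are hypothetical and do not come from any of the reviewed tests.

```python
from typing import NamedTuple, Sequence


class DiagnosticAccuracy(NamedTuple):
    sensitivity: float  # true positives / all patients with the disorder
    specificity: float  # true negatives / all patients without the disorder


def diagnostic_accuracy(test_flags: Sequence[bool],
                        reference: Sequence[bool]) -> DiagnosticAccuracy:
    """Compare a test's yes/no decisions with a reference diagnosis."""
    if len(test_flags) != len(reference):
        raise ValueError("one test decision is needed per reference diagnosis")
    tp = fn = tn = fp = 0
    for flagged, has_disorder in zip(test_flags, reference):
        if has_disorder:
            if flagged:
                tp += 1  # disorder present and detected
            else:
                fn += 1  # disorder present but missed
        elif flagged:
            fp += 1      # no disorder but flagged anyway
        else:
            tn += 1      # no disorder and correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference group must include patients with and without the disorder")
    return DiagnosticAccuracy(tp / (tp + fn), tn / (tn + fp))


# Hypothetical example: 4 patients with aphasia and 6 without, by reference diagnosis.
reference = [True, True, True, True, False, False, False, False, False, False]
test_flags = [True, True, True, False, False, False, False, True, False, False]
result = diagnostic_accuracy(test_flags, reference)
print(result)  # sensitivity = 3/4 = 0.75, specificity = 5/6 (about 0.83)
```

Reporting the two proportions separately, rather than a single percentage of correct classifications, keeps apart the two ways a test can fail: missing patients with aphasia and mislabelling patients without it.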

26.4.1 Naming tasks

Standardized naming tasks are commonly employed during the assessment of lexical production skills (e.g., Macoir et al., 2018). They typically require patients to produce a target word, usually a name or a verb (Ellis et al., 1992). The elicitation procedure may involve different modalities. However, the most common is visual confrontation naming which requires patients to name a series of pictures that are selected according to specific criteria (e.g., a word’s frequency or its semantic characteristics). The Boston Naming Test (BNT, Kaplan et al., 1983) is among the most used ones. Such tests allow clinicians to discriminate on a quantitative basis between individuals who can and those who cannot properly name the items. Furthermore, some study showed the presence of moderate (Killgore & Adams, 1999) to high (Thompson & Heaton, 1989) correlations between BNT scores and the vocabulary sub-test scores at the Wechsler Adult Intelligence Scale – Revised. Nonetheless, naming tasks cannot differentiate between the different phases of lexical production outlined in paragraph 26.3.2. In these tasks, after retrieving the concept represented in the picture, the patient needs to select the corresponding lexical concept which, in turn, will trigger the process of lexical selection. This way, such tasks basically assess lexical selection abilities and, potentially, also the access to the morphological form and syllabic and articulatory score of the selected word without the possibility to disentangle between these phases (see Harry and Crowe, 2014 for a review). Overall, standardized naming tasks allow for easy administration and scoring as well as high test-retest reliability. Nonetheless, they limit the assessment of word-finding difficulties at the single word level, which drastically limits their ecological validity and generalization to everyday communication contexts. 
This limitation was confirmed by Mayer and Murray (2003), who reported that, although performance on a naming task correlated with aphasia severity, it did not predict word-finding difficulties in conversation.

26.4.2

Fluency tasks

Since their introduction by Thurstone (1938), fluency tasks have been widely used in neuropsychological assessment to characterize lexical and cognitive difficulties in a wide range of patients with brain injuries (e.g., Kavé & Sapir-Yogev, 2020). Fluency tasks are thought to rely on executive functions (Aita et al., 2019), word knowledge (Kavé & Yafé, 2014), lexical access (Gordon et al., 2018), and selection skills (Pekkala, 2012). Two types of fluency tasks can be used: semantic and phonological. Semantic fluency tasks (also called category fluency tasks) require patients to produce, in one minute, as many words as possible belonging to a specific semantic category (e.g., ANIMALS). Such tasks therefore provide an indirect measure of the extent of a person’s mental lexicon and a direct way to assess lexical selection according to a semantic strategy. Phonological fluency tasks require patients to produce, in one minute, as many words as possible beginning with a specific phoneme (e.g., /f/). Such tasks tap the ability to select words with a phonological strategy while inhibiting potential semantic competitors (i.e., words that are semantically related to those already uttered but do not begin with the target phoneme). Phonological fluency tasks therefore also assess metaphonological skills and inhibitory control. Interestingly, there is evidence that the demands of phonological fluency tasks vary across languages depending on a wide range of factors, including their writing and phonological systems. For example, in a study

Andrea Marini

comparing performance on phonological and semantic fluency tests in a cohort of 40 Japanese, 30 Turkish, and 31 English-speaking patients with schizophrenia from the United States, Sumiyoshi et al. (2014) reported no group differences on the semantic fluency task (eliciting category: ANIMALS), whereas the American patients outperformed the other two groups on the phonological fluency task, which required them to produce words beginning with the phonemes F, A, and S. Turkish and Japanese participants (asked to produce words beginning with A, E, and Z and with KA and TA, respectively) produced fewer words. As already discussed for naming tests, fluency tasks do not allow clinicians to identify difficulties at specific stages of lexical production (e.g., lemma retrieval, morphological and syllabic encoding, articulation, and self-monitoring; Weiss et al., 2006), and they yield results with little ecological validity.
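To make the scoring of a single fluency trial concrete, here is a minimal Python sketch that counts the valid responses produced within the one-minute window. It assumes a hypothetical list of time-stamped responses and deliberately simplified rules: repetitions and intrusions are tallied but not credited, and a phonological trial is checked on the initial letter as a proxy for the target phoneme. Published scoring manuals differ on details such as proper nouns and morphological variants, so the names `Response` and `score_fluency` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Response:
    word: str       # word as transcribed by the examiner
    onset_s: float  # seconds elapsed since the start of the trial


def score_fluency(responses, time_limit_s=60.0, initial=None, category_members=None):
    """Score one fluency trial under simplified, hypothetical rules.

    Returns (valid, repetitions, intrusions). Pass `initial` (e.g. "f") for a
    phonological trial or `category_members` (a set of words) for a semantic one.
    """
    seen = set()
    valid = repetitions = intrusions = 0
    for r in responses:
        if r.onset_s > time_limit_s:
            continue  # produced after the one-minute window: not scored
        word = r.word.strip().lower()
        if word in seen:
            repetitions += 1  # repeated response: tallied, not credited
            continue
        seen.add(word)
        if initial is not None and not word.startswith(initial.lower()):
            intrusions += 1  # spelling-based check, a proxy for the target phoneme
        elif category_members is not None and word not in category_members:
            intrusions += 1  # word outside the target semantic category
        else:
            valid += 1
    return valid, repetitions, intrusions


trial = [Response("fish", 2.1), Response("fox", 5.4), Response("fish", 9.0),
         Response("table", 14.2), Response("farm", 61.5)]
print(score_fluency(trial, initial="f"))  # (2, 1, 1)
```

Keeping repetitions and intrusions as separate counts, rather than discarding them, preserves information that may be of qualitative interest (e.g., perseverations) even though the total alone says nothing about which processing stage is affected.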

26.4.3

Sentence completion and generation tasks

The assessment of language production should also cover morphological and grammatical abilities. In this regard, a systematic review of methods to evaluate sentence production deficits in aphasic individuals with agrammatism showed that such abilities are best assessed by combining structured tests that specifically tap grammatical production skills with tasks collecting samples of spontaneous language in storytelling or free conversation (Mehri & Jalaie, 2014). Structured tests of grammatical production include word-ordering tasks, sentence completion or generation tasks, and sentence production priming tests (e.g., Cupit et al., 2016). For example, in sentence completion tasks, patients may be required to listen to a sentence and then to the beginning of a second one, which they must complete using the correct morphology (e.g., first sentence: “the father washes the dishes”; second sentence: “the parents …”). Similarly, sentence generation tasks allow clinicians and researchers to assess the ability to generate grammatically well-formed sentences by providing patients with a few words that must be organized into a sentence (e.g., Carlesimo et al., 1996). The Sentence Production Priming Test (Cho-Reyes & Thompson, 2012) assesses the production of sentences of varying complexity. For each item of the test, patients are shown a pair of pictures depicting a reversible action; for example, one picture may show a cat chasing a dog and the other a dog chasing a cat. The experimenter produces a prime sentence describing one picture and asks the patient to produce a sentence with the same grammatical structure to describe the other picture. In such tasks, the items usually cover a wide range of syntactic structures (e.g., active, reversible, passive, subject- or object-relative clauses).
This allows not only for quantifying the ability to produce the target sentences but also for qualitatively inspecting which grammatical structures are still available. For example, Mack et al. (2021) showed that, in a cohort of 77 persons with primary progressive aphasia, 34 individuals had lower accuracy scores on this task and were consequently classified as having impaired grammar, whereas the remaining 43 participants had relatively preserved grammar production skills.
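Both the quantitative and the qualitative readings of a sentence priming task can be derived from the same item-level scores. The Python sketch below assumes a hypothetical list of (structure, correct) pairs and an arbitrary 0.8 cut-off for flagging weak structures; it does not reproduce the scoring rules of the Sentence Production Priming Test or the classification criterion used by Mack et al. (2021).

```python
from collections import defaultdict


def accuracy_by_structure(items):
    """items: iterable of (structure_label, is_correct) pairs, one per test item.

    Returns overall accuracy and a dict of accuracy per syntactic structure.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for structure, is_correct in items:
        totals[structure] += 1
        correct[structure] += int(is_correct)
    overall = sum(correct.values()) / sum(totals.values())
    per_structure = {s: correct[s] / totals[s] for s in totals}
    return overall, per_structure


def weak_structures(per_structure, cutoff=0.8):
    """Structures whose accuracy falls below a hypothetical cut-off."""
    return sorted(s for s, acc in per_structure.items() if acc < cutoff)


responses = [("active", True), ("active", True),
             ("passive", False), ("passive", True),
             ("object relative", False), ("object relative", False)]
overall, per_structure = accuracy_by_structure(responses)
print(overall)                         # 0.5
print(weak_structures(per_structure))  # ['object relative', 'passive']
```

The overall score supports the quantitative comparison with controls, while the per-structure breakdown supports the qualitative question of which constructions remain available.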

26.4.4

Limits of traditional standardized assessments of language production

Overall, the standardized tasks described so far are currently used to provide a general description of the linguistic profile of persons with linguistic deficits. Nonetheless, traditional tests assess the different aspects of linguistic processing separately, failing to capture the complex interaction between cognitive and linguistic skills that characterizes the different stages of language production outlined in paragraph 26.3.2. Furthermore, they reveal little about the patients’ informativeness (i.e., the ability to produce words that are informative in a specific context and

Methods to study disorders of language production

communicative interaction) and about their ability to organize the conceptual structure of a discourse. Of note, traditional tasks are typically administered in highly artificial settings. Asking a person to name what is portrayed in a picture or to complete a sentence does not reflect how (s)he typically uses language in real-life contexts. This is why Sansavini et al. (2021) highlighted the need to add innovative measures, such as conversational and discourse production analysis, to language assessment.

26.4.5

Conversation analysis

Procedures of conversation analysis (CA) are useful in delineating the communicative potential of a patient (Perkins et al., 1999). CA is usually applied to samples of conversation about selected topics, although in some cases the interlocutors are free to choose what to talk about. The conversation is usually video-recorded and then transcribed. The transcripts and the video recordings are then analysed to gather information about the patients’ skills in selecting appropriate words and sentences, in establishing and maintaining adequate cohesive and coherent ties among utterances, and in using non-verbal aspects of communication adequately. This technique therefore allows for the identification of the strategies patients use to accomplish their communicative goals, giving clinicians an idea of the patients’ real communicative skills in a highly ecological setting. Unfortunately, such procedures are also time-consuming, as transcription and analysis may take hours. For this reason, CA is much more common in research than in clinical practice. Furthermore, the diagnostic value of measures derived from CA has been questioned, as the variable nature of conversations does not allow clinicians to compare the patients’ performance with reliable normative data (Ramsberger & Rande, 2002). For instance, although many persons with aphasia may succeed in conveying information non-verbally during a conversation (Holland, 1982), the existing research on this topic is anecdotal (Bush et al., 1988) and does not provide reliable insights for the clinical assessment of these aspects of their communication (e.g., to what extent non-verbal behaviours allow these patients to react to a perceived failure in conveying information).

26.4.6

Discourse analysis

Like CA, discourse analysis provides critical information about the patients’ communicative and linguistic abilities; however, it is far less time-consuming and much more reliable. Discourse samples with different characteristics can be elicited in many ways. Examples include (but are not limited to) story retelling tasks (Saffran et al., 1989), descriptions of procedures (Ulatowska et al., 1983), recounts of personal events (Glosser & Deser, 1990), and single-picture descriptions and cartoon-picture story generation tasks (Nicholas & Brookshire, 1993). In a review, Bryant et al. (2016) showed that single-picture descriptions are the most commonly used elicitation stimuli but tend to elicit descriptive rather than narrative discourse. Furthermore, such samples typically contain few words. According to Brookshire and Nicholas (1994), clinicians and researchers should gather speech samples of at least 300 words for a reliable analysis. One possibility is to use a combination of different picture stimuli. These stimuli, however, need to be carefully matched to elicit similar types of narrative discourse. Ideally, they should include single pictures eliciting a narrative rather than a mere description (e.g., the Cookie Theft picture from the Boston Diagnostic Aphasia Examination; Goodglass et al., 2001) and cartoon-picture stories (e.g., the Flowerpot and Quarrel stories by Huber & Gleber, 1982, and Nicholas & Brookshire, 1993, respectively). As with CA, these speech samples are usually recorded, transcribed, and analysed; unlike CA samples, however, they are shorter and faster to transcribe and analyse. The results of such analyses are quite informative, as they highlight the complex interactions between micro- and macrolinguistic processes and between these processes and the cognitive functions involved in the different stages of message production. Narrative analysis can focus on the structural characteristics of the elicited speech samples, on their functional features, or both. Structural analysis allows for the assessment of the patient’s productivity and of lexical, grammatical, and narrative organization skills. Functional analysis focuses on the patient’s ability to identify, select, and produce appropriate pieces of information at the conceptual and/or lexical level. In this regard, a widely used approach examines the informative content of the elicited discourse samples by means of structural (e.g., the Shewan Spontaneous Language Analysis; Shewan, 1988), functional (e.g., Content Units [CUs], Yorkston & Beukelman, 1980; Correct Information Units [CIUs], Nicholas & Brookshire, 1993), or mixed (Lexical Information Units [LIUs]; Marini et al., 2011a) measures. These measures quantify linguistic and communicative difficulties in terms of accuracy in encoding information while producing speech samples under controlled conditions. This approach allows clinicians to compare a patient’s performance with normative data and thus to obtain discourse measures with reliable diagnostic value. Furthermore, post-therapy changes in these measures are perceptible to naïve listeners, suggesting that they also have ecological value (Jakobs, 2001). In Sherratt (2007), 32 healthy individuals produced speech samples that included narratives generated by picture sequences related to personal experiences and procedures. The analyses showed that many structural and functional measures interact with each other.
For example, greater relevance was related to more appropriate discourse grammar, fewer non-specific elements, greater cohesion, and greater syntactic complexity. Overall, the findings of this study suggested that a multilayered approach to discourse analysis may prove useful in assessing the linguistic skills of persons without and, possibly, with communication disorders, providing an additional perspective on how the different elements of discourse interact. In line with these findings, Marini et al. (2011a) described a multilevel procedure of discourse analysis that allows clinicians and experimenters to perform both structural and functional analyses and to directly examine their potential interrelations. The speech samples elicited with single pictures and cartoon-picture stories are audio-recorded and transcribed verbatim, including phonological fillers, pauses, and false starts, and their duration is measured in seconds. The analysis of the transcripts provides information about the patient’s productivity (number of words, speech rate, and mean length of utterance), lexical and grammatical processing (percentages of phonological, semantic, and morphological errors, of omissions of function and content words, and of grammatically well-formed sentences), narrative organization (percentages of cohesion errors and of local and global coherence errors), and informativeness (both lexical and conceptual). This methodology is much more informative than traditional standardized linguistic tests, as it helps clinicians to determine (1) the exact nature of the linguistic impairment, (2) the way specific microlinguistic difficulties might affect macrolinguistic processing and vice versa, and (3) the putative efficacy of innovative rehabilitation protocols. For example, Andreetta et al.
(2012) showed that the lexical difficulties of persons with anomic aphasia affected both microlinguistic and macrolinguistic processing: the frequent omissions of content words triggered by their lexical selection impairments correlated with a reduced level of grammatical completeness, which, in turn, showed a strong negative correlation with the production of cohesion errors. Furthermore, their narrative speech samples were also characterized by an increased production of global coherence errors (mostly repetitive and filler utterances) that were likely the result of their word-finding difficulties and, most importantly, showed a strong negative correlation with their reduced percentages of lexical informativeness. Similar results were reported in another study of a different cohort of fluent aphasic participants with a diagnosis of Wernicke’s aphasia (Andreetta & Marini, 2015). The other side of the story (i.e., the possibility that macrolinguistic difficulties affect microlinguistic processes) has been demonstrated in other studies. For example, in Marini et al. (2008), the narratives produced by patients with schizophrenia were characterized by massive macrolinguistic difficulties that triggered problems in the phase of lexical selection (increased production of semantic paraphasias) and in access to the stage of morphological encoding (increased production of morphological errors). These macrolinguistic difficulties were in turn predicted by measures of executive functions (i.e., inhibitory control) and sustained attention. This supports the hypothesis that narrative discourse production relies on a range of cognitive skills and that micro- and macrolinguistic processes interact continuously during production. This conclusion has also been confirmed by investigations of non-aphasic patients with traumatic brain injury (TBI): significant correlations were found between measures of cognitive flexibility, perseverative behaviours, global coherence errors, and lexical informativeness even in persons with mild TBI (e.g., Galetto et al., 2013). Notably, in these investigations, the multilevel procedure of discourse analysis revealed significant shortcomings that were simply not detected by traditional aphasia tests such as the Aachen Aphasia Test (AAT; Huber et al., 1984), as shown, for example, by Marini et al. (2011b). The multilevel procedure of discourse analysis has also been useful in delineating the linguistic difficulties of non-aphasic stroke patients with lesions to the right hemisphere.
For example, Marini (2012) showed that these patients produced narratives with adequate levels of microlinguistic processing but with an increased number of global coherence errors, which lowered their levels of lexical informativeness. Further analyses revealed that these deficits were most evident in persons with anterior lesions, lending indirect support to the hypothesis that frontal areas play a major role in the organization of narrative discourse. Overall, these findings highlight the need to include multilevel procedures of discourse analysis in routine clinical practice to obtain an adequate picture of the patient’s linguistic production abilities.
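Several of the productivity and informativeness measures described in this section reduce to simple ratios once a transcript has been segmented and coded. The Python sketch below assumes a hypothetical coding scheme in which each word token carries an informative/non-informative flag (loosely in the spirit of CIUs and LIUs, whose actual counting rules are more detailed) and each utterance may be marked as a global coherence error. It computes the number of words, speech rate, mean length of utterance in words, the percentage of informative words, the percentage of utterances containing global coherence errors, and whether the sample reaches the 300-word guideline of Brookshire and Nicholas (1994). The names `Token`, `Utterance`, and `discourse_measures` are illustrative, not part of any published tool.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    informative: bool  # hypothetical coder flag, loosely CIU/LIU-like


@dataclass
class Utterance:
    tokens: list = field(default_factory=list)
    global_coherence_error: bool = False  # coder's judgement for the utterance


def discourse_measures(utterances, duration_s, min_words=300):
    """Compute a few productivity and informativeness indices for one sample."""
    if not utterances or duration_s <= 0:
        raise ValueError("need at least one utterance and a positive duration")
    words = sum(len(u.tokens) for u in utterances)
    if words == 0:
        raise ValueError("sample contains no words")
    informative = sum(t.informative for u in utterances for t in u.tokens)
    gce = sum(u.global_coherence_error for u in utterances)
    return {
        "words": words,
        "speech_rate_wpm": words * 60.0 / duration_s,  # words per minute
        "mlu_words": words / len(utterances),           # mean length of utterance
        "pct_informative_words": 100.0 * informative / words,
        "pct_global_coherence_errors": 100.0 * gce / len(utterances),
        "meets_300_word_guideline": words >= min_words,
    }


sample = [
    Utterance([Token("the", True), Token("boy", True),
               Token("takes", True), Token("cookies", True)]),
    Utterance([Token("it", False), Token("is", False), Token("nice", False)],
              global_coherence_error=True),
]
m = discourse_measures(sample, duration_s=6.0)
print(m["speech_rate_wpm"], m["mlu_words"])  # 70.0 3.5
```

The arithmetic is the easy part; the indices are only comparable with normative data when words, utterances, and informative units are coded with the same rules used to build those norms.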

26.5

Main research methods: How to perform a study to assess language production deficits

To plan and appropriately execute a study examining disorders of language production, it is essential to follow a series of steps. First, the researcher needs to formulate a valid research question. For this, (s)he needs solid theoretical knowledge of the stages involved in message production and of the contributions of the cognitive skills required in each phase. Based on such knowledge, and after reading the relevant literature, the researcher will formulate the hypothesis to be tested. Let’s imagine that (s)he wants to plan a study to examine the role of inhibition in the lexical production difficulties observed in the narratives produced by non-aphasic persons with severe TBI. A search of the major databases (e.g., “PubMed” at https://pubmed.ncbi.nlm.nih.gov) with target keywords will provide a list of studies on this specific topic. Reading these studies will provide the information needed to formulate a clear hypothesis and the corresponding predictions. For example, since inhibitory control is involved in the process of lexical selection, (s)he might decide to test the following hypotheses: (1) scores on a task assessing inhibitory control significantly predict the production of semantic errors in healthy individuals and TBI patients (e.g., semantic paraphasias: wrong words that are semantically related to the target words, as in “cat” instead of “dog”); (2) inhibition is impaired in TBI patients; (3) persons with TBI produce more semantic paraphasias than healthy individuals. A second step consists in determining the sample size. To test the research hypotheses, it will be necessary to recruit two groups of participants: one formed by patients with TBI and one made up of healthy control participants. To establish the sample size, the researcher needs to run a power analysis (for example, using the software G*Power, freely available here: https://download.cnet.com/G-Power/3000-2054_4-10647044.html) that will allow him/her to determine how many participants must be recruited in each group. The two groups will have to be matched on variables that might otherwise be confounding, including age, level of formal education, gender, and socio-economic status. At the same time, the researcher will need to select appropriate tasks to assess basic language and cognitive skills in both groups. At this stage, the linguistic assessment can be performed with a traditional standardized battery of tests (e.g., the AAT). This will make it possible to exclude from the control group those individuals who have difficulties on cognitive and linguistic tasks, and to include in the TBI group only those participants who are not aphasic. The research protocol must be approved by a research ethics committee before implementation. Once the ethics committee has approved the research project, the researcher can begin recruiting participants and administering the tasks. The scoring procedure will yield all the relevant data, which will later be analysed statistically. For example, a regression analysis will be performed independently for each group to assess hypothesis 1.
Hypotheses 2 and 3 will be assessed with two parametric analyses (independent-samples t-tests, assuming normally distributed data with homogeneous variances in the two samples), with semantic paraphasias and scores on the task assessing inhibitory control as dependent variables and group (controls versus TBI) as the independent variable. Based on the results, the researcher will eventually write a paper presenting the study and discussing the results in light of the available scientific literature on this topic.
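The statistical steps of this walkthrough can be sketched with textbook formulas. The Python code below is an illustration using only the standard library, under assumptions that are not part of the original protocol: the sample size per group is estimated with the normal approximation for a two-sided, two-sample comparison given an assumed effect size d (the exact noncentral-t calculation, as performed by G*Power, typically adds one or two participants per group, e.g., 64 rather than 63 for d = 0.5); hypotheses 2 and 3 use Student's pooled-variance t statistic; and hypothesis 1 is illustrated with a simple least-squares regression of paraphasia counts on an inhibition score. All numbers are toy values, not data, and p-values and regression diagnostics would normally come from a dedicated statistics package.

```python
import math
from statistics import NormalDist, mean, variance


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group (two-sided, two samples)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def pooled_t(x, y):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def ols_slope_intercept(x, y):
    """Least-squares fit of y (e.g. paraphasias) on x (e.g. inhibition score)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


print(n_per_group(0.5))                # 63
controls = [2, 3, 1, 2]                # toy paraphasia counts, not real data
tbi = [5, 6, 4, 5]
t, df = pooled_t(tbi, controls)
print(round(t, 3), df)                 # 5.196 6
slope, intercept = ols_slope_intercept([10, 20, 30, 40], [8, 6, 4, 2])
print(slope, intercept)                # -0.2 10.0
```

Running the regression separately within each group, as the walkthrough proposes, simply means calling the fitting function once on the control data and once on the TBI data.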

26.6

Recommendations for practice

This chapter focused on some of the more recent advances in the analysis of language production in patients with linguistic impairments. Overall, the reviewed literature suggests that traditional tasks assessing specific linguistic abilities in isolation (e.g., naming or fluency tasks) may not be sufficient to analyse the actual linguistic skills of such patients. Traditional tasks may fail to detect subtle linguistic difficulties and do not allow clinicians or scholars to describe the relations between micro- and macrolinguistic processes during the production of narrative speech. Increasing evidence suggests that multilevel procedures of discourse analysis are much more informative and can highlight subtle linguistic difficulties that may not be identified by traditional test batteries (e.g., Fromm et al., 2017; Linnik et al., 2016). Namely, they help clinicians determine (1) the exact nature of the linguistic impairment, (2) the way microlinguistic difficulties might affect macrolinguistic processing, and (3) the putative efficacy of innovative rehabilitation protocols. For example, in Larfeuil and Le Dorze (1997), traditional tests failed to show any improvement in a cohort of 17 persons with aphasia after 6 weeks of therapy. Nonetheless, the analysis of their narrative language elicited with a picture description task revealed a significant improvement in their communicative effectiveness: their connected speech contained more open-class words per time unit. Similar findings were reported by Marini et al. (2007) in a small group of three persons with non-fluent aphasia. In this case, standardized aphasia tests showed minimal changes at the post-therapy assessment, even though the patients’ levels of informativeness improved significantly. Notably, as in Jakobs (2001), this improvement was also noticed by naïve judges who were asked to rate the informativeness of the produced descriptions. Studies focusing on the relation between brain functioning and discourse processing confirm the usefulness of such procedures of discourse analysis. For example, Spalletta et al. (2010) further examined narrative processing in the individuals with schizophrenia recruited in Marini et al. (2008) and found that the reduced levels of lexical informativeness observed in these patients significantly correlated with volume changes in the dorsal aspect of the left inferior frontal gyrus (lIFG). To further explore the potential role of the lIFG in the ability to select informative words, Marini and Urgesi (2012) performed an experiment with an off-line repetitive transcranial magnetic stimulation (rTMS) protocol targeting this area. Healthy adult individuals were administered cognitive, linguistic, and narrative production tasks in three conditions: in the target condition, they performed the tasks after inhibitory rTMS over the epicentre in the lIFG whose atrophy in patients with schizophrenia had been related to the reduction of informative words in Spalletta et al. (2010); in a control condition, they performed similar tasks after inhibitory rTMS over the contralateral area, i.e., the right inferior frontal gyrus (rIFG); in a third condition, they performed similar tasks without any stimulation.
Notably, rTMS over the dorsal portion of the lIFG reduced the levels of lexical informativeness and increased the number of global coherence errors. More recently, Mazzon et al. (2019) showed that the multilevel procedure of discourse analysis makes it possible to distinguish patients with mild cognitive impairment (MCI) due to Alzheimer’s disease (AD) from MCI patients whose symptoms are not related to AD. In this study, the reduced levels of lexical informativeness found in persons with MCI due to AD were related to hypoperfusion in the same epicentre in the lIFG.

26.7

Future directions

A final remark concerns the potential translation of the findings described in the previous sections to the rehabilitation of patients with linguistic deficits. As we have seen, the results of studies employing neuroimaging and neuromodulation techniques suggest that the dorsal aspect of the lIFG is an epicentre of a wider neural network subserving the selection of contextually appropriate semantic representations. A critical issue is whether this information can be used to enhance the outcome of rehabilitation protocols, and some studies appear to suggest that it can. For example, in Marangolo et al. (2013), in a cohort of 12 persons with chronic aphasia, combining intensive conversational therapy targeting discourse skills with transcranial direct current stimulation (tDCS; 20 minutes, 1 mA) over the lIFG increased the production of informative words. In conclusion, even if it is time-consuming and requires expertise, the multilevel procedure of discourse analysis provides useful insight into the way persons with linguistic impairments use language in their daily interactions. The same can hardly be achieved with traditional tasks assessing language production in a highly decontextualized environment. Furthermore, current investigations of the neural correlates of some of the measures derived from this analysis may provide new ways to enhance the outcome of rehabilitation treatments. This should be the target of future investigations.


Further reading

Coelho, C., Cherney, L. R., & Shadden, B. (Eds.) (2022). Discourse analysis in adults with and without communication disorders: A resource for clinicians and researchers (pp. 15–31). Plural Publishing.
Indefrey, P. (2012). The spatial and temporal signatures of word production components: A critical update. Frontiers in Psychology, 2, 255.
Marini, A., Andreetta, S., del Tin, S., & Carlomagno, S. (2011a). A multi-level approach to the analysis of narrative language in aphasia. Aphasiology, 25, 1372–1392.

Related topics

Experimental studies in discourse; eliciting spontaneous linguistic productions; new directions in statistical analysis for experimental linguistics; experimental methods to study atypical language development

References

Aita, S. L., Beach, J. D., Taylor, S. E., Borgogna, N. C., Harrell, M. N., & Hill, B. D. (2019). Executive, language, or both? An examination of the construct validity of verbal fluency measures. Applied Neuropsychology: Adult, 26(5), 441–451.
Andreetta, S., Cantagallo, A., & Marini, A. (2012). Narrative discourse in anomic aphasia. Neuropsychologia, 50(8), 1787–1793.
Andreetta, S., & Marini, A. (2015). The effect of lexical deficits on narrative disturbances in fluent aphasia. Aphasiology, 29(6), 705–763.
Arcara, G., & Bambini, V. (2016). A test for the assessment of pragmatic abilities and cognitive substrates (APACS): Normative data and psychometric properties. Frontiers in Psychology, 7, 70.
Armstrong, E. (2000). Aphasic discourse analysis: The story so far. Aphasiology, 14, 875–892.
Breasted, J. H. (1930). The Edwin Smith surgical papyrus. The University of Chicago Press.
Brookshire, R. H., & Nicholas, L. E. (1994). Speech sample-size and test-retest stability of connected speech measures for adults with aphasia. Journal of Speech and Hearing Research, 37(2), 399–407.
Bryant, L., Ferguson, A., & Spencer, E. (2016). Linguistic analysis of discourse in aphasia: A review of the literature. Clinical Linguistics & Phonetics, 30(7), 489–518.
Buckner, R. L., & Carroll, D. C. (2007). Self-projection and the brain. Trends in Cognitive Sciences, 11(2), 49–57.
Bush, C. R., Brookshire, R. H., & Nicholas, L. E. (1988). Referential communication by aphasic and nonaphasic adults. Journal of Speech and Hearing Disorders, 53, 475–482.
Carlesimo, G. A., Caltagirone, C., & Gainotti, G. (1996). The Mental Deterioration Battery: Normative data, diagnostic reliability and qualitative analyses of cognitive impairment. European Neurology, 36(6), 378–384.
Catani, M., Dell’Acqua, F., Bizzi, A., Forkel, S. J., Williams, S., Simmons, A., Murphy, D. G., & Thiebaut de Schotten, M. T. (2012). Beyond cortical localization in clinico-anatomical correlation. Cortex, 48, 1262–1287.
Chomsky, N. (1995). The minimalist program. MIT Press.
Cho-Reyes, S., & Thompson, C. K. (2012). Verb and sentence production and comprehension in aphasia: Northwestern Assessment of Verbs and Sentences. Aphasiology, 26(10), 1250–1277.
Cupit, J., Graham, N. L., Leonard, C., Tang-Wai, D., Black, S. E., & Rochon, E. (2016). Wh-questions and passive sentences in non-fluent variant PPA and semantic variant PPA: Longitudinal findings of an anagram production task. Cognitive Neuropsychology, 33, 329–342.
Ellis, A., Kay, J., & Franklin, S. (1992). Anomia: Differentiating between semantic and phonological deficits. In D. I. Margolin (Ed.), Cognitive neuropsychology in clinical practice (pp. 207–228). Oxford University Press.
Fromm, D., Forbes, M., Holland, A., Dalton, S. G., Richardson, J., & MacWhinney, B. (2017). Discourse characteristics in aphasia beyond the Western Aphasia Battery cutoff. American Journal of Speech-Language Pathology, 26(3), 762–768.
Galetto, V., Andreetta, S., Zettin, M., & Marini, A. (2013). Patterns of impairment of narrative language in mild traumatic brain injury. Journal of Neurolinguistics, 26(6), 649–661.
Gernsbacher, M. A. (1990). Language comprehension as structure building. Erlbaum.
Geschwind, N. (1970). The organization of language and the brain. Science, 170, 940–944.
Glosser, G., & Deser, T. (1990). Patterns of discourse production among neurological patients with fluent language disorders. Brain and Language, 40(1), 67–88.
Goodglass, H., Kaplan, E., & Barresi, B. (2001). Boston diagnostic aphasia examination (3rd ed.). Lippincott Williams & Wilkins.
Gordon, J. K., Young, M., & Garcia, C. (2018). Why do older adults have difficulty with semantic fluency? Aging, Neuropsychology, and Cognition, 25(6), 803–828.
Haberlandt, K., Berian, C., & Sandson, J. (1980). The episodic schema in story processing. Journal of Verbal Learning and Verbal Behavior, 19, 635–650.
Harry, A., & Crowe, S. F. (2014). Is the Boston Naming Test still fit for purpose? The Clinical Neuropsychologist, 28(3), 486–504.
Holland, A. (1982). Observing functional communication of aphasic adults. Journal of Speech and Hearing Disorders, 47, 50–56.
Hood, A. (1824). Case 4th - July 28, 1824 (Mr Hood’s case of injuries of the brain). Phrenological Journal and Miscellany, 2, 82–94.
Huber, W., & Gleber, J. (1982). Linguistic and non-linguistic processing of narratives in aphasia. Brain and Language, 16, 1–18.
Huber, W., Poeck, K., & Willmes, K. (1984). The Aachen aphasia test. Advances in Neurology, 42, 291–303.
Indefrey, P. (2012). The spatial and temporal signatures of word production components: A critical update. Frontiers in Psychology, 2, 255.
Indefrey, P., & Levelt, W. J. M. (2000). The neural correlates of language production. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (pp. 845–865). MIT Press.
Jakobs, B. J. (2001). Social validity of changes in informativeness and efficiency of aphasic discourse following Linguistic Specific Treatment (LST). Brain and Language, 78, 115–127.
Johnson-Laird, P. N. (1983). Mental models. Cambridge University Press.
Kaplan, E., Goodglass, H., & Weintraub, S. (1983). The Boston naming test. Lea & Febiger.
Kavé, G., & Sapir-Yogev, S. (2020). Associations between memory and verbal fluency tasks. Journal of Communication Disorders, 83, 105968.
Kavé, G., & Yafé, R. (2014). Performance of younger and older adults on tests of word knowledge and word retrieval: Independence or interdependence of skills? American Journal of Speech-Language Pathology, 23, 36–45.
Kertesz, A. (1982). Western aphasia battery. Grune & Stratton.
Killgore, W., & Adams, R. (1999). Prediction of Boston Naming Test performance from vocabulary scores: Preliminary guidelines for interpretation. Perceptual and Motor Skills, 89, 327–337.
Kintsch, W., & van Dijk, T. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Larfeuil, C., & Le Dorze, G. (1997). An analysis of the word-finding difficulties and of the content of the discourse of recent and chronic aphasic speakers. Aphasiology, 11(8), 783–811.
Levelt, W. J. M. (2001). Spoken word production: A theory of lexical access. Proceedings of the National Academy of Sciences, 98(23), 13464–13471.
Lichtheim, L. (1885). On aphasia. Brain, 7, 433–484.
Linnik, A., Bastiaanse, R., & Höhle, B. (2016). Discourse production in aphasia: A current review of theoretical and methodological challenges. Aphasiology, 30(7), 765–800.
Mack, J. E., Barbieri, E., Weintraub, S., Mesulam, M. M., & Thompson, C. K. (2021). Quantifying grammatical impairments in primary progressive aphasia: Structured language tests and narrative language production. Neuropsychologia, 151, 107713.
Macoir, J., Beaudoin, C., Bluteau, J., Potvin, O., & Wilson, M. A. (2018). TDQ-60 – a color picture naming test for adults and elderly people: Validation and normalization data. Aging, Neuropsychology, and Cognition, 25(5), 753–766.
Marangolo, P., Fiori, V., Calpagnano, M. A., Campana, S., Razzano, C., Caltagirone, C., & Marini, A. (2013). tDCS over the left inferior frontal cortex improves speech production in aphasia. Frontiers in Human Neuroscience, 7(539), 1–10.
Marini, A. (2012). Characteristics of narrative discourse processing after damage to the right hemisphere. Seminars in Speech and Language, 33(1), 68–78.
Marini, A., Andreetta, S., del Tin, S., & Carlomagno, S. (2011a). A multi-level approach to the analysis of narrative language in aphasia. Aphasiology, 25, 1372–1392.

419

Andrea Marini Marini, A., Carlomagno, S., Caltagirone, C., & Nocentini, U. (2005). The role played by the right hemisphere ­ ­46–54. ​­ in the organization of complex textual structures. Brain and Language. 93(1), Marini, A., Caltagirone, C., Pasqualetti, P., & Carlomagno, S. (2007). Patterns of language improvement in ­ ­164–186. ​­ adults with non-chronic non-fluent aphasia after specific therapies. Aphasiology, 21(2), Marini, A., Galetto, V., Zampieri, E., Vorano, L., Zettin, M., & Carlomagno, S. (2011b). Narrative language ​­ in traumatic brain injury. Neuropsychologia, 49, ­2904–2910. Marini, A., Spoletini, I., Rubino, I. A., Ciuffa, M., Bria, P., Martinotti, G., Banfi, G., Boccascino, R., Strom, P., Siracusano, A., Caltagirone, C., & Spalletta, G. (2008). The language of schizophrenia: An analysis of micro and macrolinguistic abilities and their neuropsychological correlates. Schizophrenia Research, ​­ 105, ­144–155. Marini, A., & Urgesi, C. (2012). Please, get to the point! A cortical correlate of linguistic informativeness. ­ ­2211–2222. ​­ Journal of Cognitive Neuroscience, 24(11), Mayer, J., & Murray, L. (2003). Functional measures of naming in aphasia: Word retrieval in confrontation ­ ­481–497. ​­ naming versus connected speech. Aphasiology, 17(5), Mazzon, G., Ajčević, M., Cattaruzza, T., Menichelli, A., Guerriero, M., Capitanio, S., Pesavento, V., Dore, F., Sorbi, S., Manganotti, P., & Marini, A. (2019). Connected speech deficit as an early hallmark of CSFdefined Alzheimer’s disease and correlation with cerebral hypoperfusion pattern. Current Alzheimer ​­ Research, 16, ­1–12. Mehri, A., & Jalaie, S. (2014). A Systematic Review on methods of evaluate sentence production deficits in agrammatic aphasia patients: Validity and reliability issues. Journal of Research in Medical Sciences, 19, ­885–98. ​­ Mozeiko, J., Le, K., Coelho, C., Krueger, F., & Grafman, J. (2011). The relationship of story grammar and executive function following TBI. 
Aphasiology, 25, 826e835. Nakase-Thompson, R., Manning, E., Sherer, M., Yablon, S. A., Gontkovsky, S. L. T., & Vickery, C. (2005). Brief assessment of severe language impairments: Initial validation of the Mississippi aphasia screening ­ ­685–691. ​­ test. Brain Injury, 19(9), Nicholas, L., & Brookshire, R. (1993). A system for quantifying the informativeness and efficiency of the ­ ­338–350. ​­ connected speech of adults with aphasia. Journal of Speech Hearing Research, 36(2), Pekkala, S. (2012). Verbal fluency tasks and the neuropsychology of language. In M. Faust (Ed.), The handbook of the neuropsychology of language (pp. Blackwell Publishing Ltd. ­­  ­619–634). ​­ Perkins, L., Crisp, J., & Walshaw, D. (1999). Exploring conversation analysis as an assessment tool for aphasia: the issue of reliability. Aphasiology, 13(4–5), ­­ ​­ ­259–282. ​­ Ramsberger, G., & Rande, B. (2002). Measuring transactional success in the conversation of people with ­ ­337–353. ​­ aphasia. Aphasiology, 16 (3), ­ ​­ Reynolds, E. H., & Wilson, J. V. K. (2014). Neurology and psychiatry in Babylon. Brain, 137, 2611–19. Rohde, A., Worrall, L., Godecke, E., O’Halloran, R., Farrell, A., & Massey, M. (2018). Diagnosis of aphasia ­ e0194143. in stroke populations: A systematic review of language tests. PloS One, 13(3), Sacco, K., Angeleri, R., Colle, L., Gabbatore, I., Bara, B. G., & Bosco, F. M. (2013). ABaCo: assessment bat​­ tery for communication. Bollettino di Psicologia applicata, 268, ­55–58. Saffran, E. M., Berndt, R. S., & Schwartz, M. F. (1989). The quantitative analysis of agrammatic production: ​­ Procedure and data. Brain and Language, 37, ­440–479. Sansavini, A., Favilla, M. E., Guasti, M. T., Marini, A., Millepiedi, S., Di Martino, M. V., … & Lorusso, M. L. (2021). Developmental language disorder: Early predictors, age for the diagnosis, and diagnostic tools. ­ 654. A scoping review. Brain Sciences, 11(5), Schank, R. C., & Abelson, R. P. (1977). 
Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Lawrence Erlbaum Associates. Schenck von Grafenberg, J. (1584). Observationes medicae de capite humano: Hoc est, exempla Capitis morborum, causarum, signorum, eventuum, curationum, ut singularia, sic abdita & monstrosa. Basileae, Ex Officina Frobeniana. ­­ ​­ 375–393. ­ ​­ Sherratt, S. (2007). Multi-level discourse analysis: a feasible approach. Aphasiology, 21(3–4), Shewan, C. M. (1988). The Shewan Spontaneous Language Analysis (SSLA) system for aphasic adults: de​­ scription, reliability and validity. Journal of Communication Disorders, 21, ­103–138. Spalletta, G. F., Spoletini, I., Cherubini, A., Rubino, I. A., Siracusano, A., Piras; F., Caltagirone, C, & Marini, A. (2010). Cortico-subcortical underpinnings of narrative processing impairment in schizophrenia. Psy​­ chiatry Research: Neuroimaging. 182, ­77–80.

420

Methods to study disorders of language production Sumiyoshi, C., Ertugrul, A., Yağcıoğlu, E. A., Roy, A., Jayathilake, K., Milby, A., Meltzer, H. Y., & Sumiyoshi, T. (2014). Language-dependent performance on the letter fluency tasks in patients with schizophrenia. Schizophrenia Research, 152, ­421–29. ​­ Thompson, L., & Heaton, R. (1989). Comparison of different versions of the Boston Naming Test. The Clini​­ cal Neuropsychologist, 3, ­184–192. Thurstone, L. L. (1938). Primary mental abilities. University of Chicago Press. Ulatowska, H. K., Freedman-Stern, R., Doyel, A. W., Macaluso-Haynes, S., & North, A. J. (1983). Production of narrative discourse in aphasia. Brain and Language, 19(2), ­ ­317–334. ​­ Weiss, E. M., Ragland, J. D., Brensinger, C. M., Bilker, W. B., Deisenhammer, E. A., & Delazer, M. (2006). Sex differences in clustering and switching in verbal fluency. Journal of the International Neuropsychological Society, 12, ­502–509. ​­ Wepfer, J. J. (1658). Observationes anatomicae ex cadaveribus eorum quos sustulit apoplexia. Cum exercitatione de eius loco affecto. Typis Joh. Caspari Suteri. Wernicke, C. (1874). Der aphasische Symptomencomplex. Eine psychologische Studie auf anatomischer Basis. Cohn und Weigert. Whitaker, H. A. (1998). History of neurolinguistics. In B. Stemmer & H. A. Whitaker (Eds.), Handbook of ­­  ­27–57). ​­ neurolinguistics (pp. Academic Press. Yorkston, K. M., & Beukeman, D. R. (1980). An analysis of connected speech samples of aphasic and normal speakers. Journal of Speech and Hearing Disorders, 45, ­27–36. ​­

421

27 EXPERIMENTAL METHODS FOR STUDYING SECOND LANGUAGE LEARNERS

Alan Juffs and Shaohua Fang

27.1 Introduction

Second language acquisition (SLA) draws on several fields of inquiry, including linguistics, psychology, education, and anthropology. This chapter focuses on the experimental, quantitative approach, which is one branch of research in second language development.1 Methods in SLA increasingly include techniques from psychology, which allow researchers to study language performance in real time and provide greater statistical sophistication (Juffs, 2001; Roberts, 2019). In addition, the availability of online corpora (e.g., Davies, 2008; Juffs et al., 2020; Naismith et al., 2022) permits researchers to take into account aspects of frequency that they previously could not.

27.2 Historical perspectives

Larsen-Freeman and Long (1991, pp. 5–6) note that SLA, in its current form, began in the 1960s. They point out the important shift in perspective from teaching methods to the learner's language. This change opened the way for SLA research to consider second language development from the point of view of linguistic theory and the cognitive development of the learner, separately from issues of language teaching methods. Although instruction remains an important part of SLA, this chapter does not address experimental classroom-based research. SLA research has adopted many theoretical constructs from Chomsky's (1986) theory of Universal Grammar (UG), specifically the abstract principles that learners could not have acquired from the input. Moreover, studies of child language acquisition have shown that children do not make certain errors during development, even though they may get some things 'wrong' (Yang et al., 2017). For adult second language learners (L2 speakers), a theory of UG raises several questions:

1 Are second language learners' grammars constrained by UG, or are they somehow cognitively quite different from knowledge of a first language (e.g., White, 2003)?
2 What is the role of the first language in L2 development (e.g., Schwartz & Sprouse, 1996)?
3 What is the role of input and instruction (e.g., Lightbown & Spada, 2013; Truscott & Sharwood Smith, 2011)?
4 Are some modules of the grammar acquired less well, or faster, than others, and why (e.g., Montrul, 2000)?

The methods discussed in this chapter are designed to address many of these questions.

DOI: 10.4324/9781003392972-31

27.2.1 Production data

Early SLA work frequently relied on production data collected from learners, both in classrooms and from untutored learners, such as the participants who contributed data to the ZISA project (Klein, 1975). Learners' essays or audio recordings were collected, transcribed, and then analysed for developmental patterns, an approach termed 'error analysis' (Robinett & Schachter, 1983). The goal was to identify stages in the learners' output that might reveal L1 influence, learning of the L2 target structures, or independently developing systems.

27.2.2 Acceptability judgement tasks

Although production data are important for identifying stages in development, researchers also ask learners to judge the acceptability of grammatical and ungrammatical sentences. Acceptability judgement tasks (AJTs) allow researchers to determine whether a grammatical form is part of the learner's grammar of the language under scrutiny, even if the learner never produces it. From a UG perspective, it is very important to determine whether L2 speakers' acquisition is constrained by abstract principles that rule out certain structures. A classic example comes from question formation in English. Wh-questions (who, what, how, where, etc.) are considered to involve an association between the wh-word and a position (called a gap, '__') where a noun phrase would appear in the declarative form, as shown in (1) and (2).

(1) a. The dog chased the cars in the street.
    b. What did the dog chase ___ in the street?
(2) a. The drivers knew that the dog chased their cars in the street.
    b. What did the drivers know ___ that the dog chased ___ in the street?

The example in (2) shows that the wh-word (the filler) can be associated with a gap some distance from it, including 'covert' gaps such as the one after 'know' in (2b). However, this filler-gap relationship is not always possible, as shown by (3) ('*' indicates an ungrammatical sentence).

(3) a. The drivers wondered why the dog chased their cars.
    b. *What did the drivers wonder why the dog chased ___?

Sentences such as (2b) are quite rare but possible. The question is how learners might know that (2b) is allowed but (3b) is not. Because the input provides little relevant evidence, Chomsky (1977) suggested that wh-movement is governed by abstract UG constraints. This is therefore a context in which SLA researchers can investigate knowledge of UG: languages such as Chinese and Japanese do not have wh-movement, so learners could not easily obtain knowledge of such a constraint from their L1 or their L2 input. In early work, Schachter (1989) argued that learners' acceptability judgements of wh-movement showed no access to UG, but methodological developments since then appear to support a role for UG in knowledge of wh-movement (Juffs, 2005).

27.3 Main research methods2

27.3.1 Acceptability judgement tasks

Judgements of the well-formedness of sentences such as those in (1)–(3) are important sources of data for experimental investigations in psycholinguistics (Schütze, 1996). This chapter does not discuss in detail the differences between acceptability and grammaticality (Schütze & Sprouse, 2013) and uses acceptability as a cover term for both constructs. A variety of AJTs exist, such as Likert scale tasks, forced-choice tasks, yes-no choice tasks, ranking pairs of sentences, and magnitude estimation. For reasons of space, we concentrate only on untimed Likert scale (LS) tasks and binary yes-no (Y/N) tasks. In an LS task, participants are given a scale with both endpoints labelled (e.g., one end for "completely unacceptable" and the other for "completely acceptable"). The most common scales have an odd number of points, such as 1–5 or 1–7, because an odd number of points allows participants to choose exactly the middle point, corresponding to "neutral". Alternatively, if one assumes that the scale has a potentially 'countless' number of points along it, a scale with only the two endpoints labelled is recommended. This decision, which should be made at the planning stage, is not a trivial matter because it has consequences for data preparation and the choice of statistical model. Tasks should have three types of trials: target trials of the structure(s) that the researcher is interested in; 'filler' trials, to make sure the participants do not notice what the researcher is focused on; and anchor trials. Anchor trials should be presented before the randomized target and filler items in one list, but after a practice phase. Anchor items familiarize participants with the task and identify those who failed to understand it. Schütze and Sprouse (2013) suggest a set of six anchor items: two completely unacceptable, two moderately acceptable, and two completely acceptable.
Individual participants should be removed from further analysis if their average rating of the two completely acceptable anchor items minus their average rating of the two completely unacceptable anchor items is smaller than 0, because such participants have evidently failed to understand the judgement task (Fang & Juffs, 2020). In a binary Y/N task, participants are asked to indicate whether a given sentence is acceptable (e.g., by clicking a "Yes" button) or not (e.g., by clicking a "No" button). Compared to the LS task, the Y/N task is particularly appropriate when a structure clearly falls into one category or the other in terms of acceptability. A well-known study using such a task is White and Juffs (1998), which included grammatical and ungrammatical sentences such as those in (2) and (3). Although both judgement tasks yield information about learners' linguistic competence, each task has its own comparative advantages and disadvantages. First, the Y/N task is more straightforward because no decision about the type of scale is needed. Second, Y/N tasks suit comparisons of categorical differences between conditions. For example, for (3b), all the researcher needs to know is whether the sentence is acceptable or not. Y/N tasks can also be used when the researcher believes that the sentences would be impossible to correct if participants were asked to do so (e.g., (3b)). Third, the LS task provides more fine-grained information about the acceptability of a sentence relative to others, especially when two sentences belong to the same category (e.g., both are broadly acceptable). LS tasks are also often used in research on argument structure, where judgements are less dichotomous, for example, The server poured the water into the glass versus *The server poured the glass with water. Fourth, the LS task may be statistically more powerful than the Y/N task (Sprouse & Almeida, 2011). For both types of AJT, it is important to have enough tokens of each type of structure under consideration; six tokens of each type are usually used. This practice ensures that particular lexical items are not the (only) cause of acceptance or rejection of a structure. However, judgement data only provide information about the end product of sentence processing, because participants in untimed LS and Y/N tasks make decisions after reading the sentence in its entirety. Moreover, judging the acceptability of sentences has been argued to draw on explicit knowledge about the target structure (Vafaee et al., 2017), and indeed AJTs have been used to elicit explicit knowledge (Loewen, 2009). Because of classroom instruction, learners may be more affected by such knowledge than L1 speakers. To minimize the involvement of explicit knowledge, timed AJTs and self-paced reading (SPR) tasks, among others, have been widely used (Juffs & Rodríguez, 2014).
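The anchor-based exclusion criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by Fang and Juffs (2020): the participant IDs, the ratings, and the 1–7 scale are invented, and the function names `anchor_difference` and `participants_to_exclude` are hypothetical.

```python
from statistics import mean

# Hypothetical anchor ratings on a 1-7 scale, keyed by participant ID.
# Each participant rated six anchors: two completely unacceptable ("bad"),
# two moderately acceptable ("mid"), and two completely acceptable ("good").
anchors = {
    "P01": {"bad": [1, 2], "mid": [4, 4], "good": [7, 6]},
    "P02": {"bad": [6, 7], "mid": [4, 3], "good": [2, 1]},  # reversed pattern
    "P03": {"bad": [2, 1], "mid": [5, 4], "good": [6, 7]},
}


def anchor_difference(ratings):
    """Mean rating of the acceptable anchors minus that of the unacceptable ones.

    The moderately acceptable anchors are not used by this criterion.
    """
    return mean(ratings["good"]) - mean(ratings["bad"])


def participants_to_exclude(anchor_table, threshold=0.0):
    """Return the IDs whose anchor difference falls below the threshold."""
    return sorted(
        pid for pid, ratings in anchor_table.items()
        if anchor_difference(ratings) < threshold
    )


print(participants_to_exclude(anchors))  # ['P02']
```

A participant who rates the unacceptable anchors higher than the acceptable ones (P02 here) ends up with a negative difference and is flagged.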

27.3.2 Truth value judgement tasks

Truth value judgement tasks (TVJTs) allow researchers to obtain data without directly asking participants to rate the acceptability of sentences. This method has its origin in child language acquisition research (e.g., Crain & Thornton, 2000; Musolino & Lidz, 2006) because children usually lack the metalinguistic skills needed for AJTs. These tasks are particularly important when the issue is the interpretation of potentially ambiguous sentences, where providing learners with a context is vital. In a TVJT for L2 learners, participants are typically given a written story context (sometimes accompanied by a picture, as in Grüter et al., 2010), followed by a test sentence, and decide whether the sentence is a true or false description of the preceding story. The task does not necessarily take the form of a binary choice and is sometimes adapted for gradient ratings (e.g., Marsden, 2009). It should be noted that the learner's grammar, rather than the pragmatic context, constitutes the basis for the judgement in TVJTs. The story context facilitates interpretation by providing contextual information to participants, thus better tapping into learners' linguistic competence. A classic example concerns the interpretation of nouns, pronouns, and reflexives, illustrated in (4). In (4a), 'themselves' must refer to the same people as 'the drivers', whereas in (4b), 'them' cannot co-refer with 'the drivers'. In (4c), the picture could be of either 'the drivers' or 'the mechanics', whereas in (4d) 'themselves' can only refer to 'the drivers'.

(4) Pronouns and reflexives
    a. The famous racing drivers hurt themselves.
    b. The famous racing drivers hurt them.
    c. The drivers sent the mechanics pictures of themselves at the track.
    d. The mechanics told the drivers to prepare themselves for the competition.
When asked to interpret sentences such as (4c), speakers often prefer the subject 'the drivers' as the people in the picture and fail to notice that 'the mechanics' could be in the picture. However, given an appropriate context, the 'mechanics' interpretation can be forced. Interestingly, languages such as Chinese and Japanese appear to allow co-reference with 'the mechanics' outside the clause in (4d) and to disallow co-reference with an object in (4c). White et al. (1997) investigated this issue with a TVJT. An example item from their study, in which context forces co-reference with an object, is given in (5). It was designed so that the researchers would know whether Japanese-speaking learners would allow co-reference of a reflexive with an object, which is not permitted in Japanese. If they did not, they would choose FALSE, because they could only interpret 'herself' as referring to 'the nurse'.

(5) Susan wanted a job in a hospital. A nurse interviewed Susan for the job. The nurse asked her about her experience, her education and whether she got on well with people.
    The nurse asked Susan about herself.    TRUE/FALSE

Slabakova et al. (2017) also focused on the interpretation of pronouns and reflexives. Other studies have investigated the interpretation of the relative scope of negation and disjunction (e.g., Grüter et al., 2010). A related example is Özçelik (2018), a bidirectional study that investigated the interpretation of quantifier scope, which lies at the syntax–semantics–pragmatics interface. In English, sentences such as (6) are ambiguous between the surface scope reading in (6a) and the inverse scope reading in (6b) (adapted from Lidz & Musolino, 2002).

(6) Donald didn't find two guys.
    a. It is not the case that Donald found two guys.
    b. There are two guys that Donald didn't find.

However, the Turkish counterpart of (6) permits only (6a). Differential success would therefore be predicted because of this cross-linguistic difference, albeit in different directions depending on the theoretical assumptions about L1 influence. In the TVJT, both groups of learners read a story in which Donald plays hide-and-seek with four of his friends but finds only two of them. They then read (6) and decided YES or NO according to whether it truthfully described the story. In this context, (6) is false on its surface scope reading (6a), since Donald did in fact find two of the four, and true on its inverse scope reading (6b). The results showed that Turkish learners of English performed better on the task than English learners of Turkish, suggesting that the latter had persistent difficulties overcoming the learnability problem, whereas the former had no difficulty with this external interface and achieved the target grammar.
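The logic of the hide-and-seek item can be made explicit by evaluating both readings of (6) against the story. The sketch below is illustrative only: the friends' names are invented, 'two' is assumed to mean 'at least two', and the rule that a participant answers TRUE if any reading available in their grammar is true is a simplifying assumption, not a claim about Özçelik's (2018) analysis.

```python
# Hypothetical story: Donald plays with four friends and finds two of them.
friends = {"Ann", "Ben", "Cal", "Dee"}
found = {"Ann", "Ben"}


def surface_scope(found, friends, n=2):
    """(6a) not > two: 'It is not the case that Donald found two guys.'

    Assumes 'two' means 'at least two'.
    """
    return not (len(found & friends) >= n)


def inverse_scope(found, friends, n=2):
    """(6b) two > not: 'There are two guys that Donald didn't find.'"""
    return len(friends - found) >= n


def predicted_response(readings):
    """Assumed rule: answer TRUE if any reading the grammar allows is true."""
    return any(readings)


surface = surface_scope(found, friends)  # False: he did find two
inverse = inverse_scope(found, friends)  # True: two were never found

# A grammar with only the surface reading (as described for Turkish):
print(predicted_response([surface]))           # False
# A grammar with both readings (as described for English):
print(predicted_response([surface, inverse]))  # True
```

Under these assumptions, the story is constructed so that only a grammar allowing the inverse scope reading leads to a TRUE/YES response.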

27.3.3 Tasks that use computer technology

Paper-and-pencil LS and Y/N tasks often allow participants unlimited time to complete the task. In contrast, computers enable researchers to collect response time (RT) data, that is, the time that participants take to press a key when reading a word or sentence, or to move their eyes across a clause or a picture. Jiang (2012, p. 17) notes that 'The use of RT data is based on the premise that cognitive processes take time and by observing how long it takes individuals to respond to different stimuli in different conditions, we can ask questions about how the mind works'. Because RTs are short and measured in milliseconds (ms), researchers can make fine-grained inferences both about the grammar and about how it is used during processing. We now address how researchers collect RT data in addition to judgement or comprehension data.

27.3.3.1 SPR tasks

Since Just et al. (1982), SPR has been a mainstream psycholinguistic technique, and it continues to be used even though more advanced techniques, such as eye-tracking and brain imaging (both event-related potentials (ERPs) and functional magnetic resonance imaging (fMRI)), have become available. SPR has been used to examine the processing of many linguistic structures and phenomena, such as temporary ambiguity in garden path (GP) sentences (e.g., The child put the candy on the table in her mouth; Pritchett, 1992), agreement violations (e.g., *The key to the cabinets were on the table; Jiang, 2004; Roberts & Liszka, 2013), non-canonical word orders (e.g., Mitsugi & MacWhinney, 2010), reflexives and pronouns (e.g., Wu et al., 2020), syntactically complex but essentially unambiguous sentences (e.g., Lee et al., 2020), and sentences manipulated for linguistic predictability (e.g., Fang & Albasiri, 2021). SPR is a frequent technique in processing research in SLA, partly because it is "lower tech" than, for example, eye-tracking or ERPs. More importantly, SPR allows researchers to address a wide range of theoretical issues in SLA, such as how L2 learners represent and process linguistic input incrementally, to what extent L2 processing differs (qualitatively or quantitatively) from L1 processing, and the role of the L1 in L2 parsing.

In an SPR task, sentences are typically presented one word or phrase at a time in a non-cumulative fashion, also known as the moving window technique (Just et al., 1982), as shown in Figure 27.1. The presentation usually starts with a fixation cross for about 1,000 ms, signalling the position where the first word or phrase will appear, followed by a series of dashes. The participant presses a key to reveal the first word or phrase of the sentence and keeps pressing until the end of the sentence; each press makes the current word or phrase disappear, replacing it with dashes, and reveals the next one. The task is self-paced in the sense that participants control the rate at which they press the key. Reading times (RTs) are recorded as the intervals between successive key presses.

Figure 27.1 Self-paced reading screen showing blanks and appearance of words during and after key presses.
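The moving window display and the computation of reading times from key presses can be sketched as follows. This is a hypothetical illustration only: the function names and timestamps are invented, the fixation cross is omitted, and the code produces the text of each screen rather than running an actual timed experiment.

```python
def moving_window_frames(sentence):
    """Screens for a non-cumulative moving-window display.

    Frame 0 shows only dashes (one dash per letter, spaces kept); each
    later frame reveals exactly one word and masks all the others.
    """
    words = sentence.split()
    masks = ["-" * len(word) for word in words]
    frames = [" ".join(masks)]
    for i, word in enumerate(words):
        shown = masks.copy()
        shown[i] = word  # only the current word is visible
        frames.append(" ".join(shown))
    return frames


def reading_times(press_times_ms):
    """Reading time for each word = interval between successive key presses.

    press_times_ms[0] is the press that reveals the first word; the last
    press removes the final word, so n words need n + 1 presses.
    """
    return [later - earlier
            for earlier, later in zip(press_times_ms, press_times_ms[1:])]


for frame in moving_window_frames("The student who"):
    print(frame)
# --- ------- ---
# The ------- ---
# --- student ---
# --- ------- who
print(reading_times([0, 420, 815, 1190]))  # [420, 395, 375]
```

Keeping the spaces between the dashes preserves word-length information, as in the display described above.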
Participants' offline comprehension of each sentence is then checked immediately, both as an additional measure and to ensure that the task is taken seriously. The two most frequent checks are Yes/No questions about sentence meaning and AJTs. However, given the mixed results regarding the effects of the offline task on RTs in SPR, it is important for researchers to consider which comprehension task to adopt. For example, Leeser et al. (2011) found that learners showed sensitivity to violations of noun-adjective gender agreement only when they were asked to judge grammaticality, not when they answered meaning-based comprehension questions. Such task effects were not observed across investigations of the processing of complex wh-questions in L2 German, which used grammaticality judgements (Jackson & Dussias, 2009) and Y/N questions (Jackson & Bobb, 2009), respectively. Offline task effects therefore seem to depend on the linguistic phenomena under investigation. Among the first applications of SPR in SLA research was a series of studies summarized in Juffs and Rodríguez (2014). For example, Juffs (1998) used a word-by-word SPR task to investigate how learners of English from various L1 backgrounds processed temporarily ambiguous sentences containing reduced relative clauses, such as The bad boy criticized almost every day were playing in the park. Participants in this experiment slowed down on the noun phrase (NP) in the reduced relative clause (i.e., day), especially compared with reduced relative clauses containing an optionally transitive verb, such as The bad boy watched almost every day were playing in the park. This suggests that they may have used the argument structure information encoded in the transitive verb criticized, which requires a direct object adjacent to the verb.

The SPR technique is not without limitations, however. First, the presentation lacks ecological validity, because under normal circumstances readers can see the entire text rather than one word at a time, and they tend to skip highly predictable words (Ehrlich & Rayner, 1981). One way to mitigate this lack of naturalness is to present sentences phrase by phrase or, if sentences are presented word by word, to combine words into a single analytical unit. Jegerski (2014) pointed out that word-by-word versus phrase-by-phrase presentation constitutes a trade-off between the level of detail in RTs and the degree of ecological validity of the experimental task. In addition, readers often go back to what they have read for various reasons, and the SPR paradigm, especially in its non-cumulative version, does not allow re-reading. Relatedly, SPR does not capture late processing effects that would presumably be observed in regressions (re-reading) during sentence processing. In this sense, SPR is better suited to tapping early processing than late processing, such as the integration of multiple kinds of information to recover from misanalysis in garden path sentences (e.g., Şafak & Hopp, 2021).
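Combining word-level RTs into larger analytical units, as suggested above, amounts to aggregating consecutive RTs. The sketch below assumes summation (averaging is another option) and uses invented RTs for the reduced relative sentence from Juffs (1998); the function name `region_times` and the region boundaries are hypothetical and chosen only for illustration.

```python
def region_times(word_rts, region_sizes):
    """Sum consecutive word-level RTs (ms) into larger analysis regions.

    region_sizes lists how many words fall into each region, in order.
    """
    if sum(region_sizes) != len(word_rts):
        raise ValueError("regions must cover every word exactly once")
    totals, start = [], 0
    for size in region_sizes:
        totals.append(sum(word_rts[start:start + size]))
        start += size
    return totals


sentence = "The bad boy criticized almost every day were playing in the park"
# Hypothetical word-by-word RTs, one per word (12 words).
rts = [350, 360, 340, 420, 380, 390, 510, 600, 450, 400, 380, 430]
assert len(sentence.split()) == len(rts)

# Hypothetical regions: The bad boy | criticized | almost every day |
# were playing in the park
print(region_times(rts, [3, 1, 3, 5]))  # [1050, 420, 1280, 2260]
```

Coarser regions give fewer, less detailed RT values, which is the trade-off between detail and naturalness noted above.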

27.3.4 Eye-tracking

Eye-tracking offers a more ecologically valid method than SPR because participants can re-read text, which makes it possible to tease apart early and late processing. Moreover, SPR is not considered an optimal method for capturing predictive processing, during which the parser proactively builds up linguistic representations by generating predictions based on available information (Kaan, 2014). Pickering and Gambi (2018) argued that the clearest demonstration of prediction comes from studies that reveal pre-activation of aspects of the linguistic structure. If this is the case, then the visual world paradigm appears to be the best method, because it can record eye movements towards a visual object corresponding to a linguistic referent even before that referent is encountered in the actual language input, thus demonstrating prediction.

27.3.4.1 ­Eye-tracking ​­ during reading For eye-tracking during reading, participants’ eye movements are followed while reading a text (e.g., sentence or passage). There are two major types of eye movement behaviours: fixations and saccades. Fixations occur when the eyes remain stationary for a certain period. Saccades result from rapid eye movements between two eye fixations. Regressions arise when saccades move backwards through text (i.e., from right to left when reading English). Patterns of eye movements are associated with the ease or difficulty with which texts are comprehended (Rayner, 1998). For example, longer fixation durations and/or a greater number of fixations index lexical processing difficulty during sentence comprehension. Therefore, different kinds of eye movements provide more fine-grained information regarding language processing compared to data obtained from SPR (Frenck-Mestre, 2005). Moreover, different eye movement measures (see Godfroid, 2019, pp. 214–215 for details) make it possible to distinguish between early and late processing. For example, second pass time, summed duration of all fixations made on an


Experimental methods for studying second language learners

area of interest after the eyes initially exit that area, is considered, among other measures, a late processing measure reflecting global comprehension (Godfroid, 2019) and can be taken to reflect reanalysis (Frenck-Mestre, 2005). In general, eye-tracking methods are used to study the cognitive processes underlying L2 sentence comprehension in real time (e.g., Dussias, 2003; Felser et al., 2009), especially where RT measures such as those from SPR cannot capture particular reading behaviours, such as regressions. For example, Şafak and Hopp (2021) examined the eye movements of German and Turkish learners of English reading temporarily ambiguous sentences that were manipulated for two factors, verb bias and plausibility, as in (7). A sentence is DO-biased when its main verb is likely to take a direct object (DO), as with ‘heard’ in (7a) and (7b); a sentence is SC-biased when its main verb is likely to take a sentential complement (SC), as with ‘concluded’ in (7c) and (7d). The sentences were also manipulated for plausibility: the region following the disambiguating region either matched or mismatched semantically with the direct object interpretation of the preceding noun phrase.

7 a DO-BIAS; SEMANTIC MATCH
He heard the speech was quite possibly short and to the point.
b DO-BIAS; SEMANTIC MISMATCH
He heard the speech was quite possibly postponed until next week.
c SC-BIAS; SEMANTIC MATCH
He concluded the speech was quite possibly short and to the point.
d SC-BIAS; SEMANTIC MISMATCH
He concluded the speech was quite possibly postponed until next week.

Two regions of interest (ROIs) were analysed: the disambiguating region (i.e., was quite) and the final region, where semantic persistence effects were expected (i.e., short and to the point and postponed until next week).
Greater reanalysis difficulty for sentences containing DO-biased verbs (which were unexpectedly followed by a clause rather than just a noun phrase) was reflected at the disambiguating region in longer second pass times for the L2 group and more regressions for the L1 group, suggesting that the L2 learners were able to use verb bias information in the same way as native speakers. Crucially, the L1 group showed an interaction between verb bias and plausibility at the disambiguating region in total reading times, a late processing measure associated with reanalysis. By contrast, the L2 group showed no such interaction in any region, suggesting that the learners were unable to use verb bias to overcome semantic persistence effects. Eye-tracking during reading is, therefore, useful for investigating structural reanalysis at different stages of processing. This method has also been applied in L2 research to other phenomena, such as grammatical violations (Keating, 2009), referential processing in Binding Theory (Felser et al., 2009), and lexical processing of orthography (Martin & Juffs, 2021).
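To make the distinctions among reading measures concrete, the following Python sketch computes several of them for one region of interest from a single trial's sequence of fixations. It is a simplified illustration, not the procedure used in the studies cited: it assumes fixations have already been mapped to regions numbered from left to right, and it adopts one common convention (a region first fixated only after a later region counts as skipped in first pass). The names `Fixation`, `RegionMeasures`, and `region_measures` are hypothetical.

```python
from typing import NamedTuple, Optional


class Fixation(NamedTuple):
    region: int    # region index, numbered left to right
    duration: int  # fixation duration in ms


class RegionMeasures(NamedTuple):
    first_fixation: Optional[int]   # duration of the first first-pass fixation
    gaze_duration: Optional[int]    # sum of first-pass fixations on the region
    regression_out: Optional[bool]  # did the eyes leave the region leftwards in first pass?
    second_pass: int                # fixations on the region after the eyes first exit it
    total_time: int                 # all fixations on the region


def region_measures(fixations: list[Fixation], roi: int) -> RegionMeasures:
    """Compute simplified reading measures for one region of interest (roi)."""
    total = sum(f.duration for f in fixations if f.region == roi)
    visits = [i for i, f in enumerate(fixations) if f.region == roi]
    if not visits:
        return RegionMeasures(None, None, None, 0, 0)  # region never fixated
    first = visits[0]
    if any(f.region > roi for f in fixations[:first]):
        # Assumed convention: a region first fixated after a later region was
        # skipped in first pass, so all of its fixations count as rereading.
        return RegionMeasures(None, None, None, total, total)
    # First pass: consecutive fixations from first entry until the eyes exit.
    exit_idx = first
    while exit_idx < len(fixations) and fixations[exit_idx].region == roi:
        exit_idx += 1
    gaze = sum(f.duration for f in fixations[first:exit_idx])
    regression_out = exit_idx < len(fixations) and fixations[exit_idx].region < roi
    # Second pass time as defined above: fixations on the region after first exit.
    second_pass = sum(f.duration for f in fixations[exit_idx:] if f.region == roi)
    return RegionMeasures(fixations[first].duration, gaze, regression_out,
                          second_pass, total)


if __name__ == "__main__":
    trial = [Fixation(1, 200), Fixation(2, 180), Fixation(3, 250), Fixation(3, 120),
             Fixation(4, 210), Fixation(2, 190), Fixation(3, 230), Fixation(5, 220)]
    print(region_measures(trial, 3))
    # RegionMeasures(first_fixation=250, gaze_duration=370, regression_out=False,
    #                second_pass=230, total_time=600)
```

In this invented trial, region 3 is read twice, so part of its total reading time is first-pass (gaze) time and part is second pass time; in a real study such per-trial values would then be cleaned and modelled as described in Section 27.3.7.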

27.3.4.2 Visual world eye-tracking

Eye-tracking in the visual world paradigm is well suited to studying spoken language comprehension (Tanenhaus et al., 1995). In such tasks, participants' eye movements are recorded while they look at a visual display and listen to sentences. Because eye movements are time-locked


Alan Juffs and Shaohua Fang

to the auditory stimuli, this paradigm is particularly useful for addressing topics such as predictive processing. In a landmark study, Altmann and Kamide (1999) reported that English listeners used the selectional restriction information encoded in the verb to anticipate upcoming input. Crucially, when viewing a scene showing a boy, a cake, and several inedible objects while hearing ‘The boy will eat …’ or ‘The boy will move …’, listeners looked at the cake much earlier when they heard the verb ‘eat’ than when they heard the verb ‘move’. In SLA, an important question is whether L2 learners differ from L1 speakers in their ability to generate predictions during sentence comprehension (Grüter et al., 2017; Kaan, 2014). One fruitful area for testing predictive processing among learners is the use of morphosyntactic information, such as gender marking on determiners (Grüter et al., 2012) and case marking on nouns (Mitsugi & MacWhinney, 2016). For example, building on Kamide et al. (2003), Mitsugi and MacWhinney (2016) investigated whether English learners of Japanese, like Japanese native speakers, could use information encoded in case markers to predict upcoming referents. They tested three constructions, as in (8).

8 a Canonical ditransitive
gakkou-de majimena gakusei-ga kibishii sensei-ni shizukani tesuto-o watashita.
school-LOC serious student-NOM strict teacher-DAT quietly exam-ACC handed over
“At the school, the serious student quietly handed over the exam to the strict teacher.”
b Scrambled ditransitive
gakkou-de kibishii sensei-ni majimena gakusei-ga shizukani tesuto-o watashita.
school-LOC strict teacher-DAT serious student-NOM quietly exam-ACC handed over
“At the school, the serious student quietly handed over the exam to the strict teacher.”
c Accusative
gakkou-de majimena gakusei-ga kibishii sensei-o shizukani karakatta.
school-LOC serious student-NOM strict teacher-ACC quietly teased
“At the school, the serious student quietly teased the strict teacher.”

If participants access and use the information carried by case markers, they should be able to predict an upcoming theme noun, tesuto (‘exam’), at the adverb shizukani (‘quietly’): in (8a) and (8b), the dative -ni and nominative -ga indicate that an accusative -o marked noun is yet to come. As shown in Figure 27.2, a visual display of a student, a house, an exam, and a teacher was presented to the participants while they listened to the sentences in Japanese. The eye movement data showed that L1 speakers looked predictively at the theme more in both the canonical (8a) and scrambled (8b) conditions than in the accusative condition (8c). However, the L2 group showed no difference between conditions, indicating that they did not engage in predictive processing, even though they showed good knowledge of Japanese case markers in an offline task.
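A common first step in analysing visual world data is to compute, for each condition, the proportion of eye-tracking samples that fall on the target picture within a time window of interest (for instance, a window beginning at the adverb shizukani), or in successive time bins. The Python sketch below assumes samples have already been assigned to areas of interest (AOIs). The AOI labels (‘theme’, ‘agent’, and so on), the time reference, and the decision to drop track-loss samples are illustrative assumptions, not the procedure used by Mitsugi and MacWhinney (2016).

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    trial: int
    condition: str
    time_ms: int         # time relative to a reference point, e.g. adverb onset (assumed)
    aoi: Optional[str]   # area of interest being fixated; None marks track loss


def target_proportions(samples, target, window):
    """Per condition: share of valid samples on `target` within [start, end) ms."""
    start, end = window
    hits, valid = defaultdict(int), defaultdict(int)
    for s in samples:
        if s.aoi is None or not start <= s.time_ms < end:
            continue  # track loss (dropped here by assumption) or outside the window
        valid[s.condition] += 1
        hits[s.condition] += s.aoi == target
    return {cond: hits[cond] / n for cond, n in valid.items()}


def time_course(samples, target, bin_ms):
    """Proportion of target looks in successive bins: {(condition, bin_start): p}."""
    counts = defaultdict(lambda: [0, 0])  # (condition, bin_start) -> [hits, valid]
    for s in samples:
        if s.aoi is None:
            continue
        key = (s.condition, (s.time_ms // bin_ms) * bin_ms)
        counts[key][0] += s.aoi == target
        counts[key][1] += 1
    return {key: hits / n for key, (hits, n) in sorted(counts.items())}
```

Prediction would show up as a higher proportion of looks to the theme in the ditransitive conditions than in the accusative condition before the theme noun is heard; the time-course proportions are what is typically plotted and then modelled statistically.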

27.3.5 Masked priming

Many of the methods discussed so far address learners' knowledge of clause structure and abstract principles in semantics and syntax. However, an enduring challenge for second language learners is morphology, be it case and gender marking on determiners and nouns, tense and aspect marking on verbs, or derivational morphology. Although work on these topics exists based on production data (e.g., Spinner & Juffs, 2008; Prévost & White, 2000), experimental techniques


Figure 27.2 Display used in Mitsugi and MacWhinney (2016). Mitsugi, S., & MacWhinney, B. (2016). The use of case marking for predictive processing in second language Japanese. Bilingualism: Language and Cognition, 19(1), 19–35. Reproduced with permission.

Table 27.1 Conditions in a masked priming experiment

Prime type       Prime (60 milliseconds)   Target decision
Identity         wrap                      WRAP
Inflected -ed    wrapped                   WRAP
Unrelated        greet                     WRAP

that allow researchers to investigate morphology are increasingly used. Some of these are offline tasks, such as affix-choice, word and nonword, morphological relatedness, and suffix-ordering tasks (Wu & Juffs, 2022). The most widely used computer-based technique is masked priming (Forster & Davis, 1984; Forster, 1998). A typical experiment involves a lexical decision, in which participants decide whether a letter string is a word or not (e.g., ‘bird’ versus ‘*burd’). Before the target appears, another word, the prime, is flashed very briefly, usually for about 60 milliseconds, so that the participant does not consciously perceive it. A related prime is assumed to speed up lexical access to the target. An example is provided in Table 27.1. If the participant ‘strips’ the -ed from the flashed prime, showing morphological sensitivity, the RT on the target WRAP after wrapped should be the same as in the identity condition. Thus, the pattern Identity = Inflected -ed < Unrelated indicates ‘full priming’. Partial priming is the pattern in which Identity is faster than Inflected, which in turn is faster than Unrelated.
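The logic for interpreting Table 27.1 can be written out as a small decision procedure over condition means. In this hypothetical Python sketch, two means count as ‘the same’ if they differ by no more than a fixed tolerance (10 ms by default); that threshold is purely illustrative, since in practice equality and differences between conditions are assessed statistically, for example with the mixed-effects models discussed in Section 27.3.7.

```python
from statistics import mean


def condition_means(trials):
    """trials: iterable of (prime_type, rt_ms) pairs, e.g. correct 'word' responses."""
    by_condition = {}
    for prime_type, rt in trials:
        by_condition.setdefault(prime_type, []).append(rt)
    return {cond: mean(rts) for cond, rts in by_condition.items()}


def classify_priming(identity, inflected, unrelated, tolerance=10.0):
    """Label the pattern of mean RTs (ms); `tolerance` is a hypothetical threshold."""
    morph_priming = unrelated - inflected  # facilitation from 'wrapped' relative to 'greet'
    if morph_priming <= tolerance:
        return "no priming"
    if abs(inflected - identity) <= tolerance:
        return "full priming"      # Identity = Inflected < Unrelated
    if inflected - identity > tolerance:
        return "partial priming"   # Identity < Inflected < Unrelated
    return "other"                 # e.g. inflected prime clearly faster than identity


if __name__ == "__main__":
    means = condition_means([("identity", 540), ("identity", 560),
                             ("inflected", 548), ("inflected", 556),
                             ("unrelated", 590), ("unrelated", 610)])
    print(means)  # {'identity': 550, 'inflected': 552, 'unrelated': 600}
    print(classify_priming(means["identity"], means["inflected"], means["unrelated"]))
    # full priming
```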


Example studies that have used this technique include Silva and Clahsen (2008) and Neubauer and Clahsen (2009), who argued that second language morphological processing differs markedly from that of L1 speakers and is instead based on whole words only, whereas Diependaele et al. (2011) suggested that L1 and L2 processing are similar in that words can be morphologically decomposed, at least in derivation.

27.3.6 Additional recommendations for practice

27.3.6.1 Latin square designs

An experimental issue related to controlling the frequency of items in a study is how many times a learner sees the same lexical item in an experiment. For example, a frequent topic in second language research has been psychological verbs, which describe mental states, as in (9) and (10).

9 The spider frightened the doctor. (= Object Experiencer, OE)
10 The doctor fears spiders. (= Subject Experiencer, SE)

As shown in (9) and (10), the experiencer is sometimes an object (9) and sometimes a subject (10). Other object experiencer verbs are ‘please’, ‘disappoint’, and ‘bore’, and other subject experiencer verbs are ‘like’, ‘enjoy’, and ‘appreciate’. Research has shown that OE sentences are more challenging for L2 learners to process than SE sentences (e.g., White et al., 1999). However, OE verbs seem to be processed faster when used in the passive. To avoid participants realizing what is being tested, and to isolate the structural property (OE versus SE) from the individual verb, researchers make sure that each individual verb is seen only once in an experiment. Participants should see multiple sentences from the same condition without being cued by the fact that sentences across conditions share the same lexical items except for the part under experimental manipulation. To achieve this, different versions of the experimental items are constructed and distributed according to a Latin square, so that each participant sees only one version of each experimental item. For example, in a hypothetical experiment on psych verbs, we could construct four sets of experimental items, each with one sentence containing an OE verb and one containing an SE verb, as in (12).

12

Set 1: a1 The spider frightened the doctor. (OE)
       b1 The spider feared the doctor. (SE)
Set 2: a2 The cat amused the mouse. (OE)
       b2 The cat adored the mouse. (SE)
Set 3: a3 The woman surprised the man. (OE)
       b3 The woman trusted the man. (SE)
Set 4: a4 The soldier angered the captain. (OE)
       b4 The soldier tolerated the captain. (SE)

Two versions of the items are then generated as follows:
Version 1: a1, b2, a3, b4
Version 2: b1, a2, b3, a4


Suppose there are 32 participants taking part in the experiment. Accordingly, version 1 will be assigned to 16 participants and version 2 will be assigned to another 16 participants.
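The counterbalancing in (12) follows a simple rotation: in list v, item i appears in condition (i + v) mod k, where k is the number of conditions. The Python sketch below reproduces Version 1 and Version 2 from the item sets in (12) and rotates participants through the lists; the function names and participant IDs are hypothetical. With more conditions (e.g., a 2 × 2 design with four conditions), the same rotation would yield four lists.

```python
from collections import Counter

# The four item sets in (12): each maps a condition to its sentence.
ITEMS = [
    {"OE": "The spider frightened the doctor.", "SE": "The spider feared the doctor."},
    {"OE": "The cat amused the mouse.", "SE": "The cat adored the mouse."},
    {"OE": "The woman surprised the man.", "SE": "The woman trusted the man."},
    {"OE": "The soldier angered the captain.", "SE": "The soldier tolerated the captain."},
]
CONDITIONS = ("OE", "SE")
LABELS = {"OE": "a", "SE": "b"}  # the a/b labels used in (12)


def build_lists(items, conditions):
    """Latin square rotation: in list v, item i is shown in condition (i + v) mod k."""
    k = len(conditions)
    lists = []
    for v in range(k):
        lists.append([
            (i + 1, conditions[(i + v) % k], item[conditions[(i + v) % k]])
            for i, item in enumerate(items)
        ])
    return lists


def assign_lists(participant_ids, n_lists):
    """Rotate participants through the lists so that list sizes stay balanced."""
    return {pid: idx % n_lists for idx, pid in enumerate(participant_ids)}


if __name__ == "__main__":
    lists = build_lists(ITEMS, CONDITIONS)
    for v, lst in enumerate(lists, start=1):
        print(f"Version {v}:", ", ".join(f"{LABELS[c]}{n}" for n, c, _ in lst))
    # Version 1: a1, b2, a3, b4
    # Version 2: b1, a2, b3, a4
    assignment = assign_lists([f"P{i:02d}" for i in range(1, 33)], len(lists))
    print(Counter(assignment.values()))  # 16 participants per list
```

Each list contains every item exactly once and each condition equally often, which is what prevents a participant from seeing both versions of the same sentence.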

27.3.6.2 Lexical items: Frequency, orthographic, and phonological neighbours

One of the critical issues in experimental design is ensuring that the researcher can focus on the structure or principle under investigation without the results being affected by specific vocabulary choices. Therefore, researchers try to choose lexical items of similar frequency and complexity. An example is a recent study by Martin and Juffs (2021) on second language reading, which examined second language learners' speed in reading words with vowels that have consistent phoneme-grapheme mappings and vowels that do not. For example, the orthographic vowel sequence ‘ee’ in English is almost always /i/, as in ‘greet’, ‘sleep’, and ‘week’; in contrast, the sequence ‘ea’ can be /i/ as in ‘eat’ or /ej/ as in ‘great’. It was important to control the frequency of these words so as not to confound the experimental variable (ambiguous versus unambiguous vowels) with another factor, such as familiarity based on frequency. Thus, consistent and inconsistent words were matched on a range of factors, including imageability (‘hope’ is not imageable, but ‘cat’ is) and length in letters, phonemes, and syllables. The number and frequency of orthographic (words spelled similarly) and phonological (words that sound similar) neighbours, as well as the frequency of two-letter (bigram) sequences, were also controlled. Resources used for this checking were E-Lexicon (Balota et al., 2007) and the MRC Psycholinguistic Database (Wilson, 1988).
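Two of the lexical properties mentioned above can be computed directly from a word list. The Python sketch below counts orthographic neighbours using Coltheart's N (same-length words that differ from the target by exactly one letter) and a simple bigram frequency (how many words in the lexicon contain each adjacent letter pair), then summarizes a group of words so that two groups can be compared. The toy lexicon is hypothetical; real studies take these values from databases such as those cited above, which are based on large corpora.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy lexicon; a real study would use a large database.
LEXICON = {"cat", "bat", "cot", "cut", "car", "hat", "hope", "rope", "hops", "home"}


def coltheart_n(word, lexicon):
    """Orthographic neighbourhood size: same-length words differing by one letter."""
    return sum(
        1 for other in lexicon
        if len(other) == len(word) and other != word
        and sum(a != b for a, b in zip(word, other)) == 1
    )


def bigram_counts(lexicon):
    """Type counts of adjacent letter pairs across the lexicon."""
    counts = Counter()
    for w in lexicon:
        counts.update(w[i:i + 2] for i in range(len(w) - 1))
    return counts


def mean_bigram_frequency(word, counts):
    """Average count of the word's bigrams in the lexicon."""
    return mean(counts[word[i:i + 2]] for i in range(len(word) - 1))


def summarize(group, lexicon):
    """Group means that should be similar across matched conditions."""
    counts = bigram_counts(lexicon)
    return {
        "length": mean(len(w) for w in group),
        "N": mean(coltheart_n(w, lexicon) for w in group),
        "bigram": mean(mean_bigram_frequency(w, counts) for w in group),
    }


if __name__ == "__main__":
    print(coltheart_n("cat", LEXICON))  # 5: bat, cot, cut, car, hat
    print(summarize(["cat", "cot"], LEXICON))
```

Comparing the output of `summarize` for consistent and inconsistent word groups shows at a glance whether the groups are matched; formal matching would normally also test the group differences statistically.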

27.3.7 Statistical analysis

Statistical analysis assumes familiarity with basic statistics, which we cannot review here. Because each experimental task has its own specifics for how its data should be analysed, this section cannot provide a comprehensive overview of the analytical procedures for every task. Rather, we highlight procedures for statistical analysis that are applicable across tasks and offer recommendations wherever possible. Before statistical models are fitted, the raw data need to be cleaned and, where appropriate, transformed. For example, data from AJTs should be trimmed for individual participants based on their performance on the anchor items. For tasks such as SPR and eye-tracking that include comprehension questions (CQs), participants whose accuracy on the CQs falls below a given threshold (e.g., 80%) are often removed. RTs are then removed based on absolute cut-off points (e.g., below 200 ms or above 2,000 ms) and on SD (standard deviation) boundaries (e.g., beyond ±2.5 SD from the mean) calculated by participant and/or language group. For TVJTs, data from participants who fail to reach a threshold for accurately judging the filler or control trials can be removed from further analysis. In addition, data are often transformed mathematically prior to statistical modelling. For gradient judgement data, raw ratings are z-score transformed by participant and language group to account for individual participants and learners from different language groups (e.g., L1 versus L2) using Likert scales differently (Spinner & Gass, 2019). Many statistical techniques assume a normal distribution, but RT data are often positively skewed, so they are log transformed (e.g., log-e, log-10) to reduce the skewness of the distribution. The most appropriate transformation can be determined with the Box–Cox procedure (Box & Cox, 1964).
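The cleaning steps described above can be chained as in the following Python sketch, which uses only the standard library. The thresholds (80% CQ accuracy, 200–2,000 ms, ±2.5 SD by participant) mirror the examples in the text and are not recommendations; the data layout (a list of dicts with `participant` and `rt` keys) and the function names are assumptions for illustration. Trimming by language group as well as participant would only require a different grouping key.

```python
import math
from statistics import mean, stdev


def clean_rts(trials, cq_accuracy, min_acc=0.80, lo=200, hi=2000, sd_cut=2.5):
    """Return cleaned trials, each with an added 'log_rt' field.

    trials: list of dicts with 'participant' and 'rt' (ms) keys.
    cq_accuracy: participant -> proportion of comprehension questions correct.
    """
    # 1. Remove participants below the CQ accuracy threshold (or with no score).
    kept = [t for t in trials if cq_accuracy.get(t["participant"], 0.0) >= min_acc]
    # 2. Remove RTs outside the absolute cut-offs.
    kept = [t for t in kept if lo <= t["rt"] <= hi]
    # 3. Remove RTs more than sd_cut SDs from each participant's own mean.
    by_participant = {}
    for t in kept:
        by_participant.setdefault(t["participant"], []).append(t["rt"])
    stats = {p: (mean(rts), stdev(rts) if len(rts) > 1 else 0.0)
             for p, rts in by_participant.items()}
    cleaned = []
    for t in kept:
        m, sd = stats[t["participant"]]
        if sd == 0 or abs(t["rt"] - m) <= sd_cut * sd:
            # 4. Log-transform surviving RTs to reduce positive skew.
            cleaned.append({**t, "log_rt": math.log(t["rt"])})
    return cleaned


def zscore_by_participant(ratings):
    """ratings: list of (participant, rating) pairs -> z-scores in the same order."""
    by_p = {}
    for p, r in ratings:
        by_p.setdefault(p, []).append(r)
    stats = {p: (mean(rs), stdev(rs)) for p, rs in by_p.items() if len(rs) > 1}
    z = []
    for p, r in ratings:
        m, sd = stats.get(p, (r, 0.0))
        z.append((r - m) / sd if sd > 0 else 0.0)  # constant raters get 0
    return z
```

Note that the SD step is applied once, after the absolute cut-offs; some studies trim iteratively or compute SDs by condition, and whichever choice is made should be reported.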
Researchers may also want to transform RTs into length-adjusted residual RTs, to control for the fact that the regions compared across conditions may differ in length and that individual participants may differ in reading speed. Residual RTs can be calculated by fitting a linear mixed-effects model for RTs on all data (critical items and fillers), with word length (i.e., number of characters) as the fixed effect and with by-participant intercepts and by-participant random slopes for word length; the residuals of this model are the length-adjusted RTs. Once data are cleaned and transformed, they are submitted to statistical modelling. The field has seen a shift from traditional ANOVAs and regression to mixed-effects models (MEMs), which offer a range of benefits. First, MEMs explicitly model the non-independence of observations, whereas many statistical tests, such as linear regression, assume that observations are independent. L2 data points are typically not independent, because each participant usually provides multiple responses in an experiment. Second, MEMs allow us to model variation across participants and across stimuli in a single analysis, which is not possible with traditional ANOVAs, where separate by-participant (F1) and by-item (F2) analyses are required. In L2 studies, participants are sampled from a wider population, and experimental stimuli are likewise sampled from a much larger set of linguistic materials (see the language-as-fixed-effect fallacy; Clark, 1973). With MEMs, we can include both participants and items as crossed random effects at the same time. Third, MEMs work for a range of data distributions, e.g., linear MEMs for a continuous dependent variable and logit MEMs for a binary dependent variable (Winter, 2020). The lme4 package in R (Bates et al., 2015) is commonly used for such modelling: the lmer function fits linear MEMs to data with a continuous dependent variable, and the glmer function fits logit MEMs to data with a binary outcome.
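Residual RTs are the observed RTs minus the RTs predicted from region length. The Python sketch below makes a deliberate simplification: instead of one mixed-effects model with by-participant intercepts and slopes (in lme4, a formula along the lines of rt ~ length + (1 + length | participant)), it fits a separate ordinary least squares line for each participant. This gives each participant their own intercept and slope, but without the partial pooling across participants that a mixed model provides. Field and function names are hypothetical.

```python
from statistics import mean


def ols_line(x, y):
    """Ordinary least squares intercept and slope for y = a + b * x.

    Requires at least two distinct x values (otherwise the slope is undefined).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def residual_rts(trials):
    """Add 'resid_rt' = observed RT minus RT predicted from length, per participant.

    trials: list of dicts with 'participant', 'length' (characters) and 'rt' (ms),
    covering both critical items and fillers, as described in the text.
    """
    by_p = {}
    for t in trials:
        by_p.setdefault(t["participant"], []).append(t)
    out = []
    for ts in by_p.values():
        a, b = ols_line([t["length"] for t in ts], [t["rt"] for t in ts])
        out.extend({**t, "resid_rt": t["rt"] - (a + b * t["length"])} for t in ts)
    return out
```

A negative residual means a region was read faster than expected for its length by that participant; after computing residuals on all data, only the critical regions would be carried forward into the main analysis.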
Modelling procedures should specify the following: (1) the contrasts for fixed effects (e.g., treatment or sum coding; Schad et al., 2020); (2) the fixed-effects terms in the model formula; (3) the random-effects structure in the formula. Random effects are included because the focus of a study is the structure and its processing, and researchers need to exclude ‘noise’ arising from individual participants and items. Maximal random effects are therefore included following Barr et al. (2013): by-participant and by-item intercepts, by-participant random slopes for within-subject factors (e.g., linguistic structures) and their interactions, and by-item random slopes for between-subject factors (e.g., language group) and their interactions; (4) how p-values are estimated (e.g., with the lmerTest package; Kuznetsova et al., 2017); and (5) how model convergence issues are handled. With these concerns in mind, researchers should justify their choices of data cleaning, transformation, and statistical modelling on the basis of the theoretical assumptions of the research design and/or commonly adopted practices in the field. In addition, researchers should make analysis procedures as transparent as possible by sharing experimental materials and coding scripts (Marsden et al., 2016), so that the study can be reproduced and replicated.
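Because the choice of contrasts in point (1) changes what the fixed-effect coefficients mean, the following Python sketch fits the same hypothetical two-level factor (structure A versus structure B) under three codings with ordinary least squares. With treatment coding the intercept is the mean of the reference level and the slope is the difference; with sum coding (±1, as in R's contr.sum for two levels) the intercept is the grand mean and the slope is half the difference; with scaled sum coding (±0.5) the intercept is the grand mean and the slope is the full difference. The RTs are invented for illustration.

```python
from statistics import mean


def fit_two_level(y_a, y_b, code_a, code_b):
    """OLS fit of y = intercept + slope * x, with level A coded code_a and B code_b."""
    x = [code_a] * len(y_a) + [code_b] * len(y_b)
    y = list(y_a) + list(y_b)
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


rt_a = [500, 520, 540]  # hypothetical RTs for structure A (mean 520 ms)
rt_b = [600, 620, 640]  # hypothetical RTs for structure B (mean 620 ms)

CODINGS = {
    "treatment (A = 0, B = 1)": (0, 1),              # intercept = mean of A; slope = B - A
    "sum (A = +1, B = -1)": (1, -1),                 # intercept = grand mean; slope = (A - B) / 2
    "scaled sum (A = -0.5, B = +0.5)": (-0.5, 0.5),  # intercept = grand mean; slope = B - A
}

if __name__ == "__main__":
    for name, (ca, cb) in CODINGS.items():
        intercept, slope = fit_two_level(rt_a, rt_b, ca, cb)
        print(f"{name}: intercept = {intercept:.1f}, slope = {slope:.1f}")
    # treatment (A = 0, B = 1): intercept = 520.0, slope = 100.0
    # sum (A = +1, B = -1): intercept = 570.0, slope = -50.0
    # scaled sum (A = -0.5, B = +0.5): intercept = 570.0, slope = 100.0
```

The data and the fit are identical in all three cases; only the interpretation of the coefficients changes, which is why the chosen contrasts need to be reported, especially once interactions (e.g., structure × language group) are in the model.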

27.4 Critical issues and topics

Research has progressed beyond simple questions about the availability of UG in adult SLA and whether second language learners use grammatical principles to process sentences. The field is currently concerned with how different modules of the grammar interact, how this interaction might reveal knowledge of representation, and how such representation is used in real-time performance (Juffs, 2023). Current questions include the following:

• How do second language learners integrate different modules of knowledge (grammatical principles, lexicon, [cultural] context) to comprehend sentences, and which other factors might affect processing? (Hopp, 2022)
• How can studies of sentence processing be made more ecologically valid, e.g., by providing context? (Cunnings, 2017; Juffs, 2017)
• What do learners' errors and behaviour in experiments on morphology reveal about the nature of their grammatical representation? Does failure to mark morphology mean that L2 knowledge is fundamentally different from representations acquired by children? How are inflectional and derivational features ‘re-assembled’ in the L2? (Lardiere, 2009)
• How can principles of open science be encouraged for transparency and data sharing? (Marsden et al., 2016)

Notes
1 Students interested in anthropological approaches might consider Packer's (2018) book on qualitative research methods.
2 Unfortunately, space does not permit discussion of experiments in phonetics and phonology. However, see Styler's online guide to the use of Praat (https://wstyler.ucsd.edu/praat/) in phonetics. Recent example L2 studies include Olson (2021) and Stoehr et al. (2017).

Further reading
Levshina, N. (2015). How to do linguistics with R. Benjamins.
Mackey, A., & Gass, S. M. (Eds.) (2012). Research methods in second language acquisition: A practical guide. Wiley.
Phakiti, A., De Costa, P., Plonsky, L., & Starfield, S. (Eds.) (2018). The Palgrave handbook of applied linguistics research methodology. Palgrave Macmillan (Springer Nature).

Related topics
New directions in statistical methods for experimental linguistics; assessing adult linguistic competence; experimental methods to study bilinguals; controlling social factors in experimental linguistics

References
Altmann, G. T., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264. https://doi.org/10.1016/S0010-0277(99)00059-1
Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445–459.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Bates, D., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211–243.
Chomsky, N. (1977). On wh-movement. In P. Culicover, T. Wasow, & A. Akmajian (Eds.), Formal syntax (pp. 71–132). Academic Press.
Chomsky, N. (1986). Knowledge of language. Praeger.
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12(4), 335–359. https://doi.org/10.1016/S0022-5371(73)80014-3
Crain, S., & Thornton, R. (2000). Investigations in universal grammar: A guide to experiments on the acquisition of syntax and semantics. MIT Press.
Cunnings, I. (2017). Parsing and working memory in bilingual sentence processing. Bilingualism: Language and Cognition, 20(4), 659–678. https://doi.org/10.1017/S1366728916000675
Davies, M. (2008–). The corpus of contemporary American English (COCA). Available online at https://www.english-corpora.org/coca/.


Diependaele, K., Duñabeitia, J. A., Morris, J., & Keuleers, E. (2011). Fast morphological effects in first and second language word recognition. Journal of Memory and Language, 64(4), 344–358. https://doi.org/10.1016/j.jml.2011.01.003
Dussias, P. E. (2003). Syntactic ambiguity resolution in L2 learners: Some effects of bilinguality on L1 and L2 processing strategies. Studies in Second Language Acquisition, 25(4), 529–557. https://doi.org/10.1017/S0272263103000238
Ehrlich, S. F., & Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20(6), 641–655. https://doi.org/10.1016/S0022-5371(81)90220-6
Fang, S., & Albasiri, E. (2021). Effects of syntactic and semantic predictability on sentence comprehension: A comparison between native and non-native speakers. Proceedings of the Annual Meeting of the Cognitive Science Society, 43(43), 1549–1554.
Fang, S., & Juffs, A. (2020). Offline processing of Ba- and Bei-constructions in Mandarin Chinese. In Proceedings of the 32nd North American Conference on Chinese Linguistics (NACCL-32) (pp. 422–437). University of Connecticut, Storrs, CT.
Felser, C., Sato, M., & Bertenshaw, N. (2009). The on-line application of binding Principle A in English as a second language. Bilingualism: Language and Cognition, 12(4), 485–502. https://doi.org/10.1017/S1366728909990228
Forster, K. I. (1998). The pros and cons of masked priming. Journal of Psycholinguistic Research, 27(2), 203–233. https://doi.org/10.1023/A:1023202116609
Forster, K. I., & Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(4), 680–698. https://doi.org/10.1037/0278-7393.10.4.680
Frenck-Mestre, C. (2005). Eye-movement recording as a tool for studying syntactic processing in a second language: A review of methodologies and experimental findings. Second Language Research, 21(2), 175–198. https://doi.org/10.1191/0267658305sr257oa
Godfroid, A. (2019). Eye tracking in second language acquisition and bilingualism: A research synthesis and methodological guide. Routledge.
Grüter, T., Lew-Williams, C., & Fernald, A. (2012). Grammatical gender in L2: A production or a real-time processing problem? Second Language Research, 28(2), 191–215. https://doi.org/10.1177/0267658312437990
Grüter, T., Lieberman, M., & Gualmini, A. (2010). Acquiring the scope of disjunction and negation in L2: A bidirectional study of learners of Japanese and English. Language Acquisition, 17(3), 127–154. https://doi.org/10.1080/10489223.2010.497403
Grüter, T., Rohde, H., & Schafer, A. J. (2017). Coreference and discourse coherence in L2: The roles of grammatical aspect and referential form. Linguistic Approaches to Bilingualism, 7(2), 199–229. https://doi.org/10.1075/lab.15011.gru
Hopp, H. (2022). Second language sentence processing. Annual Review of Linguistics, 8(1), 235–256. https://doi.org/10.1146/annurev-linguistics-030821-054113
Jackson, C. N., & Bobb, S. C. (2009). The processing and comprehension of wh-questions among second language speakers of German. Applied Psycholinguistics, 30(4), 603–636. https://doi.org/10.1017/S014271640999004X
Jackson, C. N., & Dussias, P. E. (2009). Cross-linguistic differences and their impact on L2 sentence processing. Bilingualism: Language and Cognition, 12(1), 65–82. https://doi.org/10.1017/S1366728908003908
Jegerski, J. (2014). Self-paced reading. In J. Jegerski & B. VanPatten (Eds.), Research methods in second language psycholinguistics (pp. 20–49). Routledge.
Jiang, N. (2004). Morphological insensitivity in second language processing. Applied Psycholinguistics, 25, 603–634. https://doi.org/10.1017/S0142716404001298
Jiang, N. (2012). Conducting reaction time research in second language studies. Routledge.
Juffs, A. (1998). Main verb vs. reduced relative clause ambiguity resolution in second language sentence processing. Language Learning, 48(1), 107–147. https://doi.org/10.1111/1467-9922.00034
Juffs, A. (2001). Psycholinguistically-oriented second language research. Annual Review of Applied Linguistics, 21, 207–223. https://doi.org/10.1017/S0267190501000125
Juffs, A. (2005). The influence of first language on the processing of wh-movement in English as a second language. Second Language Research, 21(2), 121–151. https://doi.org/10.1191/0267658305sr255oa
Juffs, A. (2017). Construct operationalization, L1 effects, and context in second language processing: Commentary on Cunnings (2017). Bilingualism: Language and Cognition, 20(4). https://doi.org/10.1017/S1366728916000900


Juffs, A. (2023). Chapter 2: Generative considerations of communicative competence. In M. Kanwit & M. Solon (Eds.), Communicative competence in a second language: Theory, method, and applications (pp. 21–39). Taylor & Francis. https://doi.org/10.4324/9781003160779
Juffs, A. (in press). Chapter 2: Generative considerations of communicative competence. In M. Kanwit & M. Solon (Eds.), Communicative competence in a second language: Theory, method, and applications. Taylor & Francis.
Juffs, A., Han, N.-R., & Naismith, B. (2020). The University of Pittsburgh English Language Institute Corpus (PELIC). https://doi.org/10.5281/zenodo.3991977
Juffs, A., & Rodríguez, G. A. (2014). Second language sentence processing. Routledge.
Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111(2), 228–238. https://doi.org/10.1037/0096-3445.111.2.228
Kaan, E. (2014). Predictive sentence processing in L2 and L1: What is different? Linguistic Approaches to Bilingualism, 4(2), 257–282. https://doi.org/10.1075/LAB.4.2.05KAA
Kamide, Y., Altmann, G. T., & Haywood, S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49(1), 133–156. https://doi.org/10.1016/S0749-596X(03)00023-8
Keating, G. D. (2009). Sensitivity to violations of gender agreement in native and nonnative Spanish: An eye-movement investigation. Language Learning, 59(3), 503–535. https://doi.org/10.1111/j.1467-9922.2009.00516.x
Klein, W. (1975). Heidelberger Forschungs Projekt: Zur Sprache ausländischer Arbeiter: Syntaktische Analysen und Aspekte des kommunikativen Verhaltens. Zeitschrift für Literaturwissenschaft und Linguistik, 5(18), 78–121.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. https://doi.org/10.18637/jss.v082.i13
Lardiere, D. (2009). Some thoughts on the contrastive analysis of features in second language acquisition. Second Language Research, 25(2), 173–227. https://doi.org/10.1177/0267658308100283
Larsen-Freeman, D., & Long, M. H. (1991). An introduction to second language acquisition research. Longman.
Lee, J. F., Malovrh, P. A., Doherty, S., & Nichols, A. (2020). A self-paced reading (SPR) study of the effects of processing instruction on the L2 processing of active and passive sentences. Language Teaching Research, 1362168820914025. https://doi.org/10.1177/1362168820914025
Leeser, M., Brandl, A., Weissglass, C., Trofimovich, P., & McDonough, K. (2011). Task effects in second language sentence processing research. In P. Trofimovich & K. McDonough (Eds.), Applying priming methods to L2 learning, teaching, and research: Insights from psycholinguistics (pp. 179–198). John Benjamins Publishing Company.
Lidz, J., & Musolino, J. (2002). Children's command of quantification. Cognition, 84(2), 113–154. https://doi.org/10.1016/S0010-0277(02)00013-6
Lightbown, P., & Spada, N. (2013). How languages are learned (4th ed.). Oxford University Press.
Loewen, S. (2009). Grammaticality judgment tests and the measurement of implicit and explicit L2 knowledge. In Implicit and explicit knowledge in second language learning, testing and teaching (pp. 94–112). Multilingual Matters.
Marsden, H. (2009). Distributive quantifier scope in English-Japanese and Korean-Japanese interlanguage. Language Acquisition, 16(3), 135–177. https://doi.org/10.1080/10489220902967135
Marsden, E., Mackey, A., & Plonsky, L. (2016). The IRIS repository: Advancing research practice and methodology. In A. Mackey & E. Marsden (Eds.), Advancing methodology and practice: The IRIS repository of instruments for research into second languages (pp. 1–21). Routledge.
Martin, K. I., & Juffs, A. (2021). Eye-tracking as a window into assembled phonology in native and non-native reading. Journal of Second Language Studies, 4(1), 66–96. https://benjamins.com/catalog/jsls.19026.mar
Mitsugi, S., & MacWhinney, B. (2010). Second language processing in Japanese scrambled sentences. In B. VanPatten & J. Jegerski (Eds.), Research in second language processing and parsing (pp. 159–176). Benjamins. https://doi.org/10.1075/lald.53.07mit
Mitsugi, S., & MacWhinney, B. (2016). The use of case marking for predictive processing in second language Japanese. Bilingualism: Language and Cognition, 19(1), 19–35. https://doi.org/10.1017/S1366728914000881


Alan Juffs and Shaohua Fang

Montrul, S. (2000). Transitivity alternations in L2 acquisition: Toward a modular view of transfer. Studies in Second Language Acquisition, 22, 229–273. https://doi.org/10.1017/S0272263100002047
Musolino, J., & Lidz, J. (2006). Why children aren’t universally successful with quantification. Linguistics, 44, 817–852. https://doi.org/10.1515/LING.2006.026
Naismith, B., Han, N.-R., & Juffs, A. (2022). The University of Pittsburgh English Language Institute Corpus (PELIC). International Journal of Learner Corpus Research, 8(1), 121–138. https://doi.org/10.1075/ijlcr.21002.nai
Neubauer, K., & Clahsen, H. (2009). Decomposition of inflected words in a second language. Studies in Second Language Acquisition, 31, 403–435. https://doi.org/10.1017/S0272263109090354
Olson, D. J. (2021). Phonetic feature size in second language acquisition. Second Language Research, 38(4), 1–28. https://doi.org/10.1177/02676583211008951
Özçelik, Ö. (2018). Interface Hypothesis and the L2 acquisition of quantificational scope at the syntax-semantics-pragmatics interface. Language Acquisition, 25(2), 213–223. https://doi.org/10.1080/10489223.2016.1273936
Packer, M. J. (2018). The science of qualitative research (2nd ed.). Cambridge University Press.
Pickering, M. J., & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychological Bulletin, 144(10), 1002. https://doi.org/10.1037/bul0000158
Prévost, P., & White, L. (2000). Missing surface inflection or impairment in second language acquisition. Second Language Research, 16, 103–134. https://doi.org/10.1191/026765800677556046
Pritchett, B. L. (1992). Grammatical competence and parsing performance. University of Chicago Press.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372. https://doi.org/10.1037/0033-2909.124.3.372
Roberts, L. (2019). Psycholinguistic and neurolinguistic methods. In J. W. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 208–230). Cambridge University Press. https://doi.org/10.1017/9781108333603.010
Roberts, L., & Liszka, S. A. (2013). Processing tense/aspect-agreement violations on-line in the second language: A self-paced reading study with French and German L2 learners of English. Second Language Research, 29(4), 413–439. https://doi.org/10.1177/0267658313503171
Robinett, B. W., & Schachter, J. (Eds.) (1983). Second language learning: Contrastive analysis, error analysis, and related aspects. University of Michigan Press.
Şafak, D. F., & Hopp, H. (2021). Verb bias and semantic persistence effects in L2 ambiguity resolution. Second Language Research, 38(4), 705–736. https://doi.org/10.1177/0267658321997904
Schachter, J. (1989). Testing a proposed universal. In S. Gass & J. Schachter (Eds.), Linguistic perspectives on second language acquisition (pp. 73–88). Cambridge University Press.
Schad, D. J., Vasishth, S., Hohenstein, S., & Kliegl, R. (2020). How to capitalize on a priori contrasts in linear (mixed) models: A tutorial. Journal of Memory and Language, 110, 104038. https://doi.org/10.1016/j.jml.2019.104038
Schütze, C. T. (1996). The empirical base of linguistics: Grammaticality judgements and linguistic methodology. University of Chicago Press.
Schütze, C. T., & Sprouse, J. (2013). Judgment data. In R. J. Podesva & D. Sharma (Eds.), Research methods in linguistics (pp. 27–50). Cambridge University Press.
Schwartz, B. D., & Sprouse, R. (1996). L2 cognitive states and the Full Transfer/Full Access model. Second Language Research, 12, 40–72. https://doi.org/10.1177/026765839601200103
Silva, R., & Clahsen, H. (2008). Morphologically complex words in L1 and L2 processing: Evidence from masked priming experiments in English. Bilingualism: Language and Cognition, 11, 245–260. https://doi.org/10.1017/S1366728908003404
Slabakova, R., White, L., & Brambatti Guzzo, N. (2017). Pronoun interpretation in the second language: Effects of computational complexity. Frontiers in Psychology, 8, 1236. https://doi.org/10.3389/fpsyg.2017.01236
Spinner, P., & Gass, S. M. (2019). Using judgments in second language acquisition research. Routledge.
Spinner, P., & Juffs, A. (2008). L2 grammatical gender in a complex morphological system: The case of German. International Review of Applied Linguistics, 46, 315–348. https://doi.org/10.1515/IRAL.2008.014
Sprouse, J., & Almeida, D. (2011). Power in acceptability judgment experiments and the reliability of data in syntax. Irvine: University of California & Ann Arbor: Michigan State University.
Stoehr, A., Benders, T., van Hell, J. G., & Fikkert, P. (2017). Second language attainment and first language attrition: The case of VOT in immersed Dutch-German late bilinguals. Second Language Research, 33(4), 483–518. https://doi.org/10.1177/0267658317704261


Experimental methods for studying second language learners

Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634. https://doi.org/10.1126/science.7777863
Truscott, J., & Sharwood Smith, M. (2011). Input, intake, and consciousness: The quest for a theoretical foundation. Studies in Second Language Acquisition, 33(4), 497–528. https://doi.org/10.1017/S0272263111000295
Vafaee, P., Suzuki, Y., & Kachinske, I. (2017). Validating grammaticality judgment tests: Evidence from two new psycholinguistic measures. Studies in Second Language Acquisition, 39(1), 59–95. https://doi.org/10.1017/S0272263115000455
White, L. (2003). Second language acquisition and universal grammar. Cambridge University Press.
White, L., Brown, C., Chen, D., & Montrul, S. (1999). Psych verbs in second language acquisition. In E. Klein & G. Martohardjono (Eds.), The development of second language grammars: A generative approach (pp. 171–197). Benjamins.
White, L., Bruhn-Garavito, J., Kawasaki, T., Pater, J., & Prévost, P. (1997). The researcher gave the subject a test about himself: Problems of ambiguity and preference in the investigation of reflexive binding. Language Learning, 47, 145–172. https://doi.org/10.1111/0023-8333.41997004
Wilson, M. (1988). MRC Psycholinguistic Database: Machine-usable dictionary, Version 2.00. Behavior Research Methods, Instruments & Computers, 20(1), 6–10. https://doi.org/10.3758/BF03202594
Winter, B. (2020). Statistics for linguists: An introduction using R. Routledge/Taylor & Francis.
Wu, M., Zhang, L. J., Wu, D., & Wang, T. (2020). Effects of the interface categories on the acquisition patterns of English reflexives among learners of English as a foreign language. International Journal of Bilingualism, 24(4), 651–671. https://doi.org/10.1177/1367006919875513
Wu, Z., & Juffs, A. (2022). Effects of L1 morphological type on L2 morphological awareness. Second Language Research, 38(4), 787–812. https://doi.org/10.1177/0267658321996417
Yang, C., Crain, S., Chomsky, N., Berwick, R., & Bolhuis, R. (2017). The growth of language: Universal grammar, experience, and principles of computation. Neuroscience and Biobehavioral Reviews, 81, 103–119. http://dx.doi.org/10.1016/j.neubiorev.2016.12.023


28 EXPERIMENTAL METHODS TO STUDY BILINGUALS

Ramesh Kumar Mishra and Seema Prasad

28.1 Introduction

About 7,000 languages are spoken in the world today, and yet not many people speak two or more languages. Those who speak two languages are called bilinguals; those who speak more than two are called multilinguals. There has been much scientific speculation about the evolution of language in our species, but there is little evidence about why humans started to speak more than one language. We also do not know how the brain, during its evolution, accommodated multiple languages. How did the neural and sensorimotor systems support the manipulation of two languages and their symbolic codes? Scholars agree that bilingualism emerged during early human evolution and migration (Mishra, 2018). It then either sustained itself and became widespread, or it was adopted because of its clear social and cultural benefits. On this view, bilingualism developed because it gave its users a window into the mental states of others while also teaching them the social and cultural norms important for survival. Current estimates place the evolution of language at roughly 80,000 years ago, but we cannot say when the language-ready brain also became bilingual. Nevertheless, for our current purpose, it is sufficient to accept that the practice of bilingualism is as ancient as modern humans. The ancient Greeks were bilingual, or at least interested in learning languages other than their own (Rochette, 2010). This was vital for scholarly and cultural exchange as well as for commerce and warfare. Later, with the advent of Islam, Arabic scholars translated much of the existing Greek corpus into Arabic, which fostered scholarly communication during the Middle Ages. By the thirteenth century, much of this philosophical corpus, including the works of philosophers such as Avicenna, had been translated from Arabic into Latin, the language of scholarly communication at the time (Adamson & Taylor, 2004).
Although this textual bilingualism through translation continued, we cannot say with certainty how many of these scholars spoke both languages fluently. This matters because most psycholinguistic¹ research focuses on spoken bilingualism and its cognitive or neurological basis rather than on translated texts. Nonetheless, the important point is that throughout recorded history, humans have practised bilingualism through speaking or writing. In modern times, mass migration and the mobility of people across geographic boundaries have made bilingualism flourish as a major cognitive and cultural activity in most domains. Therefore, scientific studies of bilingualism using a diverse range of methods have become indispensable.

DOI: 10.4324/9781003392972-32

Apart from these historical, cultural and evolutionary considerations, bilingualism has also emerged as an important area of research in multiple disciplines, and multiple theories and models have been proposed to explain it. The bilingual mind offers a unique model for testing important ideas not only about how languages are managed but also about critical cognitive operations related to attention, control and adaptation to the environment. Psycholinguists have studied both language production and comprehension in bilinguals to understand the nature of the lexicon and the core mechanisms of the language-cognition interface. Using a range of experimental techniques, they have examined how variables such as age of acquisition, proficiency and working memory affect language mechanisms (Kroll & Dussias, 2012; Luk & Bialystok, 2013). These techniques include behavioural, neuroimaging and computational approaches. Such work has revealed how the two languages interact in the bilingual mind and the neural basis of these interactions. Within the cognitive sciences, research on how bilinguals manage their languages, particularly how they suppress the non-target language, has revealed the involvement of domain-general cognitive operations. The most influential, albeit controversial, finding of the last several decades has been that bilinguals show a cognitive advantage over monolinguals on non-linguistic tasks (e.g., Bialystok et al., 2004). Bilinguals even build greater cognitive reserve, which benefits them later in life when cognition declines because of ageing or disease. Therefore, for neurologists treating patients and for cognitive neuroscientists investigating the brain basis of cognition, bilingualism has proven to be a valuable model that links mind, brain, behaviour and cognition.
Social psychologists, sociolinguists and cultural theorists similarly study bilingualism and its impact on behaviour and culture using a wide range of methods. There are many facets to a bilingual's language processing. Foremost among them is that bilinguals activate both of their languages in response to a stimulus, largely unconsciously (Marian & Spivey, 2003a, 2003b). For example, when a Hindi-English bilingual sees a picture of a dog, both word forms, dog and kutta (the Hindi word for dog), become active. This has been dubbed non-selective parallel language activation. It is important to note that bilinguals activate two phonological word forms for the same concept. These translation equivalents may be phonologically similar (cognates) or completely different (non-cognates). European languages share many cognates because of their common ancestry, and this has shaped much of the experimental work carried out on these languages. How do we know that bilinguals indeed activate both languages in a non-selective manner? Over the years, psycholinguists and linguists have used many experimental paradigms to study this phenomenon in diverse bilingual populations. The simplest and most popular is the object naming task, in which bilinguals name line drawings presented on a computer screen. Naming latency, the time between stimulus onset and speech onset, is the typical dependent measure and roughly indicates the ease of lexical retrieval. Bilinguals have been observed to name objects faster in their native language than in their second language. Naming tasks have also revealed that both phonological word forms are activated and compete in the bilingual mind. Similarly, the picture-word interference task and translation naming and recognition tasks have been used to measure language interference in bilinguals at different levels.
Many experiments using these tasks have revealed that bilinguals spontaneously activate the translation equivalents of the words they process. In the last two decades, the visual world eye-tracking paradigm has also been used to study language non-selective activation in bilinguals. All these methods will be discussed in greater detail later in this chapter. Language non-selective activation has also been linked to accounts of conflict resolution and inhibitory control (Blumenfeld & Marian, 2013).
Bilinguals may unconsciously activate the task-irrelevant language in any task, but they must then quickly suppress it using domain-general cognitive control. That said, the focus of this chapter is not cognitive control but the psycholinguistics of language non-selective activation, studied primarily with the visual world paradigm.
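Naming latency in the object naming task is the interval between stimulus onset and speech onset. The sketch below finds speech onset in a recording with a simple RMS-threshold "voice key". It assumes that the recording starts exactly at picture onset. The threshold and window size are illustrative choices, not values from any study cited in this chapter, and the function name is hypothetical.

```python
import math


def speech_onset_ms(samples, sample_rate, threshold=0.1, window_ms=10):
    """Return the start time (ms) of the first analysis window whose RMS
    amplitude exceeds `threshold`, or None if no window does.

    Assumes sample 0 coincides with picture onset, so the returned value
    is the naming latency. Threshold and window size are illustrative.
    """
    win = max(1, int(sample_rate * window_ms / 1000))
    # Step through non-overlapping windows of `win` samples.
    for start in range(0, len(samples) - win + 1, win):
        chunk = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in chunk) / win)
        if rms > threshold:
            return 1000.0 * start / sample_rate
    return None


if __name__ == "__main__":
    rate = 1000  # 1 kHz sampling, chosen only to keep the toy signal short
    # Toy recording: 650 ms of silence, then a 100 Hz tone standing in for speech.
    signal = [0.0] * 650 + [0.5 * math.sin(2 * math.pi * 100 * i / rate)
                            for i in range(350)]
    print(speech_onset_ms(signal, rate))  # prints 650.0
```

A fixed amplitude threshold is a deliberate simplification. Quiet word-initial sounds, such as some fricatives, may cross it later than louder onsets. For that reason, automatically detected onsets are often checked by hand.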

28.2 Historical perspectives

As its name suggests, the visual world paradigm emphasises the importance of the environment for cognition. Classical cognitive science posited a computational mind that interpreted the world through abstract algorithms in an amodal way (Miller, 2003). Sensory information served only as cues to trigger abstract mental representations. This changed with the development of alternative philosophical frameworks such as embodied cognitive science (e.g., Barsalou, 2010; Shapiro, 2007), which shifted the focus to how mind and body interface with the environment. On this view, cognition emerges from the interaction between our bodies and the environment. Later scholars dubbed this framework enactivism, and it has been influential in disciplines such as cognitive linguistics and psycholinguistics. This philosophical revival of the question of the mind opened up the possibility of studying language processing more interactively, which is why it has become popular to study language processing in the presence of visual objects. Input from visual exploration, one of our most ancient cognitive strategies for survival, directly affects how we name and remember objects. The research paradigms and theories that have grown up around the visual world paradigm therefore mark a radical departure in thinking about the nature of the mind and its interaction with the world at large. The symbols we use to build linguistic structures represent our experiences with real objects in the real world. We know a banana is yellow because we have seen yellow bananas; this knowledge is not an abstract mental creation detached from the realities of the world. The visual world paradigm tests precisely such interactive predictions. It presents objects on a computer screen together with a fragment of spoken language that refers to them, and it measures eye movements towards those objects.
These eye movements thus provide a directly observable index of the activation of stored sensorimotor traces in the experiencer's mind. Successful models of visual world data, such as the Bilingual Language Interaction Network for Comprehension of Speech (BLINCS; Shook & Marian, 2013), incorporate such conceptual assumptions into their predictions. The visual world paradigm also challenged the modular, encapsulated view of language processing that had dominated theory since classical psycholinguistics developed during the heyday of the cognitive revolution. Tanenhaus first proposed the linking hypothesis (Tanenhaus et al., 2000) to describe how visual and linguistic information may interact in a given situation. In the original study, participants' eye movements went towards objects placed in front of them, depending on how they interpreted the sentence they heard. The main idea was that as participants process the spoken input, words and sentences generate meaning immediately, while visual information about the objects is extracted at the same time. Participants therefore look at the objects that match the meaning generated so far. The fixations on these objects, their durations and their order reflect this close interface between linguistic and visual representations. Thus, both vision and attention influence how language is processed. This contradicted classical models of language processing, which made no room for visual apprehension of real-world objects or for attention to them (Mishra, 2015). In sum, the visual world paradigm was not merely one more eye-tracking method; it signalled a radical shift in the conceptual foundations of how we view cognition. It indicated that cognition is ever-changing and dynamic, and that it includes prediction and anticipation emerging from our interaction with the things around us in the environment.

Figure 28.1 Types of displays used in visual world studies. (A) Example from Altmann and Kamide (1999). Participants heard "The boy will eat the cake" while viewing a display containing an image of a cake and other objects. Anticipatory eye movements to the cake were observed before the word cake was spoken. (B) Example from Huettig and McQueen (2007). Participants heard the Dutch sentence Uiteindelijk keek ze naar de beker die voor haar stond ("Eventually she looked at the beaker that was in front of her"). The display contained a phonological competitor of beker ("beaver"), a visual-shape competitor ("bobbin"), a semantic competitor ("fork") and an unrelated distractor ("umbrella"). (C) Example from Blumenfeld and Marian (2013). The English spoken word was pool, and the display contained the Spanish cross-linguistic competitor pulgar (Spanish for "thumb") and three unrelated distractors. (D) Example from Mishra and Singh (2014), using printed words and including a phonological cohort of the translation equivalent of the spoken word. In this example, participants heard the English word parrot, whose Hindi translation is तोता (tota). The display contained a phonological competitor of this word, तोप (top, meaning "cannon" in English), and three unrelated distractors.

The assumption that cognition is dynamic and predictive gives eye movements their prominence as indices of ongoing cognitive processing. It was only natural that researchers applied the visual world technique to study bilingual language processing in different ways (see Figure 28.1 for examples of the types of visual world displays). The paradigm has been used to investigate parallel language activation (Marian & Spivey, 2003a), language processing in bimodal bilinguals (Shook & Marian, 2012), bilingual infants (Byers-Heinlein et al., 2017), activation of orthography during spoken word processing in bilinguals (Mishra & Singh, 2014), language-mediated eye movements in illiterate individuals
(Huettig et al., 2011c), working memory effects on language non-selective activation (Ito et al., 2018), prediction (Altmann & Kamide, 1999) and inhibitory control (Mercier et al., 2014) during spoken word processing. Many researchers have also studied sentence-level processing in bilinguals using this paradigm. The paradigm is well suited to exploring some of the unsolved issues in these domains, particularly the nature of bias in language processing. Response times reveal little about how the bias among multiple competing representations evolves during stimulus processing and decision making. They provide only a single, gross value for an overall decision that emerged from many competing alternatives. In contrast, the amount of time spent looking at an object, known as fixation time, indicates how the competition between representations evolves. Most researchers interpret fixations in the visual world paradigm as a reflection of "search", since the paradigm comes very close to theories of search in cognitive psychology. Fixations may also indicate "confirmation" of evolving biases, since visual objects are tagged to mental representations. Several good review articles document the paradigm's applications in both monolingual and bilingual language processing (e.g., Huettig et al., 2011b; Magnuson, 2019). Huettig et al. (2011a), for example, compared the cognitive mechanisms of visual search in attention research with those at work in the visual world paradigm. More recently, Magnuson (2019) compared four different linking hypotheses for this interaction. He examines in depth the fixations seen in many visual world experiments and what they might truly reflect. He argues that it is too simplistic to assume that language drives visual attention to objects on a screen. It is equally problematic to assume that visually depicted objects dynamically guide linguistic processing. Magnuson extends the proposal of Huettig et al. (2011a), who had put forward a dynamic cognitive model of the visual world paradigm. In that model, working memory representations play an important role in the online integration of visual and linguistic information. Further, Magnuson (2019) points out that cultural, individual and cognitive differences make it difficult to generalise the nature of eye movements seen in this paradigm across participants. A further limitation is that the paradigm presents linguistic and visual material simultaneously, whereas in real life we can produce and comprehend language without any visual material present. The paradigm therefore captures only one, possibly limited, aspect of processing (but see Altmann, 2004). Similarly, the visual objects could themselves prime the interpretation of the speech input, especially if they are highly familiar, common objects. There is also evidence that eye movements are often launched to locations where there are no objects. We therefore cannot be certain that language processing always depends on the visual world. Nor can we be sure to what extent linguistic and visual representations interact and influence one another. Nevertheless, despite these shortcomings, many researchers have generated a substantial body of data using this paradigm over the past two decades.
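The claim that fixation time tracks the evolving competition between representations is usually made concrete with a time-course analysis. For each time bin after word onset, one computes the proportion of gaze samples that fall on each object. The sketch below rests on two assumptions. First, gaze samples have already been mapped to labelled regions of interest; the labels "target" and "competitor" are hypothetical. Second, samples on no object, or lost by the tracker, are coded as None and still counted in the denominator. The function name is hypothetical too.

```python
from collections import defaultdict


def fixation_proportions(trials, bin_ms=100, end_ms=1000):
    """Proportion of gaze samples on each region of interest per time bin.

    trials: list of trials; each trial is a list of (time_ms, roi) samples,
            with time measured from spoken-word onset and roi a label such
            as "target" or "competitor", or None when gaze is on no object.
    Returns one dict per bin mapping roi -> proportion of all samples
    (including None samples) in that bin; empty bins give {}.
    """
    n_bins = end_ms // bin_ms
    limit = n_bins * bin_ms  # ignore a trailing partial bin
    counts = [defaultdict(int) for _ in range(n_bins)]
    totals = [0] * n_bins
    for samples in trials:
        for t, roi in samples:
            if 0 <= t < limit:
                b = int(t // bin_ms)
                totals[b] += 1  # every sample counts in the denominator
                if roi is not None:
                    counts[b][roi] += 1
    return [
        {roi: n / totals[b] for roi, n in counts[b].items()} if totals[b] else {}
        for b in range(n_bins)
    ]


if __name__ == "__main__":
    toy = [[(0, "target"), (50, "competitor"), (150, "competitor"), (160, None)]]
    for i, props in enumerate(fixation_proportions(toy, bin_ms=100, end_ms=200)):
        print(i * 100, props)
```

Pooling samples across trials, as done here, weights trials with more valid samples more heavily. Computing proportions per trial or per participant before averaging is a common alternative.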

28.3 Critical issues and topics

28.3.1 Measuring bilingualism

Despite decades of research on bilingualism, how to measure it remains contested. How should bilinguals be classified in terms of their performance? Language proficiency is the ease with which a bilingual can use a language at will, depending on the context. Therefore, measuring proficiency in English has been critical in many experimental studies of bilingualism. Kroll's revised hierarchical model (Kroll & Stewart, 1994) provided the first conceptual account linking first- and second-language proficiency and their changes over time. Once a bilingual speaks the second language fluently, they no longer need to translate second-language words into the first language. Therefore, how second-language words are comprehended depends on the level of proficiency attained in that language. Many researchers believe that language proficiency is a continuum that cannot be dichotomised into high and low (Luk & Bialystok, 2013). Even so, proficiency has been used extensively to measure bilingualism. Age of acquisition, the age at which individuals learn to read, speak and write in a given language, plays an important role in language proficiency. Proficiency may also depend on the language environment, the bilingual's context and how often the language is used in that context over the lifespan. Proficiency is also bound to change as a function of use. High first-language proficiency is natural for most individuals unless their language context has changed drastically, for example through moving to another country early in life. However, many bilinguals find themselves in language environments where use of the first language is limited, and they therefore display low first-language proficiency. For example, among English-educated people in India, use of the mother tongue is restricted to a few social and family occasions. Language proficiency is often confounded with language dominance. Although there are separate scales for measuring each, the two constructs are often hard to tease apart. Another point worth noting is the concept of "biliteracy", which describes proficiency in reading and writing in two languages. Biliteracy is not always considered a crucial variable, since psycholinguists focus on speaking and listening when they discuss constructs such as dominance and proficiency in bilinguals. It is also possible that many bilinguals who speak their second language proficiently never learn to read or write in it.
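The section treats proficiency as a continuum and dominance as a related but separate construct. Composite self-rating scores are one simple way to make that distinction concrete. The sketch below is hypothetical, not a published instrument. It assumes self-ratings on an arbitrary 0–10 scale for four skills and defines dominance as the L1 score minus the L2 score. All function and skill names are illustrative. The sketch also shows how the choice of skills changes a bilingual's classification. For example, counting only speaking and listening, as a psycholinguistic focus would, gives a different result from counting all four skills.

```python
from statistics import mean

# Hypothetical skill set; a real study would use a validated questionnaire.
ALL_SKILLS = ("speaking", "listening", "reading", "writing")
ORAL_SKILLS = ("speaking", "listening")


def proficiency_score(ratings, skills=ALL_SKILLS):
    """Continuous proficiency score: mean self-rating over the chosen skills."""
    return mean(ratings[s] for s in skills)


def dominance_index(l1_ratings, l2_ratings, skills=ALL_SKILLS):
    """L1 score minus L2 score: positive = L1-dominant, negative = L2-dominant,
    near zero = roughly balanced on the chosen skills."""
    return proficiency_score(l1_ratings, skills) - proficiency_score(l2_ratings, skills)


if __name__ == "__main__":
    # Illustrative profile: fluent L1 speaker with limited L1 literacy,
    # schooled mainly in L2 (values invented for the example).
    l1 = {"speaking": 9, "listening": 9, "reading": 3, "writing": 2}
    l2 = {"speaking": 7, "listening": 8, "reading": 8, "writing": 7}
    print(dominance_index(l1, l2))               # prints -1.75 (L2-dominant)
    print(dominance_index(l1, l2, ORAL_SKILLS))  # prints 1.5 (L1-dominant)
```

The same person comes out L2-dominant when literacy is counted and L1-dominant when only oral skills are counted. This is one illustration of why proficiency, dominance and biliteracy are hard to tease apart.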

28.3.2 Role of proficiency in cross-linguistic activation

In visual world studies of bilingualism, particularly those on language non-selective activation, second-language proficiency plays a key role. It is also important to understand how first-language proficiency affects activation of the second language. For a bilingual who is proficient in the second language, there are two possible predictions concerning the activation of words in the other language. Most bilinguals are not balanced and acquired the second language later in life, to varying degrees of proficiency. For them, there is no intuitive reason to expect that hearing first-language words will activate their second-language translations. There is no need to activate such words unless the language context and environment in which they live are L2-dominant. Similarly, when they hear second-language words, they may activate the first-language translations only if they are not very proficient in the second language. If they are highly proficient in the second language, they need not activate the first-language translations unless the context in which they live also involves strong first-language use. Most researchers who have studied language activation in different bilingual populations have not analysed the effects as a function of the larger sociolinguistic context in which those bilinguals live and practise bilingualism (Mishra, 2018). For example, Indians living in the USA do not need to translate the English words they hear into Hindi, since that context does not require it. In an Indian city such as Delhi, however, they may have to translate words from L2 back into L1, even if they are highly proficient in L2. Thus, proficiency in both the first and the second language, and the way it influences cross-language activation in experimental tasks, depends on context. It is also natural to expect that the language requirements of the environment shape one's proficiency in the first place.

28.4 Current contributions and research

28.4.1 Parallel activation of two languages

In one of the earliest studies to demonstrate parallel language activation in bilinguals, Spivey and Marian (1999) tested Russian-English bilinguals on a visual world display with four objects. Participants listened to sentences in one language with a critical word embedded in them. Crucially, the display contained an object whose name in the bilingual's other language sounded similar to the critical word. For instance, while listening to a sentence containing the critical word "marku" (meaning "stamp" in Russian), participants saw four objects, one of which was a "marker". Thus, the Russian spoken input ("marku") overlapped considerably with the English name of one of the objects ("marker"). Participants fixated this cross-linguistic phonological competitor more than the unrelated objects. This suggests that they activated the English lexicon even while listening to Russian. Subsequently, cross-linguistic activation has been demonstrated across several languages using subtle variations of the visual world paradigm. Most studies investigating this issue have borrowed the methodology of Spivey and Marian (1999) and presented direct cross-linguistic competitors of the spoken word. However, quite a few studies have instead used phonological competitors of the spoken word's translation as the critical manipulation (e.g., Mishra & Singh, 2016; Prasad et al., 2020; Singh & Mishra, 2015). In these studies, participants typically show a bias towards the picture of a gun ("bandook" in Hindi) when they hear the word "monkey" ("bandar" in Hindi). Thus, instead of presenting a direct cross-linguistic phonological competitor of the spoken word ("monkey"), these authors presented a phonological competitor of its translation equivalent ("bandar" and "bandook"; henceforth, the TE cohort). This provided further evidence of robust cross-linguistic activation in bilinguals.
There has been considerable debate on the directionality of cross-linguistic activation in bilinguals. Participants often activate L2 words when hearing L1 input and L1 words when hearing L2 input. However, it is not clear under what circumstances effects in both directions should be expected. Nor is it clear whether the magnitude of activation should differ depending on the input language. For instance, Weber and Cutler (2004) found that Dutch-English bilinguals living in the Netherlands (an L1-dominant environment) showed cross-linguistic activation only when they listened to English words. There was no activation of English when participants listened to Dutch. In contrast, studies by Mishra and colleagues on Indian bilinguals (living in an L2-dominant environment) have observed cross-linguistic activation in both directions (e.g., Mishra & Singh, 2016). Thus, the studies so far suggest that the direction of the effects depends on a complex interaction between participants' second-language proficiency and the dominant language of their environment.
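The direct-competitor design and the TE-cohort design rest on the same comparison. Does the critical competitor attract more looks than the unrelated distractors, and does this advantage differ by input language? The sketch below assumes that each trial's gaze samples are already labelled "competitor", "distractor" (any of the unrelated objects) or None. It also assumes an illustrative 200–800 ms analysis window and three distractors; neither value comes from any study cited here. The function names are hypothetical.

```python
from statistics import mean


def competitor_advantage(samples, window=(200, 800), n_distractors=3):
    """P(look at competitor) minus P(look at one average distractor)
    within the analysis window, for a single trial.

    samples: list of (time_ms, roi) with time from spoken-word onset and
             roi in {"competitor", "distractor", None}.
    Returns None if the trial has no samples inside the window.
    """
    lo, hi = window
    in_window = [roi for t, roi in samples if lo <= t < hi]
    if not in_window:
        return None
    p_competitor = in_window.count("competitor") / len(in_window)
    # All distractor looks are pooled, then shared across the distractors.
    p_distractor = in_window.count("distractor") / len(in_window) / n_distractors
    return p_competitor - p_distractor


def advantage_by_input_language(trials, **kwargs):
    """trials: list of (input_language, samples). Returns the mean
    competitor advantage for each input language (e.g. "L1", "L2")."""
    by_language = {}
    for language, samples in trials:
        adv = competitor_advantage(samples, **kwargs)
        if adv is not None:
            by_language.setdefault(language, []).append(adv)
    return {language: mean(values) for language, values in by_language.items()}
```

A positive value means the competitor drew more looks than an average distractor. Comparing the values for L1 and L2 input is one way to frame the directionality question. A real analysis would also model trial- and participant-level variability statistically rather than compare raw means.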

28.4.2 Automaticity of parallel activation

Another area of research in the past few years has been the automaticity of parallel language activation. The key question here is whether bilinguals automatically and involuntarily activate both their languages or whether this mechanism depends on task intentions and resource limitations. In one of the first studies to examine this, Blumenfeld and Marian (2013) investigated whether parallel language activation is correlated with individual differences in executive control functions such as inhibition. They administered a visual world study in which Spanish-English bilinguals heard words in English (e.g., “comb”) and viewed a display containing pictures of the spoken word referent

Experimental methods to study bilinguals

(comb), a Spanish competitor (“conejo”, “rabbit” in English) and unrelated distractors. Note that parallel language activation here was measured using a direct cross-linguistic competitor (and not the TE cohort used in some other studies). Additionally, a non-linguistic spatial Stroop task was administered to measure executive control. Results showed that a reduced Stroop effect (indicating better executive control) was correlated with increased cross-linguistic activation in the early stages of spoken word processing (< 500 ms) and reduced cross-linguistic activation in the later stages (> 600 ms). These findings indicate that individual differences in inhibitory control can modulate parallel language activation, raising questions about the automaticity hypothesis. The automaticity of parallel language activation has also been tested by combining the dual-task methodology with the visual world paradigm. The use of dual tasks to test the automaticity of a process is a well-established procedure. The logic is as follows: to test whether task A (here, the visual world task) involves automatic processing, task A is administered simultaneously with a resource-intensive task B. If task A is truly automatic, performance on task A should not be impaired by task B. If, instead, task A is not automatic and draws on limited resources, performance on task A should be impaired by task B. Using this methodology, Mishra and colleagues tested whether cross-linguistic activation requires working memory resources by combining the visual world paradigm with a working memory task. In one of these studies (Prasad & Mishra, 2021), Hindi-English bilinguals listened to Hindi or English spoken words and viewed a display containing a TE cohort of the spoken word and three unrelated distractors.
Note that the TE cohort and distractors were presented as printed words rather than as the line drawings more commonly used in visual world studies. On every trial, just before the visual world sequence of events, participants were asked to maintain a set of letters (two, six or eight, across two experiments) in memory. Maintaining the letters was assumed to impose a verbal working memory load while participants viewed the visual world display. At the end of the visual world display, participants completed a backward sequence recognition task. In the L1-L2 direction (spoken words in L1, printed words in L2), cross-linguistic activation was observed only in the early stages after word onset. When spoken words were in L2, the secondary verbal memory task eliminated cross-linguistic activation completely. In another study (Prasad et al., 2020), a similar population of Hindi-English bilinguals listened to Hindi or English spoken words and viewed a display containing the line drawing of the spoken word referent, a TE cohort and two unrelated distractors. A concurrent visual working memory load was imposed by displaying an array of five coloured squares before the onset of the spoken word. After the visual world display, a set of five coloured squares was shown again, and participants judged whether this test array was the same as the previously shown array. Cross-linguistic activation was reduced under a visual working memory load. This effect was stronger when participants were asked to click on the spoken word referent (Experiment 1) than when they simply looked and listened during the visual world display (Experiment 2), suggesting a modulating role of task difficulty.
In sum, parallel language activation has been shown to be reduced under both verbal and visual working memory load, suggesting that it draws on working memory resources, at least to some extent. These effects are modulated by the language proficiency of the participants and the direction of activation (L1 to L2 versus L2 to L1).
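The dual-task logic above boils down to a comparison: estimate the competitor advantage (looks to the TE cohort over looks to the unrelated distractors) with and without the concurrent memory load, and ask whether the load shrinks it. The Python sketch below illustrates only that arithmetic. The trial format (hypothetical `competitor` and `distractors` keys holding fixation proportions in some analysis window) and all numbers are invented, and published studies fit mixed-effects models over participants and items rather than comparing raw means like this.

```python
from statistics import mean


def competitor_advantage(trials):
    """Mean (competitor - mean distractor) fixation proportion across trials.

    Each trial is a dict with hypothetical keys:
      'competitor'  - proportion of looks to the TE cohort in the analysis window
      'distractors' - proportions of looks to each unrelated distractor
    """
    return mean(t["competitor"] - mean(t["distractors"]) for t in trials)


def load_reduction(no_load_trials, load_trials):
    """How much the concurrent memory load shrinks the competitor advantage.

    Under the dual-task logic, a fully automatic process should give a
    reduction near zero; a clearly positive value suggests that parallel
    activation draws on the resources consumed by the secondary task.
    """
    return competitor_advantage(no_load_trials) - competitor_advantage(load_trials)


# Invented proportions, for illustration only.
no_load = [
    {"competitor": 0.30, "distractors": [0.20, 0.20, 0.20]},
    {"competitor": 0.34, "distractors": [0.22, 0.18, 0.20]},
]
high_load = [
    {"competitor": 0.24, "distractors": [0.22, 0.21, 0.20]},
]

print(f"Advantage without load: {competitor_advantage(no_load):.2f}")
print(f"Advantage under load:   {competitor_advantage(high_load):.2f}")
print(f"Reduction due to load:  {load_reduction(no_load, high_load):.2f}")
```

A reduction near zero would be consistent with automatic activation; a clearly positive reduction, as in the invented data here, is the pattern that points to shared working memory resources.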

28.4.3 Prediction in bilinguals

The visual world paradigm has also provided useful insights into the mechanism of prediction in bilinguals. Individuals can often predict what the next word is going to be based on sentence

Ramesh Kumar Mishra and Seema Prasad

context. For example, on hearing “the boy decided to eat the…”, one can predict that the following word will refer to something in the food category. This can be measured by presenting four objects on the screen (in the visual world paradigm), one of which is, for example, an apple. Participants will begin to fixate on the apple even before hearing the word “apple” in the sentence. Can bilinguals predict equally well in both their languages? Ito et al. (2018) examined this in a study with Japanese-English bilinguals and native English speakers. Participants heard English sentences containing predictable words (e.g., “The tourists expected rain when the sun went behind the…”, where the predictable word is “cloud”). The display contained the target object (cloud), a phonological competitor in English (clown), a TE cohort (kuma/bear, which sounds similar to kumo/cloud) and an unrelated distractor. Both groups of participants predicted the target object, but the Japanese-English bilinguals did so later. However, only native English speakers predicted the phonological competitor (clown), suggesting that bilinguals may not predict phonological information, at least in their L2.
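Anticipatory looking of the kind described above is usually quantified as the proportion of looks to each object in a window that ends before the predictable word is heard. The sketch below assumes a hypothetical sample format of `(time_ms, aoi_label)` pairs, with time measured from the onset of the predictable word, and an arbitrary window; it is not the analysis pipeline of Ito et al. (2018). Real studies choose the window on theoretical grounds and often shift it by the time commonly assumed for programming a saccade (around 200 ms).

```python
def anticipatory_proportion(samples, aoi, window_start_ms, window_end_ms=0):
    """Proportion of eye-tracking samples on `aoi` in [window_start_ms, window_end_ms).

    `samples` is a list of (time_ms, aoi_label) pairs, with time measured
    relative to the onset of the predictable word (negative = before onset).
    Returns None if no samples fall inside the window.
    """
    in_window = [label for t, label in samples if window_start_ms <= t < window_end_ms]
    if not in_window:
        return None
    return sum(label == aoi for label in in_window) / len(in_window)


def group_mean(trials, aoi, window_start_ms, window_end_ms=0):
    """Average the per-trial anticipatory proportion over one group's trials."""
    values = [anticipatory_proportion(s, aoi, window_start_ms, window_end_ms) for s in trials]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# One invented trial, sampled every 100 ms around the onset of "cloud".
trial = [(-400, "distractor"), (-300, "cloud"), (-200, "cloud"),
         (-100, "clown"), (0, "cloud"), (100, "cloud")]

print(anticipatory_proportion(trial, "cloud", -400))  # target looks before onset -> 0.5
print(anticipatory_proportion(trial, "clown", -400))  # English competitor looks -> 0.25
```

Comparing `group_mean` for native speakers and bilinguals, separately for the target and the phonological competitor, mirrors the group contrast described above.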

28.5 Main research methods

28.5.1 Visual world paradigm

28.5.1.1 Basic characteristics of the paradigm

The essential technique of the visual world paradigm is very simple. Researchers present four line drawings and a spoken word. Four drawings are considered optimal given memory limitations during retrieval. The spoken word refers either directly or indirectly to one of the objects on the screen; the other objects are labelled distractors. Tanenhaus and colleagues distinguished two versions of this design: action-based and passive. In the action-based version, the listener must follow instructions to click on one of the objects. They argued that task instructions focus participants’ attention by providing them with goals (Salverda et al., 2011). In the passive version (also called “look and listen”), the participant does not have to do anything but is told to attend to both the spoken word and the display. Some of the pioneers of this method did not use any task instructions in their visual world studies because they believed that language input should spontaneously drive eye movements towards the objects it refers to (Huettig & Altmann, 2005). Within this framework, researchers manipulate many aspects of the design to suit their research questions (see Huettig et al., 2011b; Salverda & Tanenhaus, 2017 for in-depth reviews). Sometimes the spoken word and the display are presented simultaneously and eye movements are tracked as the spoken signal unfolds. In other versions, the spoken word is presented after a delay following picture onset, mostly because visual scenes take longer to process than auditory stimuli. This design is also useful for studying anticipatory eye movements triggered by the visual context. In some rarer instances, the spoken word is presented first and the visual objects appear after a delay. This method is useful when the visual objects are presented for a very short duration and overlapping them with the auditory stimulus may lead to inefficient processing of both.
Staggering the presentation of the auditory and visual input allows enough time to process both. The amount of time the participant has to survey the visual objects and retrieve their semantic and phonological forms affects eye movement behaviour. The most critical manipulation is the relationship between the spoken word and the target object on the display. In most experiments, the spoken word directly refers to this object; in other designs, the direct referent is absent, and an object whose name is semantically or phonologically related to the spoken word is presented instead. Filler trials, in which none of the pictures is related to the spoken word, are also typically included. Filler trials serve several purposes. They can be analysed to examine whether any inherent bias within the visual scene itself could be driving eye movements independently of the speech input. Within an experimental session, they can also disrupt any deliberate strategy participants might develop of looking for objects related to the spoken word. Extracting behaviour through fixations and saccades in this way marked a decisive change in behavioural research methods. Experimental task-based research in psychology as a whole remains deeply tied to instructions: participants invariably know what they are supposed to do in an experiment, and without this, the desired results may not be obtained. Visual world studies without any task instructions, however, captured raw, spontaneous cognitive responses that participants were not aware of, since many of our eye movements towards objects in space operate below the threshold of consciousness. Merely listening to fragments of speech could activate the oculomotor system and orient attention towards relevant objects in the environment. This tight coupling between language and attentional orienting, a core evolutionary trait of humans, is readily observable in the visual world paradigm. The visual world paradigm therefore opened a completely new chapter in psycholinguistic research and fostered connections between psycholinguistics, vision science and cognitive psychology (Mishra, 2015).
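The three timing arrangements described above differ only in when the display and the spoken word start relative to each other, and a small data structure makes that explicit. In this sketch the class and field names are hypothetical and the millisecond values are placeholders chosen for illustration, not recommended settings; the sign of the onset asynchrony (speech onset minus display onset) identifies the arrangement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrialTiming:
    """Onsets (ms from trial start) of the picture display and the spoken word.

    display_duration_ms=None means the pictures stay on screen until the trial
    ends. All values used below are illustrative assumptions.
    """
    display_onset_ms: int
    speech_onset_ms: int
    display_duration_ms: Optional[int] = None

    @property
    def soa_ms(self) -> int:
        """Stimulus onset asynchrony: speech onset minus display onset."""
        return self.speech_onset_ms - self.display_onset_ms

    def arrangement(self) -> str:
        if self.soa_ms == 0:
            return "simultaneous"
        if self.soa_ms > 0:
            return "picture preview"  # pictures first, leaving time to encode them
        return "speech first"         # spoken word first, pictures after a delay


SIMULTANEOUS = TrialTiming(display_onset_ms=0, speech_onset_ms=0)
PREVIEW = TrialTiming(display_onset_ms=0, speech_onset_ms=1000)
SPEECH_FIRST = TrialTiming(display_onset_ms=1000, speech_onset_ms=0, display_duration_ms=200)

for name, timing in [("simultaneous", SIMULTANEOUS), ("preview", PREVIEW),
                     ("speech first", SPEECH_FIRST)]:
    print(f"{name:>12}: SOA = {timing.soa_ms:+} ms -> {timing.arrangement()}")
```

Keeping timing in one frozen object per condition makes it easy to log exactly which arrangement each trial used and to vary the preview length systematically.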

28.5.1.2 Analysing visual world data

The analysis of visual world data has evolved consistently over the decades. At first, researchers analysed the data using non-parametric statistics, counting the number of fixations towards different objects as a function of time. However, it was soon realised that fixations in this paradigm are a dynamic representation of shifting attentional biases, reflecting an interplay between linguistic and visual information. It was therefore considered prudent to analyse the proportion of fixations to different objects over time and to use statistics suited to nonlinear time-course data, similar to EEG data. In contemporary research, the entire timeline, often extending to one or two seconds, is divided into bins of 20 or 50 milliseconds, and the proportion of fixations in each time bin is computed. Plotted, these data show evolving attentional biases towards different objects as the speech signal unfolds over time. Many researchers also plot confidence intervals for such data. Because visual world data are essentially time-course data, researchers can also use growth curve techniques (Mirman, 2017; Huang & Snedeker, 2020) and include individual difference factors in such analyses (Mirman et al., 2008). Others have used multilevel logistic regression to analyse such data (Barr, 2008). The ultimate objective of all these analyses is to quantify evolving cognition as a function of the interaction between visual and linguistic information. Fine-grained analysis of such data reveals subtle individual differences in both attention and language processing.
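The binning step described above can be sketched in a few lines of Python. The sketch assumes a hypothetical input of `(trial_id, time_ms, aoi)` eye-tracker samples time-locked to spoken-word onset, counts samples rather than fixations, and pools across trials; an analysis for publication would compute proportions per participant and condition. It also includes the empirical logit, a transformation often applied to aggregated bin counts before regression or growth curve modelling, because raw logits are undefined when a bin contains no looks (or only looks) to an object.

```python
import math
from collections import defaultdict


def bin_proportions(samples, bin_ms=50, t_max_ms=1000):
    """Proportion of samples on each area of interest (AOI) in each time bin.

    `samples` is an iterable of (trial_id, time_ms, aoi) tuples, with time
    relative to spoken-word onset; aoi may be None for looks outside every AOI.
    Samples are pooled across trials, and the denominator for each bin is the
    total number of samples in that bin (including looks outside the AOIs).
    """
    n_bins = t_max_ms // bin_ms
    counts = defaultdict(lambda: [0] * n_bins)
    totals = [0] * n_bins
    for _trial_id, t, aoi in samples:
        if 0 <= t < t_max_ms:
            b = int(t // bin_ms)
            totals[b] += 1
            counts[aoi][b] += 1
    return {
        aoi: [c / n if n else math.nan for c, n in zip(row, totals)]
        for aoi, row in counts.items()
    }


def empirical_logit(y, n):
    """Empirical logit of y looks to an AOI out of n samples in a bin.

    The 0.5 correction keeps the value finite when y == 0 or y == n.
    """
    return math.log((y + 0.5) / (n - y + 0.5))


samples = [
    (1, 0, "target"), (1, 20, "target"), (1, 40, "competitor"), (1, 60, "target"),
    (2, 10, "distractor"), (2, 70, "target"),
]
props = bin_proportions(samples, bin_ms=50, t_max_ms=100)
for aoi, row in props.items():
    print(aoi, row)
print(empirical_logit(0, 4))  # finite even though no samples were on the AOI
```

The per-bin proportions are what is usually plotted against time; the empirical logits can then enter a regression with time terms as predictors.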

28.5.1.3 Blank screen paradigm

An important variation of the visual world paradigm is the blank screen paradigm, in which participants first see the visual world display, which is then removed, and hear the critical sentence or word while looking at a blank screen (Altmann, 2004). Eye movements in response to the spoken input are measured on the blank screen. Participants are typically found to look at the location previously occupied by the relevant picture, even though that location is now blank and contains no visual input. This has been taken to suggest that language-mediated eye movements do not need a concurrent visual scene and can instead be directed by a mental record of the objects presented.

28.5.2 Visual word recognition

28.5.2.1 Basics of the paradigm

While much of this chapter has focused on the visual world paradigm and ways to study bilingualism with it, it is also important to mention some other methods commonly used in the study of bilingualism. We start with the visual word recognition task. Word recognition in any language involves retrieving lexical information about a presented word. Lexical information may include phonological (sound), orthographic (spelling) or semantic (meaning) information stored in the individual’s mental lexicon. The speed and accuracy of word recognition can reveal much about the nature of these lexical representations and the ease of retrieving them. This is usually measured through a word recognition task in which a string of letters is presented to a participant, whose task is to make a judgement about it. The judgement can be of various types, for example, deciding whether the string is a word (lexical decision) or whether the word belongs to a specific category (semantic categorisation), among others. In the earliest studies, words that exist in some form in both languages were used as critical stimuli (Grainger, 1993). One important example is cognates: words that have the same (or a similar) orthographic form in two languages and largely overlap in meaning as well. Dutch and English share many cognates (e.g., “wolf”). Bilinguals are typically faster at recognising cognates than non-cognates. This is attributed to bilinguals activating representations of the word in both their languages, which for cognates converge on the same form and meaning. For non-cognate words, by contrast, interference from the non-target language needs to be resolved before the right form can be selected, which slows down responses.
The cognate facilitation effect has been found in both L1 and L2 across different bilingual populations and is further evidence for non-selective parallel activation of both languages in bilinguals (Dijkstra & Kroll, 2005).
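Computing the cognate facilitation effect described above amounts to comparing mean response times for cognates and non-cognates on correct trials. The Python sketch below uses invented lexical decision data for a single hypothetical Dutch-English participant; the trimming cut-offs (200 and 2000 ms) are assumptions for illustration, since studies differ in how they handle outliers, and group-level inference would normally use mixed-effects models.

```python
from statistics import mean


def condition_mean_rt(trials, condition, min_rt=200, max_rt=2000):
    """Mean RT of correct trials in `condition`, excluding RTs outside [min_rt, max_rt].

    The cut-offs are illustrative assumptions; studies use various trimming rules.
    """
    rts = [t["rt"] for t in trials
           if t["condition"] == condition and t["correct"] and min_rt <= t["rt"] <= max_rt]
    return mean(rts)


def cognate_facilitation(trials):
    """Non-cognate minus cognate mean RT; positive values mean cognates were faster."""
    return condition_mean_rt(trials, "noncognate") - condition_mean_rt(trials, "cognate")


# Invented lexical decision trials (English words) for one Dutch-English bilingual.
trials = [
    {"word": "wolf", "condition": "cognate", "rt": 550, "correct": True},
    {"word": "film", "condition": "cognate", "rt": 570, "correct": True},
    {"word": "lamp", "condition": "cognate", "rt": 3000, "correct": True},     # trimmed
    {"word": "bird", "condition": "noncognate", "rt": 620, "correct": True},
    {"word": "cloud", "condition": "noncognate", "rt": 640, "correct": True},
    {"word": "knife", "condition": "noncognate", "rt": 700, "correct": False},  # error
]

print(cognate_facilitation(trials))  # 70 (ms)
```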

28.5.2.2 Masked priming paradigm

Another line of research that has robustly demonstrated parallel activation uses the masked priming paradigm. In a typical priming study, a prime stimulus is presented first, followed by a target stimulus. Responses to the target are facilitated if the prime is related to the target in some way. The task can be, among others, to decide whether the presented string of letters is a word or to judge the semantic category of the word. Grainger (1998) conducted some of the initial masked priming studies of parallel activation in bilinguals. Crucially, the translation equivalent of the target was presented as a masked prime (e.g., “arbre”, which means “tree” in French) just before the target word (“tree”). Participants completed a lexical decision task (judging whether the presented string of letters constitutes a word) and a semantic categorisation task (judging whether the presented word belongs to a given category, such as “fruit”). Responses to the target were faster when it was preceded by its translation equivalent than by an unrelated word, which was attributed to the shared semantic representation of the two words in memory being activated by processing of the prime.
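The sequence of a masked translation priming trial, and the priming effect it yields, can be sketched as follows. The durations, the hash-mark mask and the lowercase-prime/uppercase-target convention are assumptions for illustration, not parameters reported for the study described above, and the response times are invented.

```python
from statistics import mean


def masked_priming_trial(prime, target, mask_ms=500, prime_ms=50):
    """Return the display sequence of one masked priming trial as (stimulus, duration_ms).

    Forward mask -> briefly presented lowercase prime -> uppercase target that
    stays on screen until the response (duration None). All values are
    illustrative assumptions.
    """
    mask = "#" * max(len(prime), len(target))
    return [(mask, mask_ms), (prime.lower(), prime_ms), (target.upper(), None)]


def translation_priming_effect(related_rts, unrelated_rts):
    """Unrelated-prime RT minus translation-prime RT; positive values = facilitation."""
    return mean(unrelated_rts) - mean(related_rts)


for stimulus, duration in masked_priming_trial("arbre", "tree"):
    shown = "until response" if duration is None else f"{duration} ms"
    print(f"{stimulus:<6} {shown}")

print(translation_priming_effect(related_rts=[500, 520], unrelated_rts=[540, 560]))  # 40
```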


28.5.3 Picture naming

The most common method for studying the mechanisms underlying language production in bilinguals is the object or picture naming task. In this task, participants are presented with pictures of common objects (black-and-white line drawings or coloured images) and asked to name them. The time taken to begin saying the name of the picture, measured from picture onset, is referred to as “naming latency”. This measure reflects the time taken to recognise the picture, select the right word in the required language and begin articulating it. Naming latency can shed light on the ease of lexical access and the strength of representations in the lexical-semantic network. One of the crucial manipulations in bilingualism studies using object naming is whether the languages used for naming are blocked or mixed. In a “pure block” design, participants name objects in one of their two languages for a block of trials and in the other language for another block. In a “mixed block” design, the language to be used can change dynamically from trial to trial. These two manipulations yield measures that inform researchers about how bilinguals process each of their languages. Take, for instance, a bilingual whose native language is Hindi (L1) and whose second language is English (L2). A typical object naming experiment with such a bilingual could involve pure blocks in which the participant uses Hindi for one block of trials and English for another. There would then be a mixed block in which the required language shifts between Hindi and English from trial to trial. This gives rise to two types of trials within the mixed block: “Switch” trials, where the language required on the current trial (e.g., Hindi) differs from the one used on the previous trial (e.g., English), and “Stay” trials, where the language required on the current trial (e.g., Hindi) is the same as on the previous trial (e.g., Hindi).
Subtracting the naming latency on stay trials from the latency on switch trials gives a measure known as the “Switch cost”. Switch costs can further be calculated separately for each language (the cost of switching from L1 into L2 and from L2 into L1). The magnitude of the switch cost reflects how difficult it is to switch between the two languages. Another measure that can be calculated is the “Mixing cost”: the difference between latency on “stay” trials in the mixed block and latency on trials in the pure block, for a given language. Switching and mixing costs can shed light on the language control mechanisms involved in managing the parallel activation of the two languages. Detailed investigations have provided insight into the nature of the proactive and reactive language control used by bilinguals as a function of their proficiency and experience (see Bobb & Wodniecka, 2013; Declerck & Philipp, 2015 for detailed reviews).
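The switch and mixing costs defined above can be computed directly from trial-level naming latencies. The sketch below uses invented latencies for a hypothetical Hindi (L1)-English (L2) bilingual; the function names are illustrative, trials are labelled by comparing each trial's language with the previous one, and the first trial of the mixed block is left unlabelled because it has no predecessor.

```python
from statistics import mean


def label_trials(languages):
    """Label each mixed-block trial 'switch' or 'stay' relative to the previous trial.

    The first trial has no predecessor and is labelled None (usually excluded).
    """
    labels = [None] if languages else []
    for prev, cur in zip(languages, languages[1:]):
        labels.append("stay" if cur == prev else "switch")
    return labels


def latencies(mixed, language, trial_type):
    """Naming latencies of `trial_type` trials in `language` from a mixed block.

    `mixed` is a list of (language, naming_latency_ms) in presentation order.
    """
    labels = label_trials([lang for lang, _ in mixed])
    return [rt for (lang, rt), lab in zip(mixed, labels)
            if lang == language and lab == trial_type]


def switch_cost(mixed, language):
    """Mean latency on switch trials into `language` minus mean latency on its stay trials."""
    return mean(latencies(mixed, language, "switch")) - mean(latencies(mixed, language, "stay"))


def mixing_cost(mixed, pure_latencies, language):
    """Mean stay-trial latency in the mixed block minus mean latency in the pure block."""
    return mean(latencies(mixed, language, "stay")) - mean(pure_latencies)


# Invented naming latencies (ms).
mixed_block = [("Hindi", 700), ("Hindi", 710), ("English", 820),
               ("English", 760), ("Hindi", 790), ("Hindi", 730)]
pure_hindi = [650, 670]

print("Switch cost into Hindi:  ", switch_cost(mixed_block, "Hindi"), "ms")
print("Switch cost into English:", switch_cost(mixed_block, "English"), "ms")
print("Mixing cost for Hindi:   ", mixing_cost(mixed_block, pure_hindi, "Hindi"), "ms")
```

Computing the costs per language, as here, is what allows asymmetries between switching into L1 and switching into L2 to be examined.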

28.5.4 Translation recognition and production

In the basic version of a translation recognition task, word pairs are presented sequentially. On half of the trials, the second word is the translation equivalent of the first word and thus requires a “yes” response (e.g., “monkey”-“bandar” in Hindi). On the other half of trials, the second word is a non-translation distractor and the required response is “no”. The speed of recognising a translation indicates how far the lexical representations of the words in the two languages are shared: the greater the overlap, the faster the responses. While many of the initial findings in translation recognition came from languages with cognate words, cross-linguistic activation has subsequently also been demonstrated between languages that do not share the same script, that is, the set of characters constituting a language’s writing system (Sunderman & Priya, 2012). In these studies, the “no” trials are often the critical trials, manipulated to reveal the extent to which the first word activates representations related to the second word. For instance, in the example given earlier, the second word can be a phonological competitor of the translation word (“monkey”-“bandook”), which is then compared to a control condition in which the second word is unrelated both to the first word and to its translation (“monkey”-“kalam”, pen in Hindi). Responses in the related condition (of “no” trials) are slower than in the control condition because of the interference caused by activation of the translation equivalent. Similarly, in a translation production task a word is presented on the screen and the task is to say its translation. The speed of retrieval, as in translation recognition, reveals the extent to which input in one language triggers activation of the other language. It is important to note that findings in both translation recognition and production depend on the direction of translation, that is, whether the bilingual is translating from L2 to L1 (backward translation) or from L1 to L2 (forward translation). Existing models of word recognition and translation make specific predictions about whether forward or backward translation should be easier for bilinguals, depending especially on whether they are early learners or highly proficient in their second language (e.g., Kroll & Stewart, 1994; Van Heuven et al., 1998). Word frequency and the degree of orthographic similarity between the languages also play an important role, among other factors.
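The critical comparison on “no” trials described above, and its dependence on translation direction, can be sketched as a related-minus-control difference computed separately for each direction. The trial keys and response times below are invented for illustration; a real analysis would also model participants and items.

```python
from collections import defaultdict
from statistics import mean


def interference_by_direction(no_trials):
    """Related-minus-control RT on correct "no" trials, separately per direction.

    Each trial is a dict with hypothetical keys: 'direction' ('forward' = L1->L2,
    'backward' = L2->L1), 'condition' ('related', e.g. monkey-bandook, or
    'control', e.g. monkey-kalam), 'rt' (ms) and 'correct' (bool). Every
    direction needs correct trials in both conditions. Positive values indicate
    interference from the activated translation equivalent.
    """
    cells = defaultdict(list)
    for t in no_trials:
        if t["correct"]:
            cells[(t["direction"], t["condition"])].append(t["rt"])
    directions = sorted({direction for direction, _ in cells})
    return {d: mean(cells[(d, "related")]) - mean(cells[(d, "control")])
            for d in directions}


# Invented "no"-trial data for one participant.
no_trials = [
    {"direction": "backward", "condition": "related", "rt": 800, "correct": True},
    {"direction": "backward", "condition": "related", "rt": 820, "correct": True},
    {"direction": "backward", "condition": "related", "rt": 1200, "correct": False},
    {"direction": "backward", "condition": "control", "rt": 760, "correct": True},
    {"direction": "backward", "condition": "control", "rt": 780, "correct": True},
    {"direction": "forward", "condition": "related", "rt": 700, "correct": True},
    {"direction": "forward", "condition": "control", "rt": 690, "correct": True},
]

print(interference_by_direction(no_trials))  # {'backward': 40, 'forward': 10}
```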

28.6 Recommendations for practice

28.6.1 Accurately measuring the degree of bilingualism – proficiency and usage

One of the biggest challenges in bilingualism research is that there is no single agreed-upon method for assessing the degree of bilingualism across diverse populations. Researchers typically compare bilinguals and monolinguals to draw conclusions about bilingualism; however, it is important to note that not all bilinguals are the same (de Bruin, 2019). Bilingual individuals can differ in the age at which they acquired the second language, their formal education in it, and their proficiency in reading, writing and speaking. There are several detailed questionnaires that collect self-reported data on the language background of individuals (Anderson et al., 2018; Kaushanskaya et al., 2020; Li et al., 2006). Using these questionnaires is vital, as self-reported measures of language proficiency are often the most important source of information on an individual’s language profile. The use of these measures has been driven by the growing understanding that the distinction between a bilingual and a monolingual is fuzzy and that it is more useful to treat bilingualism as a continuous rather than a categorical variable.

28.6.2 Differentiating language proficiency and usage

An important issue, sometimes neglected in the literature, is the difference between language proficiency and usage. For instance, an individual can be highly proficient in the second language but not use it much. This often happens when the individual is embedded in a social or cultural context that does not encourage the use of L2. In this situation, while the individual can still technically be described as “bilingual”, they may not be practising their bilingualism much. It is therefore necessary to use self-report questionnaires that clearly capture the extent of usage of both languages, in addition to the degree of proficiency in both. These measures may correlate in some cases, but this cannot be taken for granted. Data from subjective self-report questionnaires also need to be supplemented with objective tests of language proficiency. Several tasks are commonly used for this purpose, for example object naming tasks, vocabulary tests and semantic or verbal fluency tests. While it is difficult to establish whether subjective or objective methods of measuring bilingualism are more reliable, they often go hand in hand, and both can be used to arrive at a comprehensive understanding of the individual’s language profile.
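Treating bilingualism as a continuous variable, and checking rather than assuming that proficiency and usage go together, can be sketched as follows: standardise each measure, average the proficiency measures into a composite score, and correlate that composite with reported usage. The z-score composite and Pearson correlation here are generic choices made for illustration, not the scoring procedure of any of the questionnaires cited above, and the data are invented.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardise one measure across participants (using the sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*measures):
    """Average z-scores across measures; each measure is a list aligned by participant."""
    standardised = [zscores(m) for m in measures]
    return [mean(scores) for scores in zip(*standardised)]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented data for five participants.
self_rated_proficiency = [6, 8, 9, 5, 7]     # questionnaire rating, 1-10 scale
vocabulary_score = [42, 55, 60, 38, 50]      # objective vocabulary test
daily_l2_use_percent = [20, 15, 70, 10, 40]  # self-reported usage

proficiency = composite(self_rated_proficiency, vocabulary_score)
print([round(p, 2) for p in proficiency])
print(round(pearson_r(proficiency, daily_l2_use_percent), 2))
```

A modest correlation in real data would be a reason to enter proficiency and usage as separate predictors rather than collapsing them into a single bilingualism score.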

28.6.3 Acknowledging the role of context

Over the last several decades, cognitive science has made considerable progress by restricting its level of analysis to the individual, implicitly assuming that such analyses can be conducted in isolation from the individual’s surrounding context. However, over the last few years it has become increasingly apparent that humans are contextual beings and that almost all aspects of our behaviour are influenced by where we are and who we are with. Bilingualism is no exception. If two bilinguals living in different countries are matched on second language proficiency and usage, some differences may still remain because of their different sociocultural contexts. Quantifying these contextual differences is an ongoing challenge for the field, and there are no clear answers. But we can start by acknowledging the role of contextual factors such as country, the dominant language in the immediate context, socioeconomic status and the status of an individual within a society (native versus immigrant), among others. Ignoring these factors and lumping together individuals who differ on these dimensions into one group (“bilinguals”) can be problematic. Several studies have shown differences in performance on a range of tasks measuring language activation between groups of bilinguals who differ only in their cultural context (Berkes et al., 2018; Linck et al., 2009; Roychoudhuri et al., 2016; Zhang et al., 2013). Thus, bilinguals who speak the same languages with the same level of proficiency, but who live and experience their bilingualism in different contexts, may not necessarily engage their cognitive resources in the same way. These contextual differences often arise from bi- and multiculturalism. Practising one’s bilingualism in an environment that includes people from varied cultural and linguistic backgrounds is distinct from doing so in a homogeneous context.
In an initial attempt at studying the role of cultural context using the visual world paradigm, Kapiley and Mishra (2018) showed that language-mediated eye movements were influenced by the presentation of culture-specific images in a sample of Bengali-English bilinguals in India. More research manipulating context – both within and between participants – is necessary to understand how the world around us shapes our internal cognitive processing.

28.7 Future directions

Most current research focuses on individuals who speak and/or are proficient in two languages. Individuals who speak more than two languages are often categorised as “bilinguals” for the practical purposes of an experimental study. However, there is a small but significant population in the world who are proficient in more than two languages. The authors’ country of origin, India, is a perfect example of thriving multilingualism. India has a vast number of people who are fluent in at least three languages: their native language, English and Hindi. It is also not uncommon to find individuals who are fluent in four or more languages, and similar examples can be found in other parts of the world. But there is not much systematic research on the cognitive consequences of multilingualism. Investigating multilinguals will further our understanding of how our brains accommodate multiple languages. Bi- and multilingualism flourish in a dynamic, free, progressive and mobile environment. When people move to other regions, when they try to pick up a different language, when they show interest in other cultures, bilingualism achieves its goals. Scientific studies of bilingualism reflect this new global scenario far more today than they did decades ago. The studies also show that the human brain is highly plastic, accommodating such changes in a short period and adapting to new environmental challenges. The future of bilingualism research will revolve around global themes such as culture, identity, and educational and social needs. It will also be far more inclusive of migrants and of people who are moving from one cultural boundary to another. Bilingualism research will include these variables to make true sense of the world we live in today. This contrasts with countries and cultures that are less open or that take hostile political positions on the movement of people and on other cultures. As scientific methods improve and experiments become more naturalistic, it will be possible to tap into changes in the bilingual mind as a function of the enactment of cognition, rather than viewing cognition as a fossilised structure in the brain with unchanging attributes. Despite functional behavioural psychology’s historical rigidity, there are hints of such broadening as theories of mind and cognition evolve. The newer approach to cognition is more dynamic, environmental and deeply rooted in our collective consciousness as language-speaking mammals who must understand their world and create narratives. Bilingualism research will be about all of these. As neuroscience, psychology and linguistics come together, as our philosophical views change and as we look at a much wider horizon of events, those who speak more languages will be better placed to comprehend the world around them. That will also be tied to their social and economic life. This chapter has only scratched the surface, in a limited way based on laboratory experiments; the true arena of bilingualism is the real world out there.
That world is rich in symbolism, expressed in multiple scripts and in other visuals that add colour to the speech signal. Bilingualism researchers need to consider everything used to express mental content in multiple formats instead of focusing only on speech signals or verbal material. Diverse methods must be integrated to offer more detailed accounts of mental operations that go beyond the classical distinction between linguistic and non-linguistic stimuli, since the mind makes no such distinction. The future of bilingualism research should therefore involve theoretical widening, methodological integration and cultural innovation in the broad sense of human activity.

Acknowledgements

RKM would like to acknowledge the support of an Institute of Eminence grant from the University of Hyderabad. SGP was supported by a fellowship from the Humboldt Foundation.

Note

1 Psycholinguistics is a discipline that examines how people acquire, comprehend and produce language from a psychological perspective.

Further reading

Bialystok, E., Craik, F. I., & Luk, G. (2012). Bilingualism: consequences for mind and brain. Trends in Cognitive Sciences, 16(4), 240–250.
Costa, A., & Sebastián-Gallés, N. (2014). How does the bilingual experience sculpt the brain? Nature Reviews Neuroscience, 15(5), 336–345.
Fitch, W. T. (2010). The evolution of language. Cambridge University Press.
Green, D. W. (1998). Mental control of the bilingual lexico-semantic system. Bilingualism: Language and Cognition, 1(2), 67–81.
Green, D. W., & Abutalebi, J. (2013). Language control in bilinguals: The adaptive control hypothesis. Journal of Cognitive Psychology, 25(5), 515–530.
Grosjean, F. (1982). Life with two languages: An introduction to bilingualism. Harvard University Press.
Hayakawa, S., Costa, A., Foucart, A., & Keysar, B. (2016). Using a foreign language changes our choices. Trends in Cognitive Sciences, 20(11), 791–793.
Peeters, D. (2019). Virtual reality: A game-changing method for the language sciences. Psychonomic Bulletin & Review, 26(3), 894–900.

Related topics

New directions in statistical methods for experimental linguistics; experimental methods to study language learners; assessing adult linguistic competence; controlling social factors in experimental linguistics

References

Adamson, P., & Taylor, R. C. (Eds.). (2004). The Cambridge companion to Arabic philosophy. Cambridge University Press.
Altmann, G. T. (2004). Language-mediated eye movements in the absence of a visual world: The ‘blank screen paradigm’. Cognition, 93(2), B79–B87.
Altmann, G. T., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264.
Anderson, J. A., Mak, L., Keyvani Chahi, A., & Bialystok, E. (2018). The language and social background questionnaire: Assessing degree of bilingualism in a diverse population. Behavior Research Methods, 50(1), 250–263.
Barr, D. J. (2008). Analyzing ‘visual world’ eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59(4), 457–474.
Barsalou, L. W. (2010). Grounded cognition: Past, present, and future. Topics in Cognitive Science, 2(4), 716–724.
Berkes, M., Friesen, D. C., & Bialystok, E. (2018). Cultural context as a biasing factor for language activation in bilinguals. Language, Cognition and Neuroscience, 33(8), 1032–1048.
Bialystok, E., Craik, F. I., Klein, R., & Viswanathan, M. (2004). Bilingualism, aging, and cognitive control: Evidence from the Simon task. Psychology and Aging, 19(2), 290.
Blumenfeld, H. K., & Marian, V. (2013). Parallel language activation and cognitive control during spoken word recognition in bilinguals. Journal of Cognitive Psychology, 25(5), 547–567.
Bobb, S. C., & Wodniecka, Z. (2013). Language switching in picture naming: What asymmetric switch costs (do not) tell us about inhibition in bilingual speech planning. Journal of Cognitive Psychology, 25(5), 568–585.
Byers-Heinlein, K., Morin-Lessard, E., & Lew-Williams, C. (2017). Bilingual infants control their languages as they listen. Proceedings of the National Academy of Sciences, 114(34), 9032–9037.
de Bruin, A. (2019). Not all bilinguals are the same: A call for more detailed assessments and descriptions of bilingual experiences. Behavioral Sciences, 9(3), 33.
Declerck, M., & Philipp, A. M. (2015). A review of control processes and their locus in language switching. Psychonomic Bulletin & Review, 22(6), 1630–1645.
Dijkstra, T., & Kroll, J. F. (2005). Bilingual visual word recognition and lexical access. Handbook of Bilingualism: Psycholinguistic Approaches, 178, 201.
Grainger, J. (1993). Visual word recognition in bilinguals. The Bilingual Lexicon, 6, 11–26.
Grainger, J. (1998). Masked priming by translation equivalents in proficient bilinguals. Language and Cognitive Processes, 13(6), 601–623.
Huang, Y., & Snedeker, J. (2020). Evidence from the visual world paradigm raises questions about unaccusativity and growth curve analyses. Cognition, 200, 104251.
Huettig, F., & Altmann, G. T. (2005). Word meaning and the control of eye fixation: Semantic competitor effects and the visual world paradigm. Cognition, 96(1), B23–B32.
Huettig, F., & McQueen, J. M. (2007). The tug of war between phonological, semantic and shape information in language-mediated visual search. Journal of Memory and Language, 57(4), 460–482.


Ramesh Kumar Mishra and Seema Prasad

Huettig, F., Mishra, R. K., & Olivers, C. N. (2012). Mechanisms and representations of language-mediated visual attention. Frontiers in Psychology, 2, 394.
Huettig, F., Olivers, C. N., & Hartsuiker, R. J. (2011a). Looking, language, and memory: Bridging research from the visual world and visual search paradigms. Acta Psychologica, 137(2), 138–150.
Huettig, F., Rommers, J., & Meyer, A. S. (2011b). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137(2), 151–171.
Huettig, F., Singh, N., & Mishra, R. K. (2011c). Language-mediated visual orienting behavior in low and high literates. Frontiers in Psychology, 2, 285.
Ito, A., Corley, M., & Pickering, M. J. (2018). A cognitive load delays predictive eye movements similarly during L1 and L2 comprehension. Bilingualism: Language and Cognition, 21(2), 251–264.
Kapiley, K., & Mishra, R. K. (2018). Iconic culture-specific images influence language non-selective translation activation in bilinguals: Evidence from eye movements. Translation, Cognition & Behavior, 1(2), 221–250.
Kaushanskaya, M., Blumenfeld, H. K., & Marian, V. (2020). The language experience and proficiency questionnaire (LEAP-Q): Ten years later. Bilingualism: Language and Cognition, 23(5), 945–950.
Kroll, J. F., & Dussias, P. E. (2012). The comprehension of words and sentences in two languages. In T. K. Bhatia & W. C. Ritchie (Eds.), The handbook of bilingualism and multilingualism (pp. 216–243). John Wiley & Sons.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming: Evidence for asymmetric connections between bilingual memory representations. Journal of Memory and Language, 33(2), 149–174.
Li, P., Sepanski, S., & Zhao, X. (2006). Language history questionnaire: A web-based interface for bilingual research. Behavior Research Methods, 38(2), 202–210.
Linck, J. A., Kroll, J. F., & Sunderman, G. (2009). Losing access to the native language while immersed in a second language: Evidence for the role of inhibition in second-language learning. Psychological Science, 20(12), 1507–1515.
Luk, G., & Bialystok, E. (2013). Bilingualism is not a categorical variable: Interaction between language proficiency and usage. Journal of Cognitive Psychology, 25(5), 605–621.
Magnuson, J. S. (2019). Fixations in the visual world paradigm: Where, when, why? Journal of Cultural Cognitive Science, 3(2), 113–139.
Marian, V., & Spivey, M. (2003a). Competing activation in bilingual language processing: Within- and between-language competition. Bilingualism: Language and Cognition, 6(2), 97–115.
Marian, V., & Spivey, M. (2003b). Bilingual and monolingual processing of competing lexical items. Applied Psycholinguistics, 24(2), 173–193.
Mercier, J., Pivneva, I., & Titone, D. (2014). Individual differences in inhibitory control relate to bilingual spoken word processing. Bilingualism: Language and Cognition, 17(1), 89–117.
Miller, G. A. (2003). The cognitive revolution: A historical perspective. Trends in Cognitive Sciences, 7(3), 141–144.
Mirman, D. (2017). Growth curve analysis and visualization using R. Chapman and Hall/CRC.
Mirman, D., Dixon, J. A., & Magnuson, J. S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475–494.
Mishra, R. K. (2015). Interaction between attention and language systems in humans. Springer.
Mishra, R. K. (2018). Bilingualism and cognitive control (Vol. 6). Springer.
Mishra, R. K., & Singh, N. (2014). Language non-selective activation of orthography during spoken word processing in Hindi–English sequential bilinguals: An eye tracking visual world study. Reading and Writing, 27(1), 129–151.
Mishra, R. K., & Singh, N. (2016). The influence of second language proficiency on bilingual parallel language activation in Hindi–English bilinguals. Journal of Cognitive Psychology, 28(4), 396–411.
Prasad, S., & Mishra, R. K. (2021). Concurrent verbal working memory load constrains cross-linguistic translation activation: A visual world eye-tracking study on Hindi–English bilinguals. Bilingualism: Language and Cognition, 24(2), 241–270.
Prasad, S., Viswambharan, S., & Mishra, R. (2020). Visual working memory load constrains language non-selective activation under task-demands. Linguistic Approaches to Bilingualism, 10(6), 805–846.
Rochette, B. (2010). Greek and Latin bilingualism. In E. J. Bakker (Ed.), A companion to the Ancient Greek language (pp. 281–293). John Wiley & Sons, Ltd.


Roychoudhuri, K. S., Prasad, S. G., & Mishra, R. K. (2016). Iconic native culture cues inhibit second language production in a non-immigrant population: Evidence from Bengali-English bilinguals. Frontiers in Psychology, 7, 1516.
Salverda, A. P., Brown, M., & Tanenhaus, M. K. (2011). A goal-based perspective on eye movements in visual world studies. Acta Psychologica, 137(2), 172–180.
Salverda, A. P., & Tanenhaus, M. K. (2017). The visual world paradigm. In A. M. de Groot & P. Hagoort (Eds.), Research methods in psycholinguistics and the neurobiology of language: A practical guide (pp. 89–110). John Wiley & Sons.
Shapiro, L. (2007). The embodied cognition research programme. Philosophy Compass, 2(2), 338–346.
Shook, A., & Marian, V. (2012). Bimodal bilinguals co-activate both languages during spoken comprehension. Cognition, 124(3), 314–324.
Shook, A., & Marian, V. (2013). The bilingual language interaction network for comprehension of speech. Bilingualism: Language and Cognition, 16(2), 304–324.
Singh, N., & Mishra, R. K. (2015). Unintentional activation of translation equivalents in bilinguals leads to attention capture in a cross-modal visual task. PloS One, 10(3), e0120131.
Spivey, M. J., & Marian, V. (1999). Cross talk between native and second languages: Partial activation of an irrelevant lexicon. Psychological Science, 10(3), 281–284.
Sunderman, G. L., & Priya, K. (2012). Translation recognition in highly proficient Hindi–English bilinguals: The influence of different scripts but connectable phonologies. Language and Cognitive Processes, 27(9), 1265–1285.
Tanenhaus, M. K., Magnuson, J. S., Dahan, D., & Chambers, C. (2000). Eye movements and lexical access in spoken-language comprehension: Evaluating a linking hypothesis between fixations and linguistic processing. Journal of Psycholinguistic Research, 29(6), 557–580.
Van Heuven, W. J., Dijkstra, T., & Grainger, J. (1998). Orthographic neighborhood effects in bilingual word recognition. Journal of Memory and Language, 39(3), 458–483.
Weber, A., & Cutler, A. (2004). Lexical competition in non-native spoken-word recognition. Journal of Memory and Language, 50(1), 1–25.
Zhang, S., Morris, M. W., Cheng, C. Y., & Yap, A. J. (2013). Heritage-culture images disrupt immigrants’ second-language processing through triggering first-language interference. Proceedings of the National Academy of Sciences, 110(28), 11272–11277.


29 EXPERIMENTAL METHODS TO STUDY CULTURAL DIFFERENCES IN LINGUISTICS

Evangelia Adamou

29.1 Introduction and definitions

A relatively recent development in linguistics is the use of experimental methods in different cultural settings. It is part of a larger trend in psychology that questions the robustness of the field's theories and the generalizability of its findings, following the observation that research relies predominantly on participants from so-called ‘Western, educated, industrialized, rich, and democratic’ (WEIRD) societies (Henrich et al., 2010). To set the scene for this discussion, let us start by looking more closely at the two keywords of this chapter: ‘experimental’ and ‘cultural.’

‘Experimentation’ is a way for scientists to isolate, prepare, and manipulate their object of investigation and its environment in order to test and update a scientific theory, that is, a collection of laws and generalizations that aims to describe, explain, and predict a set of phenomena (for an overview of this topic in philosophy of science, see Boyd & Bogen, 2021). In comparison, ‘observation’ is when scientists note details about their object of investigation, under either natural or experimental conditions. Both observation and experimentation may require the use of instruments; therefore, the quality of the measures obtained, and of their analysis, is key to the success of the scientific endeavour. Since experiments are about control, the lab is the typical place to conduct them. A controlled environment may be crucial in physics, where scientists may also need experimental equipment that is only available in the lab (e.g., a particle accelerator). However, so-called ‘field experiments,’ conducted outside the lab, are common in many scientific disciplines, such as geochemistry and economics (e.g., where scientists test policy-related solutions in the real world). Similarly, linguistic experiments can be conducted in the field by slightly adjusting the experimental protocol (see Sections 29.5 and 29.6 for more details). This is particularly important for including populations who live far from university laboratories.

Let us now turn to the deceptively intuitive notion of ‘culture.’ In a classic definition, culture encompasses those aspects of behaviour that are not genetically determined but are acquired through social learning (Cavalli-Sforza & Feldman, 1981). Culture is reflected in myths, legends, religion, body adornment, rules, daily routines, and the use of production tools (Brown, 2004). In addition, every society has its own forms of social organization, with rules for social groups, age grading, family, kinship systems, play, division of labour, exchange, cooperation, and reciprocity (Brown, 2004). Contemporary anthropologists have come to critically appraise the notion of culture itself, viewed as ‘essentialized, localized, territorialized, bounded, highly shared, and symbolically coalesced’ (Hirschfeld, 2018, p. 233). Appadurai (1996) proposes instead a view of cultural spaces as fluid, heterogeneous, fragmented, and constantly negotiated. Similarly, Claidière and Sperber (2007) opt for the dynamic notion of ‘cultural representations.’ This means that researchers should not only investigate differences between WEIRD and non-WEIRD populations but should also take into account diversity and variability within these populations (e.g., by including ethnically diverse and migrant populations, people who may not have a college degree, or people from economically marginalized communities).

DOI: 10.4324/9781003392972-33

29.2 Historical perspectives

The idea that cultural differences should take centre stage in experimental psychology is not new. Boas (1930), a major figure in American anthropology, already criticized psychologists' exclusive focus on the individual, which abstracts away from the social environment. Yet Arnett (2008) finds that 96% of participants in behavioural research publications from 2003 to 2007 were from WEIRD countries, although these countries represent roughly 12% of the world’s population. Follow-up studies by Nielsen et al. (2017) and Rad et al. (2018) report no substantial change. In addition, Rad et al. (2018) note that participant samples are frequently not identified and, when they are, they consist of undergraduates and online samples with no information about participant characteristics such as socioeconomic status, education, and ethnicity. Linguistics is equally affected by these cultural biases. Psycholinguistics is a traditionally experimental subfield that studies how word meaning, sentence meaning, and discourse meaning are represented in the mind during speech production and comprehension. Anand et al. (2011) find that only ten languages account for 85% of psycholinguistics publications and conferences, namely English, German, Japanese, French, Dutch, Spanish, Mandarin, Korean, Finnish, and Italian. In child language acquisition, a subfield with a tradition of relying on both experimental and observational data, Kidd and Garcia (2022) arrive at a similar conclusion: in the past 45 years, only 1.5% of the world’s 7,000 languages (N = 103 languages) have been the subject of at least one article, with English and other well-studied Indo-European languages accounting for the great majority of publications. To my knowledge, there are no studies on the representation of different languages in the various subfields of experimental linguistics, such as experimental phonology, semantics, pragmatics, syntax, and morphology (Hemforth, 2013).
Experimental phonetics, an interdisciplinary field of linguistics, physiology, and acoustics, most likely stands out for having the longest tradition of cross-linguistic research.

29.3 Critical issues and topics

Why is it important to study cultural differences in linguistics? First, we must acknowledge the significance of the empirical database issue. There is considerable language diversity, with variation at every level, from sounds to grammar, meaning, and lexicon, much of which still needs to be described (Evans & Levinson, 2009). As a result, the search for linguistic universals has been repeatedly challenged by novel empirical evidence. For example, one proposed universal that has been abandoned is ‘baby talk,’ the specific prosody that English, French, or German caregivers use to talk to infants. Cross-cultural and cross-linguistic studies demonstrate that baby talk is in fact a cultural convention (see Quiché Mayan in Bernstein-Ratner & Pye, 1984). Another abandoned universal is the Possible Word Constraint, according to which a single consonant or a cluster of consonants cannot form a word. El Aissati et al. (2012), however, found that this constraint does not apply in Tashelhit Berber, an Afroasiatic language that allows even clusters of consonants to be words. For more examples, see Evans and Levinson (2009) and Kidd and Garcia (2022).

Second, the failure to find universals amid such cultural and linguistic diversity should not come as a surprise if we admit that, although humans share a common biological basis, environment and experience determine which linguistic characteristics become conventionalized. Experience-driven models try to capture the complex and adaptive nature of language:

Language has a fundamentally social function. Processes of human interaction along with domain-general cognitive processes shape the structure and knowledge of language. Recent research in the cognitive sciences has demonstrated that patterns of use strongly affect how language is acquired, is used, and changes. These processes are not independent of one another but are facets of the same complex adaptive system (CAS). Language as a CAS involves the following key features: The system consists of multiple agents (the speakers in the speech community) interacting with one another. The system is adaptive; that is, speakers’ behaviour is based on their past interactions, and current and past interactions together feed forward into future behaviour. A speaker’s behaviour is the consequence of competing factors ranging from perceptual constraints to social motivations. The structures of language emerge from interrelated patterns of experience, social interaction, and cognitive mechanisms. (Five Graces Group, 2009, pp. 1–2)

In sum, if we want to better understand how the interaction environment and linguistic experience shape languages and cognition, we must move beyond student participant samples from the Global North and unidentified samples of participants recruited online (Blasi et al., submitted).

29.4 Current contributions and research

This section offers an overview of some studies that have been successfully conducted with diverse populations and summarizes their findings.

29.4.1 Experimental studies on language and cognition in diverse cultural settings

A survey of experimental work on cultural differences and their effects on language must start with the controversial ‘linguistic relativity’ or ‘Sapir-Whorf’ hypothesis (Whorf, 1941). The linguistic relativity hypothesis holds that our language habits shape the way we think about the real world. Nowadays, a weak version of the hypothesis is largely accepted, which acknowledges that language allows for faster performance, better discrimination of the real world, and enhanced memorization. The degree of detail of linguistic categories in a given language reflects the cultural interests and communicative needs of its speakers.

29.4.1.1 Quantity

Frank et al. (2008) study the cognition of quantity among the Amazonian Pirahã, a people whose language does not have a special lexicon for number. The researchers performed two tasks: an elicitation task for numerals, to confirm that the Pirahã do not use number language (N = 6 participants), and a matching task with real-life objects, to test numerical cognition (N = 14 participants). The study finds that, despite the absence of number words, speakers of Pirahã perceive and match exact quantities. The study also finds that the lack of number words affects accuracy when large numbers of objects must be memorized.

29.4.1.2 Colour

Gibson et al. (2017) worked with the Tsimane’, a hunter-gatherer Amazonian population from Bolivia (N = 28–58 participants in the various tasks). They conducted several colour-naming tasks using chips and picture stimuli, as well as a memory colour task. They then compared the results with those of two control groups: Bolivian Spanish speakers from the area (N = 20–25 participants) and English speakers in the US (N = 29–30 participants). Analysis of the results suggests that the Tsimane’ meet their communicative needs for natural objects through an elaborate botanical vocabulary, not through colour names. However, the study also finds that the introduction of coloured artefacts boosts the relevance of colour, since the colour of an artefact is not predictable. The number of colour names is, therefore, likely to increase in industrial societies when speakers are confronted with new communicative needs that require colour disambiguation.

29.4.1.3 Space

A standard task to test memorization strategies for spatial relations in the field is ‘Animals in a Row’ (Levinson, 2003). Participants in this task are asked to memorize the placement of three small toy animal figures positioned in a row in front of them. They are then asked to recall this initial placement and reconstruct it a few metres away after a 180° rotation. Overall, results from the Animals in a Row task support the linguistic relativity hypothesis, that is, that language habits predict the preferred memorization of spatial relations. However, Meakins et al. (2016) report that Gurindji participants in Australia gave a majority of geocentric responses to the task (i.e., memorizing the animal figures with respect to a cardinal point) whether they spoke Gurindji, a Pama-Nyungan language with an elaborate system of cardinal terms (N = 30 participants), or Gurindji Kriol, a mixed language that arose from an English-lexified creole and Gurindji and that has not retained the Gurindji system of cardinal terms (N = 77 participants). Calderón et al. (2019) follow up on this observation by investigating an Indigenous population of Mexico, the Ngiguas. In the small rural community where the study was conducted, only some older adults still speak Ngigua (Otomanguean), while the younger generations speak only Spanish. Seventeen Spanish monolinguals and 17 Ngigua-Spanish bilinguals completed a localization task designed to allow the study of both speech and co-speech gesture. Analysis of the results shows that Spanish monolinguals use geocentric co-speech gestures to support geocentric representations of space, as well as Spanish cardinal terms in an innovative manner. This confirms that traditional ways of representing space can be transmitted in small, rural communities even when the Indigenous language is no longer spoken.
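The coding logic behind Animals in a Row can be made explicit with a small sketch. It assumes the simplest version of the task, in which only the left-to-right order of the three figures is coded and the direction each animal faces is ignored. After a 180° rotation the two strategies diverge: an egocentric (body-based) response preserves the order relative to the participant's left and right, whereas a geocentric response preserves the order relative to the cardinal directions, which from the new viewpoint looks reversed. The function names, labels, and responses below are hypothetical illustrations, not a published coding scheme.

```python
def code_trial(original, response):
    """Classify one trial of a simplified Animals in a Row task.

    Both arguments list the figures from the participant's left to right:
    `original` at the first table, `response` at the second table,
    reconstructed after the participant has turned 180 degrees.
    """
    if sorted(original) != sorted(response):
        raise ValueError("the response must use the same figures as the original")
    if list(response) == list(original):
        # Order kept relative to the body: what was on the left stays on the left.
        return "egocentric"
    if list(response) == list(reversed(original)):
        # Order kept relative to the cardinal directions, which looks
        # reversed from the rotated viewpoint.
        return "geocentric"
    # Any other arrangement fits neither strategy (e.g., a memory error).
    return "unclassifiable"


def proportion_geocentric(trials):
    """Share of geocentric codes among the classifiable trials."""
    codes = [code_trial(original, response) for original, response in trials]
    classifiable = [code for code in codes if code != "unclassifiable"]
    if not classifiable:
        return float("nan")
    return classifiable.count("geocentric") / len(classifiable)


# Hypothetical responses from one participant.
trials = [
    (["pig", "cow", "horse"], ["horse", "cow", "pig"]),
    (["cow", "horse", "pig"], ["pig", "horse", "cow"]),
    (["horse", "pig", "cow"], ["horse", "pig", "cow"]),
]
print([code_trial(o, r) for o, r in trials])  # ['geocentric', 'geocentric', 'egocentric']
print(proportion_geocentric(trials))
```

A participant's proportion of geocentric codes can then be compared across groups, for instance speakers of a language with cardinal terms versus speakers of a language without them.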

29.4.2 Experimental studies among diverse bilingual populations

Bilingualism is another subfield where the study of cultural differences is necessary to understand human language and cognition. Bilingualism is a dynamic and adaptive neurocognitive phenomenon, both across individuals and across the lifespan of a single individual (Pliatsikas, 2019). It follows that if we only examine Western, middle-class, university-educated bilingual populations, we miss out on the huge variety of bilingual experiences (Adamou, 2021).

29.4.2.1 Language switching costs

For example, the psycholinguistic literature offers ample experimental evidence of slower reaction times when participants are asked to name pictures while alternating from one language to another. These slowdowns are known as ‘language switching costs’ and are related to lexical access. However, researchers increasingly show that language switching costs can be reduced or even disappear when the experimental tasks are closer to participants’ real-life experience with codeswitching (see Gullifer et al., 2013 for a study with Spanish-English bilinguals from the US). In line with this work, Adamou and Shen (2019) conducted the first experimental study of sentence processing in a typologically rare form of codeswitching used by Romani (Indic)-Turkish (Altaic) simultaneous bilinguals from Greece. This Romani-Turkish variety shows an interesting split in morphology: Turkish nouns inflect in Romani for case, number, and gender, like other Romani nouns, but Turkish verbs systematically combine with Turkish person, tense-aspect-modality, and valency morphemes, whereas Romani verbs combine with Romani verbal morphology. In summary, the findings from a picture-matching and a word-monitoring experiment show that Romani-Turkish bilinguals anticipate codeswitching based on prior experience (N = 37 and 49 participants). More specifically, when the input aligns with their expectations, there are no processing costs. This study paves the way for experimental studies with speakers of the few full-fledged mixed languages that have been identified in the world. Mixed languages result from the systematic combination of two languages and arise under specific social circumstances, where a community of speakers forges a mixed cultural identity.
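A switching cost is typically computed by comparing trials on which the response language changes from the previous trial (switch trials) with trials on which it stays the same (repeat trials). The sketch below assumes a hypothetical list of (language, reaction time in ms) pairs in presentation order with invented values. Real analyses usually also exclude errors and the first trial of each block, and model trial-level data statistically rather than subtracting raw means.

```python
from statistics import mean


def switch_cost(trials):
    """Mean RT on switch trials minus mean RT on repeat trials.

    `trials` is a list of (language, rt_ms) tuples in presentation order.
    The first trial has no predecessor, so it is neither a switch nor a repeat.
    """
    switch_rts, repeat_rts = [], []
    for (previous_language, _), (language, rt) in zip(trials, trials[1:]):
        if language == previous_language:
            repeat_rts.append(rt)
        else:
            switch_rts.append(rt)
    if not switch_rts or not repeat_rts:
        raise ValueError("need at least one switch and one repeat trial")
    return mean(switch_rts) - mean(repeat_rts)


# Hypothetical naming latencies (ms) in two languages, A and B.
trials = [
    ("A", 820), ("A", 790), ("B", 905),
    ("B", 800), ("A", 880), ("A", 780),
]
print(switch_cost(trials))  # 102.5
```

A positive value indicates slower responses after a change of language. A value near zero is the pattern reported when switching matches participants' everyday experience and expectations.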

29.4.2.2 Priming

Priming refers to the observation that processing one stimulus (the prime) unconsciously affects the processing of a subsequent stimulus (the target). This mechanism is well documented in language, including ‘structural priming,’ where the activation of one structure affects the production and comprehension of a subsequent structure. A meta-analytic study also shows that structural priming operates among monolinguals and, to a lesser extent, among bilinguals (Mahowald et al., 2016). In theory, cross-language structural priming among bilinguals could be a mechanism that drives contact-induced change in the long run (Loebell & Bock, 2003). However, such ongoing change is not easily observed in languages like English, which have a strong written tradition and prescriptive norms reinforced by formal education. Kootstra and Şahin (2018) were the first to experimentally demonstrate the relevance of cross-language priming to structural change, in a bilingual population speaking Papiamento, a Spanish- and Portuguese-based Creole, and Dutch (Germanic). The researchers tested priming in dative structures among Papiamento-Dutch bilinguals living in the Netherlands (N = 37 participants) and in Aruba (N = 25 participants). They found that Papiamento speakers who are in daily contact with Dutch (in the Netherlands) exhibit higher levels of priming by Dutch dative structures than those who have less intense contact with Dutch (in Aruba). Similarly, Adamou et al. (2021) tested the role of cross-language priming in adjective (ADJ)-noun (N) order among 90 bilinguals of Romani (Indic) and Romanian (Romance) from Romania. The study reveals significant cross-language priming effects: bilinguals favour N-ADJ order in Romani immediately after reading an N-ADJ sentence in Romanian, whereas the inherited Romani ADJ-N order benefits from priming only when speakers read a Romani sentence with ADJ-N order.
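Descriptively, a structural priming effect can be summarized as the difference in the proportion of target responses (here, N-ADJ order) after the two prime types. The sketch below uses invented (prime order, response order) records purely for illustration. Published studies typically rely on statistical models, such as mixed-effects logistic regression, rather than raw proportions, so this shows only the underlying descriptive quantity.

```python
def proportion(responses, target):
    """Share of responses that use the target word order."""
    return sum(response == target for response in responses) / len(responses)


def priming_effect(trials, target="N-ADJ", baseline="ADJ-N"):
    """Rate of target-order responses after target-order primes minus the
    rate after baseline-order primes. `trials` holds (prime, response) pairs."""
    after_target = [resp for prime, resp in trials if prime == target]
    after_baseline = [resp for prime, resp in trials if prime == baseline]
    return proportion(after_target, target) - proportion(after_baseline, target)


# Hypothetical trials: word order of the prime sentence, then of the response.
trials = [
    ("N-ADJ", "N-ADJ"), ("N-ADJ", "N-ADJ"), ("N-ADJ", "ADJ-N"), ("N-ADJ", "N-ADJ"),
    ("ADJ-N", "ADJ-N"), ("ADJ-N", "N-ADJ"), ("ADJ-N", "ADJ-N"), ("ADJ-N", "ADJ-N"),
]
print(priming_effect(trials))  # 0.5
```

Computing this difference separately for groups with different amounts of contact, as in the Netherlands versus Aruba comparison, is one simple way to see whether contact intensity modulates priming.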

29.4.3 Experimental studies on typologically rare linguistic phenomena in diverse cultural settings

Typologically rare linguistic phenomena offer a unique opportunity to test theoretical claims and expand our knowledge of what is possible in human language.

29.4.3.1 Relative clauses

In a classic article, Keenan and Comrie (1977) propose that, cross-linguistically, subjects are more accessible to relativization than direct objects, indirect objects, and oblique objects; this ranking is known as the accessibility hierarchy. Since then, a large body of work on typologically diverse languages has confirmed the original explanation that the accessibility hierarchy ‘directly reflects the psychological ease of comprehension’ (Keenan & Comrie, 1977, p. 88). More recently, a series of experimental studies has been conducted on languages that use fully ambiguous relative clauses (RCs). The rationale is that ambiguous RCs offer an unconfounded test of the universal subject preference, and an experiment has the advantage of neutralizing the disambiguating semantic and pragmatic cues that are otherwise available in natural conversations. The first experimental study tested subject preference in two ergative Mayan languages, Ch’ol and Q’anjob’al, using a picture-matching comprehension experiment (Clemens et al., 2015). Sixty-three Ch’ol and 100 Q’anjob’al speakers participated in the study in Mexico and Guatemala, respectively. The results for the ambiguous RCs in both languages confirm the universal subject preference, favouring an ergative-subject interpretation (68% in Ch’ol and 74% in Q’anjob’al). The study also shows that participants were faster to give subject responses than object responses, which supports the ease-of-comprehension explanation. Another experimental study tested subject preference in both postnominal and prenominal ambiguous RCs in Chamorro, an Austronesian language (Borja et al., 2016). Results from 135 participants show an object preference in prenominal RCs but a subject preference in postnominal RCs. The study also finds a subject-processing advantage in both prenominal and postnominal RCs.
Adamou (2017) adapted this experiment for Ixcatec, a critically endangered Otomanguean language of Mexico with fewer than ten speakers. The analysis shows that 63% of the ambiguous Ixcatec RCs are interpreted as subject RCs (N = 7 participants). Reaction times show that subject RC interpretations are numerically faster than object RC interpretations, although this difference does not reach significance. This lack of significance may be due to the small sample size, or it could suggest that Ixcatec comprehenders do not go through an initial stage of subject interpretation before proceeding to an object interpretation, which would be in line with spoken corpus data in which transitive subject RCs are as frequent as object RCs.
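Why a small sample can leave a real tendency non-significant is easiest to see with a proportion, although the same logic applies to reaction-time comparisons. The sketch below runs an exact two-sided binomial test of the share of subject readings against chance (50%). The counts are hypothetical and are not the Ixcatec data: the same observed rate of about 63% is compared across 30 and 100 responses. The function is a stdlib-only illustration. In practice one would use a statistics package and, since each participant contributes several responses, a mixed-effects model.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test of k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null proportion p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A tiny relative tolerance keeps outcomes tied with k despite rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical counts of subject readings with the same ~63% rate.
for k, n in [(19, 30), (63, 100)]:
    print(f"{k}/{n} = {k / n:.0%} subject readings, p = {binomial_two_sided_p(k, n):.3f}")
```

With 30 responses the test does not reject chance at the 0.05 level, whereas with 100 responses the same rate does. A null result from a very small sample is therefore weak evidence that there is no preference.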

29.4.3.2 Nominal tense

Nominal tense refers to the use of grammatical morphology on argument nominals with a temporal interpretation that is independent from that of the clause (Nordlinger & Sadler, 2004); it has recently been reported for a small number of languages across the world. This is a highly controversial phenomenon in that it challenges our view of nouns as time-stable and, specifically, our experience with languages that encode tense only through verbal, not nominal, morphology. Descriptive work has shown that deictic suffixes carry temporal information on their own in Pomak, a non-standardized Slavic variety spoken in Greece. But, in practice, what does it mean for speakers to have nominal tense in their language? In Adamou and Haendler (2020), a response-time experiment with 40 Pomak participants demonstrates for the first time that speakers of a language with nominal tense can decide whether a noun phrase is past or future without any additional information from an adverb or a verb and in the absence of pragmatic and semantic cues. Beyond the relevance of the results for Pomak, this study introduces an experimental method that can easily be used to test for nominal tense in other languages.

29.5

Main research methods

Experiments are quantitative methods. Typically, to obtain sufficient statistical power, the numbers of participants and experimental trials are high. However, Navarro-Torres et al. (2021) stress that a richly characterized sample may be as important as, or even more important than, a large but poorly characterized one. For example, when considering the impact of bilingualism on cognition, a binary characterization such as ‘bilingual’ versus ‘monolingual’ is not informative enough, since we know that fine-grained differences in age of acquisition, proficiency, daily use, and frequency of codeswitching are moderating factors. In an experiment, scientists identify a ‘dependent variable,’ something that is the focus of the study, and test whether it is affected by an ‘independent variable.’ For example, the dependent variable can be participants’ reaction times in a response task and the independent variable the position of the response button (right or left). Today, complex statistical models allow researchers to examine several variables at once, as well as their interactions.
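
To make the distinction concrete, the Python sketch below stores invented trials in which the dependent variable is the reaction time and the independent variable is the side of the response button, and computes the mean reaction time for each level of the independent variable. The data, the `Trial` class, and the function `condition_means` are hypothetical; in practice, such descriptive means would be followed by a statistical model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    participant: str
    button_side: str  # independent variable: "left" or "right"
    rt_ms: float      # dependent variable: reaction time in milliseconds


def condition_means(trials):
    """Return the mean reaction time for each level of the independent variable."""
    by_side = {}
    for t in trials:
        by_side.setdefault(t.button_side, []).append(t.rt_ms)
    return {side: mean(rts) for side, rts in by_side.items()}


# Invented example data, not taken from any study.
trials = [
    Trial("p01", "left", 612.0), Trial("p01", "right", 580.0),
    Trial("p02", "left", 655.0), Trial("p02", "right", 601.0),
]
print(condition_means(trials))  # {'left': 633.5, 'right': 590.5}
```
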
Standard experimental techniques include the following. First, ‘control’ or ‘comparison’ groups are used that differ on the variable hypothesized to be the main cause of an observed behaviour (e.g., when studying bilingual speakers of languages A and B, it is common to include a monolingual control group who speaks only language A and then compare the two groups’ responses in language A). Second, trials are randomized so that the order in which the stimuli appear does not influence the results. Third, ‘fillers’ are included, that is, items that the experimenter is not interested in and that are used to prevent participants from identifying the goal of the study (researchers sometimes analyse responses to fillers to check that participants performed the task as expected). Finally, the stimuli are ‘normed’ (i.e., rated) before the experiment by non-participants to make sure that they are as well-formed as the experimenter assumed. Experimental designs that have been successfully used in the field include judgment tasks, picture-matching tasks, priming tasks, and several semi-experimental tasks, such as picture naming and speech production tasks.
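
Randomization and fillers can be combined when building each participant’s trial list. The following is a minimal sketch, assuming hypothetical item labels and one seed per participant, so that every participant receives a different but reproducible order; `build_trial_list` is an invented name, not part of any experiment builder.

```python
import random


def build_trial_list(experimental, fillers, seed):
    """Mix experimental items with fillers and shuffle them.

    A separate seed per participant gives each participant a different,
    but reproducible, order of presentation.
    """
    trials = [("experimental", item) for item in experimental]
    trials += [("filler", item) for item in fillers]
    rng = random.Random(seed)
    rng.shuffle(trials)
    return trials


# Hypothetical item labels.
experimental = ["exp1", "exp2", "exp3", "exp4"]
fillers = ["fill1", "fill2", "fill3", "fill4", "fill5", "fill6"]
for participant_seed in (1, 2):
    order = build_trial_list(experimental, fillers, participant_seed)
    print(participant_seed, [item for _, item in order])
```
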

29.5.1

Judgment tasks

Acceptability judgment tasks are a reliable tool in linguistic research (Sprouse et al., 2013), particularly when working with a large sample of participants (Gibson & Fedorenko, 2013). Participants in these tasks judge the acceptability or naturalness of linguistic stimuli on a five-point or seven-point scale (low numbers for negative judgments, high numbers for positive ones). To help participants who have little experience with computers or with this kind of formal testing remember the scale, stickers can be placed on the keyboard: a

Experimental methods to study cultural differences in linguistics

smiley face on the key for the best score and a frowning face on the key for the worst. To make sure that participants understand the task and feel free to evaluate some stimuli negatively, practice sentences can serve as anchors for the highest and lowest points of the scale, with feedback from the experimenter before the experiment begins. Moreover, instead of asking participants whether the stimuli are ‘correct’ or ‘acceptable,’ experimenters can ask whether they are ‘natural’ in the community (e.g., Adamou & Haendler, 2020).
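
During practice, the experimenter might check whether a participant has placed the anchor sentences at the two ends of the scale before moving on. The following is a minimal sketch, assuming a five-point scale and a hypothetical criterion (the good anchor rated at least 4, the bad anchor at most 2); the function `understood_scale` and its `margin` parameter are invented for illustration.

```python
def understood_scale(anchor_ratings, scale_max=5, margin=1):
    """Check that practice anchors were rated near the ends of the scale.

    anchor_ratings maps the anchor type ("good" or "bad") to the rating
    the participant gave. `margin` sets how far from the extremes a
    rating may fall (a hypothetical criterion).
    """
    good, bad = anchor_ratings["good"], anchor_ratings["bad"]
    for r in (good, bad):
        if not 1 <= r <= scale_max:
            raise ValueError(f"rating {r} outside 1..{scale_max}")
    return good >= scale_max - margin and bad <= 1 + margin


print(understood_scale({"good": 5, "bad": 1}))  # True
print(understood_scale({"good": 3, "bad": 3}))  # False
```

A `False` result would simply prompt the experimenter to explain the scale again before the experiment starts.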

29.5.2

Picture-matching tasks

Picture-matching experiments are inspired by the visual world paradigm developed for eye tracking (e.g., Dahan & Tanenhaus, 2004). In picture-matching experiments, participants view a pair of pictures and are asked to choose the picture that matches the content of a spoken sentence they hear. The pictures can be drawings (e.g., Borja et al., 2016; Clemens et al., 2015), pictures found on the Internet (e.g., Adamou & Shen, 2019), or photographs of people, objects, and situations familiar to the community that are taken specifically for the study (e.g., Adamou, 2017; Calderón et al., 2019). The task is generally presented on a laptop. However, to make sure that the older adult speakers of Ixcatec (in their 80s) would be comfortable performing the task, Adamou (2017) presented the stimuli printed in A4 format. The stimuli were randomized manually by reshuffling them at the beginning of each session. The sessions were filmed, and reaction times were measured from the onset of the audio stimulus to the moment when participants pointed to a picture. Later, Calderón et al. (2019) conducted this experiment with older adult Ngigua speakers and found that it also works well on a laptop. Lack of familiarity with computers does not seem to be an impediment when the instructions are clear (see also Adamou & Shen, 2019, for a similar experience).
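
When reaction times are measured from session videos, as in Adamou (2017), each trial needs two time stamps: the onset of the audio stimulus and the moment the participant points to a picture. The sketch below assumes that these have already been annotated in seconds on the video timeline; the annotation format and the function `pointing_rts` are hypothetical.

```python
def pointing_rts(annotations):
    """Compute reaction times (ms) from video annotations.

    Each annotation gives, for one trial, the time stamp (in seconds on
    the video timeline) of the audio stimulus onset and of the moment the
    participant pointed to a picture. The RT is the difference.
    """
    rts = {}
    for trial_id, (audio_onset_s, pointing_s) in annotations.items():
        if pointing_s < audio_onset_s:
            raise ValueError(f"trial {trial_id}: pointing precedes audio onset")
        rts[trial_id] = round((pointing_s - audio_onset_s) * 1000)
    return rts


# Hypothetical time stamps read off a session video.
annotations = {"t01": (12.40, 14.15), "t02": (30.02, 31.70)}
print(pointing_rts(annotations))
```
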

29.5.3

Priming tasks

A typical priming production experiment consists of a sentence trial followed by a picture trial. Participants first read or listen to a sentence (sentence trial) and then describe a picture (picture trial). A priming effect is observed when the linguistic structure chosen in the picture trial follows the structure of the sentence trial (e.g., Kootstra & Şahin, 2018; Adamou et al., 2021). Word monitoring is a different kind of priming experiment, in which participants listen to a sentence and are asked to press a button as soon as they hear a specific word (the target). The time it takes to press the button for the target provides information about how easily the preceding word (the prime) was processed. To make sure that participants are not simply listening out for the target word without processing the sentence, a comprehension question can follow each trial (e.g., Adamou & Shen, 2019).
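
The two kinds of priming data described above can be summarized as follows. This sketch assumes a simplified coding scheme (labels such as "AN" for adjective-noun order and "NA" for noun-adjective order) and uses the share of picture descriptions that repeat the structure of the sentence trial as a rough index of priming; published studies typically compare conditions with statistical models instead. The function names and data are hypothetical.

```python
from statistics import mean


def priming_effect(trials):
    """Proportion of picture descriptions that reuse the prime's structure.

    Each trial is (prime_structure, response_structure), e.g. ("AN", "NA")
    for an adjective-noun prime followed by a noun-adjective description.
    """
    matches = [prime == response for prime, response in trials]
    return sum(matches) / len(matches)


def word_monitoring_rt(trials):
    """Mean target-detection RT, keeping only trials whose comprehension
    question was answered correctly."""
    kept = [rt for rt, correct in trials if correct]
    return mean(kept) if kept else None


production = [("AN", "AN"), ("NA", "NA"), ("AN", "NA"), ("NA", "NA")]
monitoring = [(420, True), (515, False), (480, True)]
print(priming_effect(production))      # 0.75
print(word_monitoring_rt(monitoring))  # 450
```
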

29.5.4

Semi-experimental tasks

In addition to the experimental tasks listed above, several semi-experimental tasks can be useful. Semi-experimental tasks differ from experimental tasks in that they are not always structured around dependent and independent variables, trials are not randomized, and there are no fillers or control groups. For example, verbal fluency tasks allow researchers to assess a bilingual speaker’s language dominance at the time of the study. In the semantic version of the task, speakers are asked to produce, within one minute, as many words as possible belonging to a specific semantic category (e.g.,
body parts, animals, fruits, and vegetables), first in one language and then in the other (see Calderón et al., 2019). The language in which the greater number of words is produced is considered dominant at the time of the study. In the phonological version of the task, speakers need to produce as many words as possible beginning with specific letters or sounds. However, this version should not be used with participants with low literacy levels, as performance on it is known to be a good indicator of illiteracy (in alphabetic scripts) and is modulated by level of education (Petersson et al., 2000). Other popular semi-experimental tasks are naming tasks, in which speakers are asked to name real-life objects or objects depicted in drawings or pictures, for example, to study colour names (Gibson et al., 2017). In a phonetic study of information structure in Ixcatec, real-life objects (e.g., fruits) were shown in a first session and photographs of these objects in a second session. This combination allowed for enough repetitions while avoiding a habituation process that could interfere with information-structure status (Adamou et al., 2018). More generally, speech production tasks are semi-experimental. For example, to investigate spatial language and cognition, participants were filmed while describing the location of two buildings (Calderón et al., 2019). This allows researchers to analyse both speech and co-speech gesture for all participants while controlling their position with respect to the cardinal points, and thus to differentiate between spatial representations that are geocentric (based on cardinal points) and egocentric (based on the speaker’s viewpoint).
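
Scoring a semantic verbal fluency task amounts to counting the words produced in each language and comparing the totals. The following is a minimal sketch, assuming that responses have already been transcribed and that repetitions are counted only once (scoring conventions vary, so this is an assumption); the language labels, words, and function names are placeholders.

```python
def fluency_score(responses):
    """Count distinct words produced in the one-minute window
    (repetitions are counted once)."""
    return len({w.strip().lower() for w in responses if w.strip()})


def dominant_language(responses_by_language):
    """Return (dominant language or None on a tie, scores per language)."""
    scores = {lang: fluency_score(words)
              for lang, words in responses_by_language.items()}
    best = max(scores.values())
    leaders = [lang for lang, s in scores.items() if s == best]
    return (leaders[0] if len(leaders) == 1 else None), scores


# Placeholder transcriptions for the category "animals".
responses = {
    "language_A": ["dog", "cat", "cow", "cat", "horse"],
    "language_B": ["dog", "bird", "fish"],
}
print(dominant_language(responses))
```
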

29.6

Recommendations for practice

This section offers a brief practical guide, in eight steps, to building and conducting an experiment in diverse cultural and linguistic settings.

29.6.1

Step 1: Identify a research question

The point of departure of an experimental study should always be the identification of a clear research question. For researchers who are community outsiders, familiarity with the community and the language under study is essential, both to avoid transposing research questions directly from well-described languages and populations to lesser-described ones and to identify the most relevant research questions. In line with the community-based research framework, researchers should recruit research assistants from the community who will actively participate in elaborating and implementing the experimental design. Appropriate credit and compensation are expected, including co-authorship of scientific papers. As in any workplace, maintaining respectful and ethical professional relationships in the field is paramount. When applicable, the research study can be integrated into a local association’s activities, with its goals negotiated with the association’s representatives and members. Researchers who are members of the community also need to consider how best to prepare and communicate their research, albeit in different ways (Cruz-Cruz, 2020).

29.6.2

Step 2: Clearly formulate your research question and predictions

In experimental work, it is important to formulate clear research questions and predictions before conducting the experiments. The experimental protocol, including the hypotheses, the data that will be collected, and how they will be analysed, can be preregistered on the Open Science Framework (OSF) (https://osf.io/).

29.6.3

Step 3: Choose your experimental design

A great number of experimental paradigms are available in the literature. In practice, some leeway is allowed in adapting these designs to the field (see Section 29.5 for some examples). Most research institutions require research protocols to be approved by a research ethics committee before implementation. Research involving vulnerable groups, such as minorities and economically disadvantaged groups, is treated with particular caution. The Statement of Ethics of the American Anthropological Association is a good starting point for a general reflection on ethical issues (see http://ethics.americananthro.org/ethics-statement-0-preamble/), but see also Pérez González (2021) for an Indigenous academic perspective and Gaby and Woods (2020) for a discussion of how linguists should relate to Indigenous peoples and their languages.

29.6.4

Step 4: Prepare your stimuli and build your experiment

In psycholinguistic experiments, many tasks are built around written stimuli, as these present several advantages (e.g., no confounds from differences in prosody; better control of stimulus length). However, aural stimuli are better suited to the study of less-described languages, as these languages are rarely written or taught at school. When recording the stimuli, it is important not to introduce artefacts due to dialectal differences, and therefore to work with speakers from within the community. A good idea is to conduct a ‘norming study’ prior to the experiment. In a norming study, participants not involved in the experiment rate the stimuli for well-formedness on a five- or seven-point scale (as in a judgment task; see Section 29.5.1). Researchers should eliminate stimuli that receive a mean rating below a given threshold (e.g., three on a five-point scale). Stimulus length should be controlled to ensure comparability and, where relevant, to allow reaction times to be calculated. Fillers need to be used to distract participants’ attention from the main research question. Finally, stimuli should be randomized for each participant. Regarding visual stimuli, it is best to use coloured photographs rather than black-and-white line drawings, given that participants with little formal education have difficulty recognizing the latter (Reis et al., 2006). Researchers can either use existing visual stimuli or create their own culturally adapted stimuli (see Adamou, 2017; Adamou et al., 2018; Borja et al., 2016; Calderón et al., 2019). To build the experiment, OpenSesame (Mathôt et al., 2012) is a free, open-source experiment builder.
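
The norming criterion described above (dropping stimuli whose mean rating falls below, e.g., three on a five-point scale) can be sketched as follows; the stimulus IDs, ratings, and the function `norm_stimuli` are hypothetical.

```python
from statistics import mean


def norm_stimuli(ratings, threshold=3.0):
    """Split stimuli by mean well-formedness rating from a norming study.

    ratings maps each stimulus ID to the list of ratings it received on a
    five-point scale. Stimuli whose mean falls below `threshold` are dropped.
    """
    kept, dropped = [], []
    for stim, scores in ratings.items():
        (kept if mean(scores) >= threshold else dropped).append(stim)
    return kept, dropped


ratings = {"s01": [5, 4, 5], "s02": [2, 3, 2], "s03": [3, 3, 4]}
print(norm_stimuli(ratings))  # (['s01', 's03'], ['s02'])
```
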

29.6.5

Step 5: Select your participants

Before conducting experiments in the field, it is important to have had the opportunity to conduct extensive participant observation, in order to understand the sociolinguistic background and decide who could take part in the study (age range, languages spoken, etc.). At the beginning of the experimental session, experimenters can conduct a short sociolinguistic interview to assess participants’ education, literacy, dialectal variety, and multilingualism. Before taking part in the experiment, participants are asked to sign a consent form or simply to have their oral consent recorded. This is when researchers can explain what participants will be asked to do during the experiment; that they can leave before the end of the experiment if they feel uncomfortable; why the researchers are conducting the study; what the benefits for the community are; whether participants will receive compensation and who funds the research; where the collected data will be stored and published; and how the researchers will ensure the privacy and confidentiality of participants’ personal information. In addition to individual consent,
researchers often need to obtain community consent through the appropriate institutions (e.g., the community’s general assembly), as well as from national and/or local authorities. In psycholinguistic research, it is standard practice to compensate participants for their time, but in the field it is important to first understand cultural norms regarding financial compensation. Also keep in mind that it is not ethically acceptable to pressure or lure potential low-income participants by offering financial incentives alone. Alternatives to individual financial compensation include a small gift to participants or a financial contribution to a local association.

29.6.6

Step 6: Collect your data

Always allow extra time for data collection in the field, as participants may be busy with everyday tasks and unavailable for the experimental study as planned. Even if you cannot conduct the experiment in a lab, try to conduct it in a similar location for all participants: a quiet room in the participants’ homes, a classroom at the local school, or a room in a communal building that is accessible to all participants. Unlike standard practice in most lab research, where the researcher does not need to be present during the experiment, in the field it is best for the researcher to be present to make sure that everything goes according to plan. Moreover, the researchers who conduct the experiment in the field must be familiar with the software they use (e.g., OpenSesame), so that they can check at the beginning of every session whether it is recording the data properly and adjust as needed.
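
A quick check at the start of each session can confirm that trials are actually being logged. OpenSesame can write its log as a CSV file; the sketch below assumes columns named `subject_nr`, `response`, and `response_time`, which should be checked against the variables your own experiment logs. The function `check_log` and the sample data are hypothetical.

```python
import csv
import io

# Assumed column names; adjust to the variables your experiment logs.
REQUIRED = ("subject_nr", "response", "response_time")


def check_log(csv_text, required=REQUIRED):
    """Return a list of problems found in a session's log file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    rows = list(reader)
    problems = [] if rows else ["no trials logged"]
    for i, row in enumerate(rows, start=1):
        if not row["response_time"]:
            problems.append(f"row {i}: empty response_time")
    return problems


sample = "subject_nr,response,response_time\n1,left,642\n1,right,\n"
print(check_log(sample))  # ['row 2: empty response_time']
```
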

29.6.7

Step 7: Analyse your data

Experimental linguists typically learn to use statistical methods for data analysis. This is a significant investment of time, and constant training is needed to keep up with developments in statistics. Alternatively, collaborating with a researcher who is familiar with statistics is an excellent way to ensure state-of-the-art analyses.
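
As one illustration of the statistical methods involved, and not an analysis prescribed in this chapter, the sketch below runs a two-sided permutation test on the difference between mean reaction times in two conditions, using only the Python standard library. It treats every reaction time as an independent observation, which ignores the fact that each participant contributes several trials; mixed-effects models are a common way of handling such data. The data and the function `permutation_test` are invented for illustration.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference between two group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one, with an add-one correction so it is never zero.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Invented reaction times (ms), for illustration only.
subject_rc = [1510, 1620, 1480, 1705, 1590, 1650, 1530]
object_rc = [1580, 1700, 1550, 1690, 1640, 1720, 1600]
print(round(permutation_test(subject_rc, object_rc), 3))
```
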

29.6.8

Step 8: Interpret your results

Once the statistical analyses are available, researchers can compare the results with the predictions formulated at the beginning of the research project.

29.7

Future directions

Experimental linguistics is a fast-growing field. But will cross-linguistic and cross-cultural representation follow this development?

29.7.1

Challenges

There are several challenges to the use of experimental methods outside the lab. First, there are practical obstacles. As with any research conducted in the field, these include access to remote locations, access to electricity, and adverse climate conditions (Hellwig, 2019). Another practical difficulty may be transporting high-quality portable equipment to the field. However, these problems are increasingly being solved thanks to recent technological progress, which allows laptops, cameras, recorders, and mobile eye-trackers to be used even in isolated places.

In addition, with the travel restrictions imposed during the Covid-19 pandemic (from 2020), many researchers turned to web-based research, a special type of field research that makes it possible to reach large and heterogeneous populations using smartphones, personal computers, and the Internet (Boase & Humphreys, 2018; Dufau et al., 2011). Prior experience shows that response times are measured accurately when they are recorded locally and then shared via the web (Enochson & Culbertson, 2015). Second, linguists who work in diverse linguistic and cultural settings, and language communities themselves, may have different research priorities. For example, when studying an undescribed language, the primary goal is to produce a grammatical sketch and a dictionary. When working on an endangered language, the priority is language documentation and revitalization. Experimental work can, therefore, be introduced once these primary goals are met. Experimental methods are not necessarily incompatible with these goals, since they allow researchers to work with a great number of participants in a given community, whether or not these are speakers of the language under study (non-speakers can take part as a control or comparison group; Adamou, 2021). Finally, a major hurdle is that experiments are typically designed for Western, university-educated participants, and there is still little know-how on designing experiments that work for socially and culturally diverse populations. For example, Sauerland (2018) shares his experience with the Amazonian Pirahã community. During a week-long stay, he recruited 16 speakers to take part in a truth value judgment task designed to test syntactic recursion, following a heated debate about whether or not Pirahã exhibits recursion.
Pirahã participants were asked to listen to prerecorded statements by one Pirahã speaker, e.g., ‘I have been to the moon,’ reported accurately or inaccurately by a second Pirahã speaker, e.g., ‘X said “I have been to the sun.”’ Participants had to decide whether the second speaker had heard correctly or had told the truth. The experiment did not work as expected, and the author acknowledges that one reason might be that the speakers who recorded the utterances were well-respected senior members of the community: it may have been culturally and socially inappropriate for participants to challenge what the elders said by judging their statements ‘false.’ Another interesting example of both a failed and a successful experiment is reported in Mulak et al. (2021). A team of researchers, including some who were familiar with the community, conducted a word-learning experiment with 34 Nungon speakers from Papua New Guinea. Whereas the first experiment yielded null results, the second was successful after the instructions were made clearer, the cognitive load of the task was reduced, and the study was framed within a real-world context.
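
Returning to the point that web-based response times are accurate when recorded locally and then shared via the web, the sketch below illustrates that principle in Python. Real web experiments usually run in the participant’s browser, so this is only an analogy: the timing happens on the local device, and only the finished results are bundled for upload. The stimulus and key-press functions are hypothetical stand-ins, and the JSON layout is an assumption.

```python
import json
import time


def timed_trial(present_stimulus, wait_for_response):
    """Time one trial with the device's own clock.

    time.perf_counter() is a high-resolution monotonic clock, so the RT
    depends only on the local machine, not on network delays.
    """
    present_stimulus()
    start = time.perf_counter()
    response = wait_for_response()
    rt_ms = (time.perf_counter() - start) * 1000
    return {"response": response, "rt_ms": round(rt_ms, 1)}


def results_payload(participant_id, trials):
    """Bundle locally timed trials as JSON, to be uploaded after the session."""
    return json.dumps({"participant": participant_id, "trials": trials})


# Hypothetical stand-ins for a real display and a real key press.
def show_stimulus():
    pass


def simulated_key_press():
    time.sleep(0.05)  # pretend the participant answers after about 50 ms
    return "left"


trial = timed_trial(show_stimulus, simulated_key_press)
print(results_payload("p01", [trial]))
```
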

29.7.2

Ways forward

As a community, we can support diversity in experimental linguistics by explicitly encouraging the submission of papers on under-described languages to conferences and by following through on the publication of this research in international journals. As individual researchers, we can change our practices, even when we do not work from a cross-cultural perspective, to include a thoughtful characterization of our sample and a careful discussion of its potential for generalization. Last but certainly not least, we must keep in mind that cross-cultural and cross-linguistic experimental research raises specific ethical issues. Whereas researchers in most lab-based experiments are not familiar with their participants, familiarity with the communities that participate in experimental work in the field should be the starting point. Otherwise, we run the risk of turning experimental research in the field into an extractive kind of science. As in other research fields, intense collaboration with the various communities and reciprocity should be the guiding principles in this endeavour.

Further reading

Adamou, E. (2021). The adaptive bilingual mind: Insights from endangered languages. Cambridge University Press.
Gillioz, C., & Zufferey, S. (2020). Introduction to experimental linguistics. John Wiley & Sons.
Majid, A. (2021). Olfactory language requires an integrative and interdisciplinary approach. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2021.03.004
Meakins, F., Green, J., & Turpin, M. (2018). Understanding linguistic fieldwork. London: Routledge.
Rappaport, J. (2020). Cowards don’t make history: Orlando Fals Borda and the origins of participatory action research. Duke University Press.

Related topics

Experimental research in cross-linguistic psycholinguistics; new directions in statistical analysis for experimental linguistics; experimental methods to study child language; experimental methods to study bilinguals

References

Adamou, E. (2017). Subject preference in Ixcatec relative clauses (Otomanguean, Mexico). Studies in Language, 41, 872–913. https://doi.org/10.1075/sl.16055.ada
Adamou, E. (2021). The adaptive bilingual mind: Insights from endangered languages. Cambridge University Press.
Adamou, E., Feltgen, Q., & Padure, C. (2021). A unified approach to the study of language contact: Cross-language priming and change in adjective/noun order. International Journal of Bilingualism, 25(6), 1635–1654. https://doi.org/10.1177/13670069211033909
Adamou, E., Gordon, M., & Gries, S. T. (2018). Prosodic and morphological focus marking in Ixcatec (Otomanguean). In E. Adamou, K. Haude & M. Vanhove (Eds.), Information structure in lesser-described languages: Studies in prosody and syntax (pp. 51–83). John Benjamins Publishing Company. https://doi.org/10.1075/slcs.199.03ada
Adamou, E., & Haendler, Y. (2020). An experimental approach to nominal tense: Evidence from Pomak (Slavic). Language, 96(3), 507–50.
Adamou, E., & Shen, X. R. (2019). There are no language switching costs when codeswitching is frequent. International Journal of Bilingualism, 23, 53–70. https://doi.org/10.1177/1367006917709094
Anand, P., Chung, S., & Wagers, M. (2011). Widening the net: Challenges for gathering linguistic data in the digital age. National Science Foundation SBE 2020 planning activity. https://people.ucsc.edu/~mwagers/papers/WideningtheNet.AnandChungWagers.pdf (Accessed 24 September 2021).
Appadurai, A. (1996). Modernity at large: Cultural dimensions of globalisation. University of Minnesota Press.
Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less American. The American Psychologist, 63(7), 602–14.
Bernstein-Ratner, N., & Pye, C. (1984). Higher pitch in BT is not universal: Acoustic evidence from Quiché Mayan. Journal of Child Language, 11, 515–22. https://doi.org/10.1017/S0305000900005924
Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., & Majid, A. (submitted). Over-reliance on English hinders cognitive science. Trends in Cognitive Sciences, 26(12), 1153–1170.
Boas, F. (1930). Some problems of methodology in social sciences. In L. D. White (Ed.), The new social science (pp. 260–269). University of Chicago Press.
Boase, J., & Humphreys, L. (2018). Mobile methods: Explorations, innovations, and reflections. Mobile Media & Communication, 6(2), 153–62. https://doi.org/10.1177/2050157918764215
Borja, M. F., Chung, S., & Wagers, M. (2016). Constituent order and parser control processes in Chamorro. In A. Camp, Y. Otsuka, C. Stabile & N. Tanaka (Eds.), Proceedings of the 21st annual meeting of the Austronesian formal linguistics association (pp. 15–32). Asia-Pacific Linguistics.
Boyd, N. M., & Bogen, J. (2021). Theory and observation in science. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/fall2021/entries/science-theory-observation/
Brown, D. E. (2004). Human universals, human nature and human culture. Daedalus, 133, 47–54. https://doi.org/10.1162/0011526042365645
Calderón, E., De Pascale, S., & Adamou, E. (2019). How to speak “geocentric” in an “egocentric” language: A multimodal study among Ngigua-Spanish bilinguals and Spanish monolinguals in a rural community of Mexico. Language Sciences, 74, 24–46. https://doi.org/10.1016/j.langsci.2019.04.001
Cavalli-Sforza, L. L., & Feldman, M. W. (1981). Cultural transmission and evolution: A quantitative approach. Princeton University Press.
Claidière, N., & Sperber, D. (2007). The role of attraction in cultural evolution. Journal of Cognition and Culture, 7, 89–111.
Clemens, L. E., Coon, J., Mateo Pedro, P., Morgan, A. M., Polinsky, M., Tandet, G., & Wagers, M. (2015). Ergativity and the complexity of extraction: A view from Mayan. Natural Language and Linguistic Theory, 33(2), 417–69.
Cruz-Cruz, E. (Ed.). (2020). Theoretical reflections around the role of fieldwork in linguistics and linguistic anthropology: Contributions of Indigenous researchers from southern Mexico. Language Documentation and Conservation special publication, 22–23. http://nflrc.hawaii.edu/ldc/sp23/
Dahan, D., & Tanenhaus, M. K. (2004). Continuous mapping from sound to meaning in spoken-language comprehension: Immediate effects of verb-based thematic constraints. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 498–513. https://doi.org/10.1037/0278-7393.30.2.498
Dufau, S., Duñabeitia, J. A., Moret-Tatay, C., McGonigal, A., Peeters, D., Alario, F.-X., et al. (2011). Smart phone, smart science: How the use of smartphones can revolutionize research in cognitive science. PLoS ONE, 6(9), e24974. https://doi.org/10.1371/journal.pone.0024974
El Aissati, A., McQueen, J. M., & Cutler, A. (2012). Finding words in a language that allows words without vowels. Cognition, 124, 79–84.
Enochson, K., & Culbertson, J. (2015). Collecting psycholinguistic response time data using Amazon Mechanical Turk. PLoS ONE, 10(3), e0116946.
Evans, N., & Levinson, S. C. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32, 429–48. https://doi.org/10.1017/s0140525x0999094x
Five Graces Group, Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N. C., Holland, J., Ke, J., Larsen-Freeman, D., & Schoenemann, T. (2009). Language is a complex adaptive system: Position paper. Language Learning, 59, 1–26. https://doi.org/10.1111/j.1467-9922.2009.00533.x
Frank, M. C., Everett, D. L., Fedorenko, E., & Gibson, E. (2008). Number as a cognitive technology: Evidence from Pirahã language and cognition. Cognition, 108, 819–824. https://doi.org/10.1016/j.cognition.2008.04.007
Gaby, A., & Woods, L. (2020). Toward linguistic justice for Indigenous people: A response to Charity Hudley, Mallinson, and Bucholtz. Language, 96(4), e268–80. https://doi.org/10.1353/lan.2020.0078
Gibson, E., & Fedorenko, E. (2013). The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes, 28, 88–124. https://doi.org/10.1080/01690965.2010.515080
Gibson, E., Futrell, R., Jara-Ettinger, J., Mahowald, K., Bergen, L., Ratnasingam, S., Gibson, M., Piantadosi, S. T., & Conway, B. R. (2017). Color naming across languages reflects color use. Proceedings of the National Academy of Sciences of the United States of America, 114(40), 10785–10790. https://doi.org/10.1073/pnas.1619666114
Gullifer, J., Kroll, J. F., & Dussias, P. E. (2013). When language switching has no apparent cost: Lexical access in sentence context. Frontiers in Psychology, 4, 1–13.
Hellwig, B. (2019). Linguistic diversity, language documentation and psycholinguistics: The role of stimuli. Language Documentation and Conservation, 16. http://hdl.handle.net/10125/24855
Hemforth, B. (2013). Experimental linguistics. In Oxford bibliographies online in linguistics (pp. 1–16). https://doi.org/10.1093/obo/9780199772810-0112
Henrich, J., Heine, S., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. https://doi.org/10.1017/S0140525X0999152X
Hirschfeld, L. (2018). The Rutherford atom of culture. Journal of Cognition and Culture, 18, 231–61.
Keenan, E. L., & Comrie, B. (1977). Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8, 63–99.
Kidd, E., & Garcia, R. (2022). How diverse is child language acquisition? First Language, online first. https://doi.org/10.1177/01427237211066405
Kootstra, G. J., & Şahin, H. (2018). Crosslinguistic structural priming as a mechanism of contact-induced language change: Evidence from Papiamento-Dutch bilinguals in Aruba and the Netherlands. Language, 94, 902–30. https://doi.org/10.1353/lan.2018.0050
Levinson, S. C. (2003). Space in language and cognition. Cambridge University Press.
Loebell, H., & Bock, K. (2003). Structural priming across languages. Linguistics, 41, 791–824. https://doi.org/10.1515/ling.2003.026
Mahowald, K., James, A., Futrell, R., & Gibson, E. (2016). A meta-analysis of syntactic priming in language production. Journal of Memory and Language, 91, 5–27. https://doi.org/10.1016/j.jml.2016.03.009
Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods, 44, 314–324. https://doi.org/10.3758/s13428-011-0168-7
Meakins, F., Jones, C., & Algy, C. (2016). Bilingualism, language shift and the corresponding expansion of spatial cognitive systems. Language Sciences, 54, 1–13. https://doi.org/10.1016/j.langsci.2015.06.002
Mulak, K. E., Sarvasy, H. S., Tuninetti, A., & Escudero, P. (2021). Word learning in the field: Adapting a laboratory-based task for testing in remote Papua New Guinea. PLoS ONE, 16, e0257393. https://doi.org/10.1371/journal.pone.0257393
Navarro-Torres, C. A., Beatty-Martínez, A. L., Kroll, J. F., & Green, D. W. (2021). Research on bilingualism as discovery science. Brain and Language, 222, 105014.
Nielsen, M., Haun, D., Kärtner, J., & Legare, C. H. (2017). The persistent sampling bias in developmental psychology: A call to action. Journal of Experimental Child Psychology, 162, 31–8. https://doi.org/10.1016/j.jecp.2017.04.017
Nordlinger, R., & Sadler, L. (2004). Nominal tense in a crosslinguistic perspective. Language, 80, 776–806.
Pérez González, J. (2021). The ethical principles of linguistic field work methodologies–According to whom? [Translated from Spanish]. In E. Cruz Cruz (Ed.), Theoretical reflections around the role of fieldwork in linguistics and linguistic anthropology: Contributions of indigenous researchers from southern Mexico. Language documentation & conservation (pp. 131–52). http://hdl.handle.net/10125/24988
Petersson, K. M., Reis, A., Askelöf, S., Castro-Caldas, A., & Ingvar, M. (2000). Language processing modulated by literacy: A network analysis of verbal repetition in literate and illiterate subjects. Journal of Cognitive Neuroscience, 12, 364–382. https://doi.org/10.1162/089892900562147
Pliatsikas, C. (2019). Understanding structural plasticity in the bilingual brain: The dynamic restructuring model. Bilingualism: Language and Cognition, 6, 1–13. https://doi.org/10.1017/S1366728919000130
Rad, M. S., Martingano, A. J., & Ginges, J. (2018). Toward a psychology of Homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences, 115(45), 11401–11405. https://doi.org/10.1073/pnas.1721165115
Reis, A., Faísca, L., Ingvar, M., & Petersson, K. M. (2006). Color makes a difference: Two-dimensional object naming in literate and illiterate subjects. Brain and Cognition, 60, 49–54. https://doi.org/10.1016/j.bandc.2005.09.012
Sauerland, U. (2018). False speech reports in Pirahã: A comprehension experiment. In L. Amaral, M. Maia, A. Nevins & T. Roeper (Eds.), Recursion across domains. University Press.
Sprouse, J., Schütze, C. T., & Almeida, D. (2013). A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua, 134, 219–248. https://doi.org/10.1016/j.lingua.2013.07.002
Whorf, B. L. (1941). The relation of habitual thought and behavior to language. In L. Spier (Ed.), Language, culture, and personality: Essays in memory of Edward Sapir. Menasha: Sapir Memorial Publication Fund. Reprinted in J. B. Carroll (Ed.), Language, thought, and reality: Selected writings of Benjamin Lee Whorf. MIT Press and John Wiley & Sons.

472

30 EXPERIMENTAL METHODS TO STUDY LATE-LIFE LANGUAGE LEARNING

Merel Keijzer, Jelle Brouwer, Floor van den Berg and Mara van der Ploeg

30.1

Introduction and definitions

Life expectancy has greatly increased in recent years. As a result, healthy ageing investigations have been put high on the research agenda (Gu et al., 2019). This has mostly led to work targeting cognitive old-age disorders, operating under the premise that Alzheimer's disease is quickly becoming one of the primary causes of death in developed countries and incurs high healthcare costs (cf. Wong, 2020). Yet it is too simplistic to capture ageing in a uniform cognitive description or narrative of decline. Following Christopher (2014), ageing is best described integratively and holistically, as an interplay between social, cognitive and physiological factors that interact and foster change over time. What is more, ageing is a highly individual process: different life experiences modulate how people age. In recent years, one such life experience that has received abundant attention is language, and multilingual language use in particular (cf. van der Ploeg et al., 2020). In this chapter, we review how empirical linguistic investigations have helped move the (healthy) ageing research agenda forward. We focus particularly on work examining the effects of learning a new language in late life, deliberately including not only cognitive outcomes but also socio-affective and linguistic outcomes. The chapter aims to provide a holistic overview of the critical issues and topics that have informed the field that has come to be known as Third-age Language Learning (TALL for short), and to show the state of the art of the field as well as its methodological toolkit, to equip (linguistics) researchers wanting to know more about the topic or add to it. At the outset, however, we would like to point out that the now widely adopted term ‘TALL’ is not without problems. The third age denotes a life stage that is typically free of work obligations and still largely free of age-associated illnesses (Pot et al., 2019). But language learning may also be a meaningful activity for seniors with subjective or diagnosed cognitive problems, or for those suffering from mood disorders or late-life depressive episodes (Antoniou et al., 2013; Brouwer et al., 2020). We therefore adopt the more inclusive label late-life language learning throughout this chapter.

DOI: 10.4324/9781003392972-34

Merel Keijzer et al.

30.2

Historical perspectives

Scientific investigations of ageing and processes associated with ageing were not very common until 30 years ago, but have since seen an unparalleled surge. Christopher (2014) links the initial scientific disinterest to the term ‘development’ having traditionally been strongly associated with childhood and adolescence, as opposed to decline, which was considered to characterize older adulthood. It follows that the earliest studies were geriatric in nature, focusing on frailty in older cohorts (Walston, 2004); they were only later augmented by gerontological work introducing perspectives of resilience in ageing (Wilson et al., 2021). Especially prominent in the latter domain were investigations of how the interplay between individuals and their modifiable lifestyle and environmental factors could help boost cognitive reserve, the capacity to offset the effects of brain pathology (cf. Song et al., 2022). Although ageing is shaped by a great variety of life experiences, speaking multiple languages has been put forward as a prevalent lifelong cognitive and social engagement that could boost cognitive reserve (cf. Bialystok, 2017; Voits et al., 2022). This claim originates in the constant flexibility needed to maintain several languages in one mind (cf. Van den Noort et al., 2019 for more details). Although results are mixed, review studies have shown dementia symptoms to manifest approximately four to five years later in multilingual than in monolingual seniors (cf. Perani & Abutalebi, 2015; Van den Noort et al., 2019) and point to neuroprotective effects of multilingualism in late life (cf. Sala et al., 2022). Most recently, researchers have shifted their attention to the effects of learning a new language, in the absence of lifelong multilingualism, as a training intervention to promote healthy ageing. A growing body of empirical work explores the cognitive and, to a lesser degree, socio-affective outcomes of such late-life language learning (see Section 30.4), based on the premise that foreign language learning engages neural networks that tend to decline with ageing and that are associated with depression (Antoniou & Wright, 2017). However, since late-life language learning is an emerging field, it is important to critically examine its foundations and the work done to date, to provide a solid basis for future work.

30.3

Critical issues and topics

It has long been recognized that inter- and multidisciplinary approaches are needed to tackle healthy ageing questions (Markson & Stein, 2013). The important role of language as a cognitive and social construct in the ageing process has also been recognized (see de Bot & Makoni, 2005), most recently in the form of late-life language learning. Despite these earlier explorations, however, most late-life language learning studies to date have examined the potential of language training to promote either neurocognitive or, less often, psychosocial health, but rarely both. To reflect the interdisciplinary nature of the field, investigations targeting several outcome measures (e.g., cognitive as well as psychosocial) are needed. To that end, studies need to clearly stipulate their primary, secondary and, where appropriate, tertiary outcomes, spanning cognitive, psychosocial and language outcomes. Similarly, the extensive applied linguistics tradition of documenting the effectiveness of particular teaching methods has not been used to its full potential in devising late-life language courses. If seniors’ language learning needs are not optimally met, the cognitive or socio-affective effects of a language intervention become less meaningful (cf. van der Ploeg et al., 2020). In short, the field of late-life language learning is not yet firmly rooted in earlier research traditions. Section 30.4 discusses these challenges in more detail. As a precursor, it is important to note that the methodological designs employed to ascribe cognitive or socio-affective effects in late life uniquely to language learning typically comprise a language intervention as the experimental condition. This is complemented by control groups, in which participants either actively learn another new skill (e.g., playing a musical instrument) or passively convene without learning any new skill. Participants are mostly assigned to conditions at random (cf. Nijmeijer et al., 2021; see also Section 30.5). Most studies share the underlying hypothesis that effects following a language intervention are more substantial than those following other interventions, because of the competition between mentally represented languages that has been robustly documented in lifelong multilingualism.

30.4

Current contributions and research

The number of late-life language learning studies is quickly expanding, building on earlier review and synthesis studies (cf. Klimova & Pikhart, 2020; Pot et al., 2019). Current contributions are summarized below.

30.4.1 

Neuro(cognitive), socio-affective, and linguistic outcomes

Past work has used various methods to detail neurocognitive effects, ranging from standardized neuropsychological tests and reaction time tasks to neuroimaging indices. None of these studies, however, uniformly shows a general beneficial effect of late-life language learning on cognition. Closer inspection reveals an overall lack of significant positive change in behavioural studies (Berggren et al., 2020; Fong et al., 2022; Klimova & Pikhart, 2020; Ramos et al., 2017; Valis et al., 2019; Ware et al., 2017). This would suggest that language training does not affect neurocognition in the short term. However, these results do not rule out the maintenance of baseline cognitive functioning as a meaningful outcome. Indeed, in a small-scale study, Bubbico et al. (2019) found stable global cognitive performance over time in a language intervention group, compared to decreases in cognitive performance and functional connectivity in the control groups. Strikingly, changes in functional connectivity in the language control and executive control networks correlated positively with changes in Mini Mental State Examination (MMSE) scores in the language intervention group only, suggesting that the language training affected functional connectivity. The study emphasizes the importance of complementing behavioural data with neuroimaging data. Other work identified cognitive improvement over time but found no differences between the language training and the control group(s) (Kliesch et al., 2022; Tigka et al., 2019; Wong et al., 2019a), or even reported a more substantial improvement in the control group (Meltzer et al., 2021). In other words, these studies found cognitive gains after language learning, but no evidence that language training is unique in bringing about cognitive change compared with other cognitively stimulating activities, such as gaming. That is not to say that no unique cognitive improvement has been attested for language learning at all: Long et al. (2020) and Pfenninger and Polz (2018) found the greatest improvement in their language groups, and two other studies documented the most improvement in the language training group compared to control groups, although all groups’ cognitive performance increased over time (Bak et al., 2016; Wong et al., 2019a).

30.4.2

Well-being outcomes

The effects of language learning on subjective well-being have been largely ignored until now, even though language learning is an inherently social activity (Brouwer et al., 2020). Recent qualitative work has found gains in (linguistic) self-confidence and subjective well-being due to language learning (Klimova et al., 2020; Pfenninger & Polz, 2018; Valis et al., 2019). This contrasts with Ware et al. (2017), who found no improvement in subjective levels of loneliness or social isolation after four months of training, ascribing this to relatively high levels of well-being at baseline. Importantly, the relationship between language abilities and cognition has been argued to be bidirectional (Pot et al., 2019). A separate line of work investigates which individual (cognitive and/or socio-affective) characteristics contribute to better learning outcomes in older adults. Language learning gains have been found to be associated with better working memory at baseline (Blumenfeld et al., 2017; Kliesch et al., 2017; Mackey & Sachs, 2012), (neurological) indices of semantic, episodic, and associative memory function (Fong et al., 2022; Marcotte & Ansaldo, 2014; Nilsson et al., 2021), a greater typological distance between L1 and target language (Blumenfeld et al., 2017), and higher L1 fluency levels (Kliesch et al., 2017). These studies were conducted only with ‘healthy’ older adults; it is unknown whether these factors also predict language learning outcomes in other older adult populations, in whom language control and cognition may be more severely affected. In sum, existing research does not uniformly show beneficial effects of late-life language learning. However, findings are difficult to generalize, since outcome measures and experimental designs barely overlap between studies, and some studies do point to enhancements in different domains. In the next subsection, we elaborate on two aspects of late-life language learning studies that could shed light on the circumstances under which potential benefits of language training may emerge: language intervention designs and sample characteristics.

30.4.3

Language teaching methods

The language courses that form the basis of earlier investigations differ considerably: while some studies design their own course, others rely (partly) on existing language learning tools, such as Rosetta Stone (Wong et al., 2019b, 2019a) or Duolingo (Kliesch et al., 2022; Kliesch & Pfenninger, 2021; Meltzer et al., 2021). Studies also differ in the instruction they offer, ranging from full classroom instruction to online methods or hybrid forms of computer-assisted language learning, with group size also varying across studies. Few studies make explicit the type of language instruction offered, specifically whether implicit or explicit teaching methods are involved (Cox, 2017), or what the course focuses on (e.g., lexical acquisition or more structural language development). Most consideration has been given to optimal course duration and intensity, resulting in different guidelines, from at least 5 hours a week (Bak et al., 2016) to at least six months in total (Antoniou et al., 2013), as reference points for maintaining or increasing cognitive performance. Although the choice of language often seems to be a pragmatic one (cf. Pot et al., 2019), past studies have considered whether the language to be learned was entirely new to seniors or involved relearning a previously mastered language (cf. Antoniou & Wright, 2017 for more details; Fong et al., 2022; Marcotte & Ansaldo, 2014; Nilsson et al., 2021). Although the evidence is inconclusive, it may be posited that more cognitively demanding language courses (i.e., teaching new, typologically unrelated languages) yield the greatest effects. Most studies of late-life language learning so far have involved cognitively healthy older adults. However, late-life language learning may also have clinical implications in addition to offering a meaningful pastime. As posited by Antoniou et al. (2013), language training may be especially beneficial for populations at risk of cognitive impairment or dementia, or for those suffering from age-associated mood disorders. These include older adults with subjective cognitive decline (Nijmeijer et al., 2021) and Mild Cognitive Impairment (Tigka et al., 2019; Wong et al., 2019b; Van den Berg et al., in preparation). This is underscored by studies that have shown language training to be most (cognitively) effective for older adults with lower baseline working memory (Kliesch et al., 2022). Age-associated mood disorders are typically taken to comprise (a history of) depression in older adults (Brouwer et al., 2020). While treatments are generally successful in reducing depressive symptoms, the often co-occurring cognitive impairment typically persists (Bhalla et al., 2006). Language learning interventions have great potential here, as they have been hypothesized to benefit both cognition and subjective well-being. Ideally, the language course should be adapted to the sample under investigation, or even be adaptive at the individual level. In sum, the field of late-life language learning is still in its infancy, and current research on the cognitive and psychosocial effects of late-life language learning has produced mixed findings. Given the great variation in language intervention designs, the optimal parameters for cognitive and/or psychosocial benefits remain unknown, and it is an open question whether they can ever be established. Indeed, the effectiveness of language training will likely depend on the older adult population under investigation, and even on the individual, given the heterogeneity of the older adult age group and the non-uniformity of ageing as a process. Although some recommendations have been proposed, the next sections describe in more detail how past work has incorporated, and future work could incorporate, these issues into study designs, building on and celebrating the multi- and interdisciplinary nature of the field.

30.5

Main research methods

It follows from the discussion above that there is no single way to tackle late-life language learning investigations. The topic is best approached from multiple, complementary perspectives, both in terms of study design and outcome measures, adapted to the study goal. This section starts by describing commonly used study designs and continues by discussing measures used to tap late-life language learning outcomes (more specifically cognition, well-being, and language performance).

30.5.1

Study design

Longitudinal designs are most commonly used in research aiming to uncover cognitive or psychosocial effects of late-life language learning (Bak et al., 2016; Kliesch et al., 2022; Wong et al., 2019a): pre-test and post-test data are compared to investigate potential intervention-induced changes. Follow-up data collected several months after the course can show whether changes are retained (Antoniou et al., 2013). However, longitudinal studies require significant time investments (10 hours of testing per participant is not uncommon) and may lead to high dropout rates. Longitudinal studies often include additional cognitively stimulating or passive control interventions to assess the unique effects of language learning as an old-age intervention. Typically, participants are randomly assigned to one such condition. These randomized controlled trials (RCTs) have a long-standing tradition in the medical sciences, including gerontology (e.g., Ball et al., 2002), but are recent additions to the language sciences. Control interventions have included music, art (Nijmeijer et al., 2021), lecture series (Brouwer et al., 2020), and games (Wong et al., 2019a). Random group allocation is an advantage of RCTs: because people cannot select their own intervention, effects can more easily be ascribed to the intervention itself rather than to motivation (but see Section 30.4). Disadvantages include subject attrition and fewer inclusions, as all participants have to meet inclusion criteria that pertain to all conditions. Furthermore, in real life, people sign up only for courses they expect to enjoy. This is why some researchers allow participants to self-select interventions (Belleville et al., 2019). Cross-sectional designs are used in addition to longitudinal ones. Although they cannot track development over time, they are easier to implement, making them especially suitable for research questions pertaining to, for instance, participants’ experiences of a course (Klimova et al., 2021; Matsumoto, 2019) or classroom observations (van der Ploeg et al., 2022).
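The random allocation and pre-test/post-test comparison described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the procedure used in any of the cited studies: the condition labels, the function names `allocate` and `mean_change_by_condition`, and the toy scores are all hypothetical. Allocation here is simple randomization with balanced group sizes (no stratification or blocking), and the comparison uses completers only, skipping dropouts rather than handling them as an intention-to-treat analysis would.

```python
import random
from statistics import mean

# Hypothetical condition labels, loosely modelled on the designs described above:
# a language course, an active control, and a passive control.
CONDITIONS = ("language", "music", "passive")


def allocate(participant_ids, conditions=CONDITIONS, seed=None):
    """Randomly assign each participant to one condition.

    IDs are shuffled and then dealt round-robin over the conditions, so group
    sizes differ by at most one. A fixed seed makes the allocation reproducible.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


def mean_change_by_condition(allocation, pre, post):
    """Mean post-test minus pre-test score per condition.

    Only participants with both scores are included (a completers-only
    analysis); participants who dropped out are skipped.
    """
    changes = {}
    for pid, condition in allocation.items():
        if pid in pre and pid in post:
            changes.setdefault(condition, []).append(post[pid] - pre[pid])
    return {condition: mean(diffs) for condition, diffs in changes.items()}


if __name__ == "__main__":
    allocation = allocate(range(1, 13), seed=2024)
    # Invented toy scores for illustration only; not data from any study.
    pre = {pid: 20 for pid in allocation}
    post = {pid: 21 for pid in allocation if pid != 5}  # participant 5 dropped out
    print(mean_change_by_condition(allocation, pre, post))
```

Raw mean changes are only a descriptive first step; in practice, differences in change between groups would be tested with an appropriate statistical model rather than read off these means.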

30.5.2

Measures to assess late-life language learning outcomes

Past work has employed various measures to assess the effectiveness of language courses in older adulthood (see Section 30.4). For brevity’s sake, we discuss only a few specific measures and focus on general task types.

30.5.2.1

Cognitive measures

Most late-life language learning studies examine changes in cognitive functioning, reflecting Antoniou et al.’s (2013) initial hypothesis that language learning might especially benefit older adults because of its potential to activate brain regions that decline with ageing. This can be done using neuropsychological (and neuroimaging) methods as well as self-reported measures of cognitive health. Neuropsychological data can objectively detail cognitive changes following a language course and allow researchers to investigate changes in specific executive functions (e.g., working memory). Prior to interventions, researchers often use geriatric measures such as the Montreal Cognitive Assessment (MoCA) to (pre)screen participants for cognitive decline (e.g., Kliesch et al., 2022), but the same tools have also been used as outcome measures by other studies (e.g., Klimova et al., 2020; Ware et al., 2017). Although such tools are commonly used, it is important to keep in mind that many neuropsychological tasks were originally designed to differentiate pathological from normal cognitive ageing, and may not always be meaningful when the aim is to measure cognitive change as a function of a given intervention. Conversely, behavioural tasks, such as the colour-shape switching task, are best employed to measure (fluctuations in) specific executive functions, without linking these results to brain health. Neuroimaging techniques are not (yet) commonly used in late-life language learning studies but may offer a complementary perspective: for instance, structural MRI can shed light on hippocampal volume as a predictor of vocabulary acquisition (Nilsson et al., 2021), while resting-state fMRI shows that even a short language course may reshape brain connections (Bubbico et al., 2019). To date, no resting-state EEG research has been done with older adult language learners, but preliminary evidence from younger adults suggests that the method is suitable for measuring default mode network complexity, which is correlated with cognitive decline (Fomina et al., 2015). Likewise, resting-state pupillometry data could reveal changes in working memory capacity (Aminihajibashi et al., 2019). Finally, measuring self-perceived changes in cognitive performance over time may holistically complement more objective (or indeed neuroimaging) cognitive measures. (Online) questionnaires are mostly used for this purpose and have been shown to discriminate between cognitively healthy and impaired participants with good specificity and sensitivity (Rami et al., 2014).

30.5.2.2

Psychosocial measures

A smaller number of studies have also incorporated measures of psychological and social well-being. Indeed, language learning may lead to more social contacts and boost self-confidence (Pfenninger & Polz, 2018), in turn perhaps indirectly influencing cognitive health. Older adults may also experience the language learning process itself as a pleasant leisure activity (Matsumoto, 2019). Since well-being is a broad construct, studies have generally indexed participants’ experiences using tailored questionnaires (e.g., Klimova et al., 2021; Pfenninger & Polz, 2018) and interviews (Pfenninger & Polz, 2018; Ware et al., 2017). Questionnaires can quantify how much satisfaction, happiness, or stress a language course brings about and can be easily administered. Interviews and their transcription require more time and resources, but they provide rich insights into participants’ perceived well-being and can empower participants, in that the research is done ‘with’ rather than ‘about’ them. Many tools to tap well-being originate in psychology, geriatrics, and gerontology research traditions. These include (short) scales measuring, among others, the severity of depressive symptoms, the degree of rumination (associated with depression), and positive and negative emotions and affect (see the OSF page listed under Further reading for a more comprehensive list).

30.5.2.3

Language proficiency measures

While language interventions are often employed to improve cognitive functioning or enhance well-being, language proficiency outcomes themselves can also be investigated to assess the effectiveness of particular late-life language teaching methods (van der Ploeg et al., forthcoming). Several tools from EFL research have been implemented, targeting different domains such as speaking and listening proficiency, vocabulary size, and verbal fluency. We recommend including at least one objective language measure in any test battery, so that cognitive and/or psychosocial health outcomes can be related to language learning gains and a more holistic picture emerges.

30.5.3

Final research methods considerations

A final relevant outcome measure is participants’ experiences, needs, and motivation in relation to a language course (see also Section 30.4). These are worth studying because a course that participants enjoy will likely lead to higher intervention adherence. They can be studied through questionnaires, interviews, and classroom observations (van der Ploeg et al., 2022). Although holistic designs are preferred, researchers must make informed choices about which measures to include and justify these choices clearly in any resulting paper, as an overly long test battery may well result in fatigue effects in older adult participants.

30.6

Recommendations for practice

Above all, researchers interested in studying late-life language learning need to be mindful of the population they are working with. This section puts forward practical recommendations for researchers.

30.6.1

Practical recommendations for the late-life language classroom

Despite the increased popularity of late-life language learning, insights from applied linguistics are also sparsely used. Insights into teaching method effectiveness, participant motivation, anxiety, and the impact of individual differences on the language learning process, obtained in the context of younger learners, have not been extended to older populations. Yet they are needed to maximize enjoyment in late-life language learning and have the potential to uncover cognitive and psychosocial benefits. Researchers have long agreed that motivation is vital in language learning. Schiller and Dorner (2021) found that attitude towards learning and goal specificity contributed most to seniors’ motivation. In an online late-life language course, Ushida (2005) found that all dimensions of Gardner’s Attitude and Motivation Test Battery were relatively stable throughout the course, except for anxiety, which was relatively high at the beginning but decreased as the course progressed. This is readily explained: seniors need to overcome anxiety and technological problems related to online language learning at the onset of their learning process. It has been shown, however, that with minor adaptations online courses are suitable for older adults. Adaptations may include extra lessons to explain the technology and the use of low-threshold video-conferencing tools (e.g., Google Meet). To increase motivation and facilitate learning, Bosisio (2019) suggests teaching materials that reactivate and expand previous language knowledge rather than focus on new language learning. With regard to teaching techniques more specifically, research suggests that feedback should be mainly implicit, allowing seniors to benefit from naturalistic, targeted input (Lenet et al., 2011). Additionally, the use of formal assessment (e.g., graded assignments and tests) is discouraged in the late-life language classroom (Grotek & Ślęzak-Świat, 2017; Klimczak-Pawlak & Kossakowska-Pisarek, 2018). Finally, seniors tend to ask many questions in the classroom, the largest proportion of which can be classified as ‘wonderment questions’ (i.e., questions that are of interest to the learner but are not needed for the progressivity of the task) (van der Ploeg et al., 2022). This is an important finding, as it suggests that the older adult classroom requires a different teaching style: one that focuses mostly on social interaction and allows room for curiosity-driven learning, in which older adults can take agency over their own learning process. In support of this suggestion, Duay and Bryan (2008) call late-life language learning a ‘social endeavour’ and suggest that the focus of a language course should in fact be on interaction, as it provides older adult learners with exercises and tools to engage with others in a globalized world. Similarly, a review by Kacetl and Klímová (2021) adds communicative competence as a goal for late-life learning, mostly as part of a student-centred approach incorporating familiar topics and real-life experiences, relevant content, and listening comprehension. In sum, being mindful of late-life language teaching methods optimizes the chances of uncovering cognitive or psychosocial effects of late-life language learning, and can greatly advance the field.

30.6.2

Practical recommendations for methods to assess late-life language learning effects

Certain methods are difficult to implement in older populations, most notably neuroimaging methods. For instance, applying an EEG cap can be uncomfortable for participants with fragile skin, and the conductive gel may leak more easily in bald participants, causing bridged electrodes (Luck, 2014). Similarly, glasses can be problematic in eye-tracking designs. Furthermore, the calibration procedure is problematic in participants with reduced visual acuity (e.g., unregulated glaucoma, macular degeneration), and obscured pupils (e.g., ptosis or cataracts) can result in suboptimal data. Lastly, MRI and fMRI data collection is usually ruled out in people with pacemakers.

From experience, we recommend keeping testing sessions to 1.5 hours or less to avoid fatigue effects, even in participants who are otherwise cognitively healthy. Fatigue effects can also be minimized by asking participants to complete questionnaires at home; errors or missing information can then be corrected during the session. Finally, it is wise to allocate extra time for social interaction during testing sessions: not only language learning but also research participation is a social endeavour for older adults.

30.7

Future directions

The aim of this chapter has been to provide a holistic overview of the critical issues and topics that have informed the relatively new field of late-life language learning, and to showcase the relevance of experimental linguistic investigations to healthy ageing research. The field appears fragmented and unidimensional, yielding mixed results in the process. We have underscored the premise that ageing encompasses more than mental decline to be halted. Implicitly, a predominant focus on cognitive improvement or maintenance resulting from late-life language learning presents a discourse of decline. Future work should be mindful of holistic ageing, embracing the inherent interdisciplinarity of the field but also highlighting the positive psychology of ageing and the fact that late life can still be a period of growth and development (cf. Ramscar, 2022). Future work adopting a holistic approach that complements cognitive outcomes with psychosocial as well as language outcomes can contribute to a positive discourse of ageing. Another issue is that studies to date have a strong clinical angle, reflected also in designs such as RCTs. Despite their merits (see Section 30.5), RCTs are not necessarily ecologically valid. Extending the current line of research on teaching method efficacy, future work would also do well to examine more closely the role of agency and motivation in any effects of late-life language learning, as opposed to other interventions.

Acknowledgements

This chapter is based on research insights obtained by means of NWO Vidi innovation scheme funding 016.Vidi.185.190 awarded to M. Keijzer, which is hereby gratefully acknowledged. We would also like to thank Saskia Nijmeijer and Marie-José van Tol, whose research project ‘FlexLang’ generated many insights that were included in this chapter.

Further reading

Antoniou, M., & Wright, S. M. (2017). Uncovering the mechanisms responsible for why language learning may promote healthy cognitive aging. Frontiers in Psychology, 8, 2217. https://doi.org/10.3389/fpsyg.2017.02217
Kliesch, M., Pfenninger, S. E., Wieling, M., Stark, E., & Meyer, M. (2022). Cognitive benefits of learning additional languages in old adulthood? Insights from an intensive longitudinal intervention study. Applied Linguistics, 43(4), 653–676. https://doi.org/10.1093/applin/amab077
Pot, A., Porkert, J., & Keijzer, M. (2019). The bidirectional in bilingual: Cognitive, social and linguistic effects of and on third-age language learning. Behavioral Sciences, 9(9), 98. https://doi.org/10.3390/bs9090098

The authors have created an Open Science Framework page containing additional materials, including an overview of often-used tasks and an overview of existing studies; this page can be accessed at OSF.io/dxpqc/

Related topics

Assessing adult linguistic competence; experimental methods to study disorders of language production in adults; experimental methods to study language learners


Merel Keijzer et al.

References

Aminihajibashi, S., Hagen, T., Foldal, M. D., Laeng, B., & Espeseth, T. (2019). Individual differences in resting-state pupil size: Evidence for association between working memory capacity and pupil size variability. International Journal of Psychophysiology, 140, 1–7. https://doi.org/10.1016/j.ijpsycho.2019.03.007
Antoniou, M., Gunasekera, G. M., & Wong, P. C. M. (2013). Foreign language training as cognitive therapy for age-related cognitive decline: A hypothesis for future research. Neuroscience & Biobehavioral Reviews, 37(10), 2689–2698. https://doi.org/10.1016/j.neubiorev.2013.09.004
Antoniou, M., & Wright, S. M. (2017). Uncovering the mechanisms responsible for why language learning may promote healthy cognitive aging. Frontiers in Psychology, 8, 2217. https://doi.org/10.3389/fpsyg.2017.02217
Bak, T. H., Long, M. R., Vega-Mendoza, M., & Sorace, A. (2016). Novelty, challenge, and practice: The impact of intensive language learning on attentional functions. PLOS ONE, 11(4), e0153485. https://doi.org/10.1371/journal.pone.0153485
Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., Morris, J. N., Rebok, G. W., Smith, D. M., Tennstedt, S. L., Unverzagt, F. W., Willis, S. L., & ACTIVE Study Group. (2002). Effects of cognitive training interventions with older adults: A randomized controlled trial. JAMA, 288(18), 2271. https://doi.org/10.1001/jama.288.18.2271
Belleville, S., Moussard, A., Ansaldo, A. I., Belchior, P., Bherer, L., Bier, N., Bohbot, V. D., Bruneau, M. A., Cuddy, L. L., & Gilbert, B. (2019). Rationale and protocol of the ENGAGE study: A double-blind randomized controlled preference trial using a comprehensive cohort design to measure the effect of a cognitive and leisure-based intervention in older adults with a memory complaint. Trials, 20(1), 1–18. https://doi.org/10.1186/s13063-019-3250-6
Berggren, R., Nilsson, J., Brehmer, Y., Schmiedek, F., & Lövdén, M. (2020). Foreign language learning in older age does not improve memory or intelligence: Evidence from a randomized controlled study. Psychology and Aging, 35(2), 212–219. https://doi.org/10.1037/pag0000439
Bhalla, R. K., Butters, M. A., Mulsant, B. H., Begley, A. E., Zmuda, M. D., Schoderbek, B., Pollock, B. G., Reynolds, C. F., & Becker, J. T. (2006). Persistence of neuropsychologic deficits in the remitted state of late-life depression. The American Journal of Geriatric Psychiatry, 14(5), 419–427. https://doi.org/10.1097/01.JGP.0000203130.45421.69
Bialystok, E. (2017). The bilingual adaptation: How minds accommodate experience. Psychological Bulletin, 143(3), 233–262. https://doi.org/10.1037/bul0000099
Blumenfeld, H. K., Quinzon, S. J. R., Alsol, C., & Riera, S. A. (2017). Predictors of successful learning in multilingual older adults acquiring a majority language. Frontiers in Communication, 2, 1–19. https://doi.org/10.3389/fcomm.2017.00023
Bosisio, N. (2019). Language learning in the third age. Geopolitical, Social Security and Freedom Journal, 2(1), 21–36. https://doi.org/10.2478/gssfj-2019-0003
Brouwer, J., van den Berg, F., Knooihuizen, R., Loerts, H., & Keijzer, M. (2020). Exploring language learning as a potential tool against cognitive impairment in late-life depression: Two meta-analyses and suggestions for future research. Behavioral Sciences, 10(9), 132. https://doi.org/10.3390/bs10090132
Bubbico, G., Chiacchiaretta, P., Parenti, M., di Marco, M., Panara, V., Sepede, G., Ferretti, A., & Perrucci, M. G. (2019). Effects of second language learning on the plastic aging brain: Functional connectivity, cognitive decline, and reorganization. Frontiers in Neuroscience, 13, 423. https://doi.org/10.3389/fnins.2019.00423
Christopher, G. (2014). The psychology of ageing: From mind to society. Palgrave Macmillan.
Cox, J. G. (2017). Explicit instruction, bilingualism, and the older adult learner. Studies in Second Language Acquisition, 39(1), 29–58. https://doi.org/10.1017/S0272263115000364
de Bot, K., & Makoni, S. (2005). Language and aging in multilingual contexts. Multilingual Matters. https://doi.org/10.21832/9781853598425
Duay, D. L., & Bryan, V. C. (2008). Learning in later life: What seniors want in a learning experience. Educational Gerontology, 34(12), 1070–1086. https://doi.org/10.1080/03601270802290177
Fomina, T., Hohmann, M., Schölkopf, B., & Grosse-Wentrup, M. (2015). Identification of the default mode network with electroencephalography. 2015 37th annual international conference of the IEEE engineering in medicine and biology society (EMBC), 7566–7569. https://doi.org/10.1109/EMBC.2015.7320143
Fong, M. C.-M., Ma, M. K.-H., Chui, J. Y. T., Law, T. S. T., Hui, N.-Y., Au, A., & Wang, W. S. (2022). Foreign language learning in older adults: Anatomical and cognitive markers of vocabulary learning success. Frontiers in Human Neuroscience, 16, 787413. https://doi.org/10.3389/fnhum.2022.787413


Experimental methods to study late-life language learning

Grotek, M., & Ślęzak-Świat, A. (2017). Balance and coordination vs reading comprehension in L2 in late adulthood. In D. Gabryś-Barker (Ed.), Third age learners of foreign languages (pp. 91–107). Multilingual Matters. https://doi.org/10.21832/9781783099412-008
Gu, Y.-H., Bai, J.-B., Chen, X.-L., Wu, W.-W., Liu, X.-X., & Tan, X.-D. (2019). Healthy aging: A bibliometric analysis of the literature. Experimental Gerontology, 116, 93–105. https://doi.org/10.1016/j.exger.2018.11.014
Kacetl, J., & Klímová, B. (2021). Third-age learners and approaches to language teaching. Education Sciences, 11(7), 310. https://doi.org/10.3390/educsci11070310
Kliesch, M., Giroud, N., Pfenninger, S. E., & Meyer, M. (2017). Research on second language acquisition in old adulthood: What we have and what we need. In D. Gabryś-Barker (Ed.), Third age learners of foreign languages (pp. 48–75). Multilingual Matters. https://doi.org/10.21832/9781783099412-006
Kliesch, M., & Pfenninger, S. E. (2021). Cognitive and socioaffective predictors of L2 microdevelopment in late adulthood: A longitudinal intervention study. The Modern Language Journal, 105(1), 237–266. https://doi.org/10.1111/modl.12696
Kliesch, M., Pfenninger, S. E., Wieling, M., Stark, E., & Meyer, M. (2022). Cognitive benefits of learning additional languages in old adulthood? Insights from an intensive longitudinal intervention study. Applied Linguistics, 43(4), 653–676. https://doi.org/10.1093/applin/amab077
Klimczak-Pawlak, A., & Kossakowska-Pisarek, S. (2018). Language learning over 50 at the open university in Poland: An exploratory study of needs and emotions. Educational Gerontology, 44(4), 255–264. https://doi.org/10.1080/03601277.2018.1454389
Klimova, B., & Pikhart, M. (2020). Current research on the impact of foreign language learning among healthy seniors on their cognitive functions from a positive psychology perspective—A systematic review. Frontiers in Psychology, 11, 765. https://doi.org/10.3389/fpsyg.2020.00765
Klimova, B., Pikhart, M., Cierniak-Emerych, A., Dziuba, S., & Firlej, K. (2021). A comparative psycholinguistic study on the subjective feelings of well-being outcomes of foreign language learning in older adults from the Czech Republic and Poland. Frontiers in Psychology, 12, 606083. https://doi.org/10.3389/fpsyg.2021.606083
Klimova, B., Slaninova, G., Prazak, P., Kacetl, J., & Valis, M. (2020). Enhancing cognitive performance of healthy Czech seniors through non-native language learning—A mixed-methods pilot study. Brain Sciences, 10(9), 573. https://doi.org/10.3390/brainsci10090573
Lenet, A. E., Sanz, C., Lado, B., Howard, J. H., & Howard, D. V. (2011). Aging, pedagogical conditions, and differential success in SLA. In C. Sanz & R. P. Leow (Eds.), Implicit and explicit language learning (pp. 73–84). Georgetown University Press; JSTOR. http://www.jstor.org/stable/j.ctt2tt7k0.11
Long, M. R., Vega-Mendoza, M., Rohde, H., Sorace, A., & Bak, T. H. (2020). Understudied factors contributing to variability in cognitive performance related to language learning. Bilingualism: Language and Cognition, 23(4), 801–811. https://doi.org/10.1017/S1366728919000749
Luck, S. J. (2014). An introduction to the event-related potential technique (2nd ed.). The MIT Press.
Mackey, A., & Sachs, R. (2012). Older learners in SLA research: A first look at working memory, feedback, and L2 development. Language Learning, 62(3), 704–740. https://doi.org/10.1111/j.1467-9922.2011.00649.x
Marcotte, K., & Ansaldo, A. I. (2014). Age-related behavioural and neurofunctional patterns of second language word learning: Different ways of being successful. Brain and Language, 135, 9–19. https://doi.org/10.1016/j.bandl.2014.04.004
Markson, E., & Stein, P. (2013). Getting unstuck: Interdisciplinarity and aging. Sociological Forum, 28(4), 873–880. https://doi.org/10.1111/socf.12061
Matsumoto, D. (2019). Exploring third-age foreign language learning from the well-being perspective: Work in progress. Studies in Self-Access Learning Journal, 10, 111–116.
Meltzer, J. A., Kates Rose, M., Le, A. Y., Spencer, K. A., Goldstein, L., Gubanova, A., Lai, A. C., Yossofzai, M., Armstrong, S. E. M., & Bialystok, E. (2021). Improvement in executive function for older adults through smartphone apps: A randomized clinical trial comparing language learning and brain training. Aging, Neuropsychology, and Cognition, 30(2), 150–171. https://doi.org/10.1080/13825585.2021.1991262
Nijmeijer, S. E., van Tol, M.-J., Aleman, A., & Keijzer, M. (2021). Foreign language learning as cognitive training to prevent old age disorders? Protocol of a randomized controlled trial of language training vs. musical training and social interaction in elderly with subjective cognitive decline. Frontiers in Aging Neuroscience, 13, 550180. https://doi.org/10.3389/fnagi.2021.550180


Nilsson, J., Berggren, R., Garzón, B., Lebedev, A. V., & Lövdén, M. (2021). Second language learning in older adults: Effects on brain structure and predictors of learning success. Frontiers in Aging Neuroscience, 13, 666851. https://doi.org/10.3389/fnagi.2021.666851
Perani, D., & Abutalebi, J. (2015). Bilingualism, dementia, cognitive and neural reserve. Current Opinion in Neurology, 28(6), 618–625. https://doi.org/10.1097/WCO.0000000000000267
Pfenninger, S. E., & Polz, S. (2018). Foreign language learning in the third age: A pilot feasibility study on cognitive, socio-affective and linguistic drivers and benefits in relation to previous bilingualism of the learner. Journal of the European Second Language Association, 2(1), 1. https://doi.org/10.22599/jesla.36
Pot, A., Porkert, J., & Keijzer, M. (2019). The bidirectional in bilingual: Cognitive, social and linguistic effects of and on third-age language learning. Behavioral Sciences, 9(9), 98. https://doi.org/10.3390/bs9090098
Rami, L., Mollica, M. A., García-Sanchez, C., Saldaña, J., Sanchez, B., Sala, I., Valls-Pedret, C., Castellví, M., Olives, J., & Molinuevo, J. L. (2014). The subjective cognitive decline questionnaire (SCD-Q): A validation study. Journal of Alzheimer’s Disease, 41(2), 453–466. https://doi.org/10.3233/JAD-132027
Ramos, S., Fernández García, Y., Antón, E., Casaponsa, A., & Duñabeitia, J. A. (2017). Does learning a language in the elderly enhance switching ability? Journal of Neurolinguistics, 43, 39–48. https://doi.org/10.1016/j.jneuroling.2016.09.001
Ramscar, M. (2022). Psycholinguistics and aging. In J. Fernández-Domínguez. & J. & M. Aronoff (Eds.), Oxford research encyclopedia of linguistics. Oxford University Press. https://doi.org/10.1093/acrefore/9780199384655.013.374
Sala, A., Malpetti, M., Farsad, M., Lubian, F., Magnani, G., Frasca Polara, G., Epiney, J., Abutalebi, J., Assal, F., Garibotto, V., & Perani, D. (2022). Lifelong bilingualism and mechanisms of neuroprotection in Alzheimer dementia. Human Brain Mapping, 43(2), 581–592. https://doi.org/10.1002/hbm.25605
Schiller, E., & Dorner, H. (2021). A multi-perspective analysis of adult learner differences in foreign language learning: Motivation, autonomous learning and self-regulation. Konin Language Studies, 8, 295–318. https://doi.org/10.30438/ksj.2020.8.3.4
Song, S., Stern, Y., & Gu, Y. (2022). Modifiable lifestyle factors and cognitive reserve: A systematic review of current evidence. Ageing Research Reviews, 74, 101551. https://doi.org/10.1016/j.arr.2021.101551
Tigka, E., Kazis, D., Tsolaki, M., Bamidis, P., Papadimitriou, M., & Kassapi, E. (2019). FL learning could contribute to the enhancement of cognitive functions in MCI older adults. Intercultural Translation Semiotic, 8(2), 1–24.
Ushida, E. (2005). The role of students’ attitudes and motivation in second language learning in online language courses. CALICO Journal, 23(1), 49–78.
Valis, M., Slaninova, G., Prazak, P., Poulova, P., Kacetl, J., & Klimova, B. (2019). Impact of learning a foreign language on the enhancement of cognitive functions among healthy older population. Journal of Psycholinguistic Research, 48(6), 1311–1318. https://doi.org/10.1007/s10936-019-09659-6
Van den Noort, M., Vermeire, K., Bosch, P., Staudte, H., Krajenbrink, T., Jaswetz, L., Struys, E., Yeo, S., Barisch, P., Perriard, B., Lee, S.-H., & Lim, S. (2019). A systematic review on the possible relationship between bilingualism, cognitive decline, and the onset of dementia. Behavioral Sciences, 9(7), 81. https://doi.org/10.3390/bs9070081
van der Ploeg, M., Keijzer, M., & Lowie, W. (2020). Methodological concerns and their solutions in third-age language learning studies. Dutch Journal of Applied Linguistics, 9(1–2), 97–108. https://doi.org/10.1075/dujal.19036.van
van der Ploeg, M., Willemsen, A., Richter, L., Keijzer, M., & Koole, T. (2022). Requests for assistance in the third-age language classroom. Classroom Discourse, 13(4), 1–21. https://doi.org/10.1080/19463014.2021.2013910
Voits, T., DeLuca, V., & Abutalebi, J. (2022). The nuance of bilingualism as a reserve contributor: Conveying research to the broader neuroscience community. Frontiers in Psychology, 13, 909266. https://doi.org/10.3389/fpsyg.2022.909266
Walston, J. (2004). Frailty—The search for underlying causes. Science of Aging Knowledge Environment, 2004(4). https://doi.org/10.1126/sageke.2004.4.pe4
Ware, C., Damnee, S., Djabelkhir, L., Cristancho, V., Wu, Y.-H., Benovici, J., Pino, M., & Rigaud, A.-S. (2017). Maintaining cognitive functioning in healthy seniors with a technology-based foreign language program: A pilot feasibility study. Frontiers in Aging Neuroscience, 9. https://doi.org/10.3389/fnagi.2017.00042
Wilson, C. A., Walker, D., & Saklofske, D. H. (2021). Developing a model of resilience in older adulthood: A qualitative meta-synthesis. Ageing and Society, 41(8), 1920–1942. https://doi.org/10.1017/S0144686X20000112


Wong, P., Ou, J., Pang, C. W. Y., Zhang, L., Tse, C. S., Lam, L. C. W., & Antoniou, M. (2019a). Language training leads to global cognitive improvement in older adults: A preliminary study. Journal of Speech, Language, and Hearing Research, 62(7), 2411–2424. https://doi.org/10.1044/2019_JSLHR-L-18-0321
Wong, P., Ou, J., Pang, C. W., Zhang, L., Tse, C. S., Lam, L. C., & Antoniou, M. (2019b). Foreign language learning as potential treatment for mild cognitive impairment. Hong Kong Medical Journal, 25(Suppl 7), S41–43.
Wong, W. (2020). Economic burden of Alzheimer disease and managed care considerations. The American Journal of Managed Care, 26(Suppl. 8), S177–S183. https://doi.org/10.37765/ajmc.2020.88482


INDEX

Note: Bold page numbers refer to tables; Italic page numbers refer to figures and page numbers followed by “n” denote endnotes. Aachen Aphasia Test (AAT) 408, 415 Aaronson, D. 13 abstract algorithm 442 academic achievement 390 acceptability judgements tasks (AJTs) 58–59, 80, 81, 259, 331, 332, 379, 423–425, 427, 433, 464 accuracy scores 217, 220, 223, 225–226, 412 acoustic analyses 21, 27–30 acoustic phonetics 8, 214, 376, 379 acquired brain disorders 408–409 action-based version 448 action compatibility effect 176 action language 173–180 active-filler strategy 57 Adamou, E. 462–465 ad hoc Likert-style scales 151 ad hominem arguments 146, 147, 149 aerodynamics 28–29 affixation vs. compounding 44 affix stripping 42, 45 age-associated mood disorders 477 age matching 393 age-of-acquisition 254, 378, 441, 445, 452, 464 agrammatism 62, 63, 412 Aguilar, M. 54, 55 airflow measurement techniques 29 airway interruption method 28 Aksu-Koc, A. 99 alien-learner paradigm 397 Allen, K.B. 291 alpha error level 346

Altmann, G.T.M. 270, 430, 443 Alzheimer’s disease (AD) 417 ambiguous sentence 53, 59, 255–257, 397, 425, 427, 429 American Psychological Association 316, 347 American Sign Language 188, 190 analysis of variance (ANOVA) 276, 315, 434 Anand, P. 459 anaphora resolution 256 anaphors 236, 242, 256, 274, 335, 339, 340 Andreetta, S. 414 Andrés-Roqueta, C. 100 Andrews, S. 262 anomaly detection paradigm 60 antecedents 59, 72, 127, 141, 236, 237, 242, 256, 274, 334–340 anticipatory eye movements 269, 276, 278, 443, 448 antilocality effects 55 Antwerp Dutch (AD) 110 aphasia 9, 10, 57, 59, 62–63, 300, 408, 411, 413–417 a posteriori control 352–353 Appadurai, A. 459 appropriate argument schemes 143 Approximate Number System 77 a priori control 350–352 arcuate fasciculus 303 areas of interest (AOI) 165, 219, 253, 291, 293 argumentation: classification 139–140; contributions and research 148–150; critical discussions 139; critical issues and topics 140–148; definition 139; dialogues 139; embodiment phenomena 2; evaluation 148–149; gaze behaviour 293; ignorance 146; improvement 150; intervention research 140, 147; modus ponens 141; norms/rules 141; practice recommendation 151–152; production 148; research methods 150–151; social and cognitive skills 139; structure complexity hypothesis 63 Argyle, M. 284, 287 armchair/introspection-based methods 73 Arnett, J.J. 459 Arnold, J.E. 126, 128, 194–196, 270 Arslan, S. 399 articulation: DLD 395; instrumental phonetic study 22; laryngeal level 29; motor planning and control 410; phonetic plan 377; sentences 407; speech sounds 8; supraglottic/supralaryngeal level 29 articulatory phonetics 8 articulatory phonology 31 articulatory score 410, 411 artificial intelligence (AI) 11–12, 143, 308, 309 artificial neural networks 12 The Art of Pronunciation 8 Astell, A. 286, 287 Asudeh, A. 336 asymmetric sampling in time (AST) 209 Athanasopoulos, P. 219 attitude elicitation 110 attraction effects 56, 393 atypical language development: align research with needs 400–401; cognitive assessment 392–393; contribution and research 394–395; critical issues and topics 391–393; disorders 390–391; false positive results 399–400; functional limitations 390; future directions 401; historical perspectives 391; learning continuum 400; multiple sources of information 400; nonverbal impairments 390; see also developmental language disorder (DLD) audience design 191, 192 audio-visual latency issues 361 auditory-somatosensory neuronal networks 223–224 auditory stimuli 164, 189, 202, 214, 219, 277–278, 334, 338, 394, 430, 448 Auer, P. 290, 291 authenticity 108 autism spectrum disorder (ASD) 87, 92, 307, 391 automated argument mining 151 automaticity 224, 398, 446–447 Baddeley, A. 399 Barr, D.J. 434 Bartlett, S.F.C. 121 Bayes factor 325, 326

Bayesian estimation: advantages 326; Bayes factor 325, 326; concrete example 321–326; free online 326; frequentist vs. 323, 324; intuition-based judgments 320; LKJ prior 323; maximum likelihood 319; posterior distribution 320, 321; prior distribution 319; sensitivity analysis 325; uncertainty quantification 318–326; uninformative prior 326 Bayesian methods 314, 318, 323, 325 Bayesian pragmatic theories 78 Bayes’ rule 144, 320 Beddor, B. 80 behavioural genetics 309 behavioural methods 165, 176–177, 202–204, 206, 211–213, 449 behavioural task 202, 208, 211–213, 301, 400, 478 behaviourism 11 Berger, H. 15 beta error level 346 Betancourt, M. 322 Beyersmann, E. 46 biculturalism 453 bilingualism: context acknowledging 453; contributions and research 446–448; critical issues and topics 444–445; cross-linguistic activation 445; C-test performance 349; DLD 392, 393, 398; domain-general cognitive control 442; eye movements 262; future directions 453–454; historical perspectives 442–444; human evolution and migration 440; immersion experience impacts 307; L2 learners 59; language-cognition interface 441; language non-selective activation 441; language processing 441; mass migration and mobility 440; measurements 444–445; neural and sensorimotor system 440; neural basis 441; nonoverlapping condition 189; parallel activation 446–447; populations, experimental studies 461–463; practice recommendations 452–453; prediction 447–448; proficiency and usage 452–453; research methods 448–452; translations 440; word production 188 Bilingual Language Interaction Network for Comprehension of Speech (BLINCS) 442 binding theory 335, 429 blank screen paradigm 276, 449–450 blood-oxygen-level-dependent (BOLD) 15, 62, 206, 211, 303, 304 Bloomfield, L. 11 Blumenfeld, H.K. 443, 446 Bobb, S.C. 427 Bock, J.K. 189, 190


Bock, K. 56 Boers, F. 258 Boroditsky, L. 227, 229 Bosisio, N. 480 Boston Naming Test (BNT) 411 Bott, L. 75, 93 Boudelaa, S. 45 boundary paradigm 253, 261 Bowern, C. 161 Box-Cox procedure 433 Bozic, M. 44 brain activation 2, 39, 41, 211, 213, 301 brain damage 9–10, 300, 303 brain imaging 2, 15; cognitive mechanisms 299; contributions and research 301–302; coregistering signals 278; critical issues and topics 301; definition 299; future directions 308–309; historical perspectives 300–301; lesion analysis methods 299; neuroimaging methods 299; practice recommendations 307–308; research methods 302–307; sociolinguistic research 111 Branigan, H.P. 64 Breheny, R. 93 Brennan, S.E. 192 Broca, P. 9, 10 Broca’s area 10, 61–63, 300, 303 Brodmann’s area (BA) 97 Brône, G. 287, 291 Brookshire, R.H. 413 Brown-Schmidt, S. 126, 197, 274, 275 Brücke, E.W. 8 Brun, G. 80 Bryan, V.C. 480 Bubbico, G. 475 Bühler, K. 11 Bürkner, P.-C. 326 Buswell, G.T. 251 Calderón, E. 461, 465 Callemein, T. 293, 294 Campbell-Kibler, K. 108–110 canonicity effect 62 Caramazza, A. 63 cardinal numerals 74–75 card-ordering task 193 Carlin, J.B. 316 Carpenter, P.A. 60 Carreiras, M. 53 categorisation preferences 217, 224–225, 229 Cebrian, J. 206, 208, 213 Chelliah, S.L. 161 Chemla, E. 75 Chien, Y.-C. 336 child language: acquisition 129, 157, 422, 425, 459; cases 375; elicited imitation and production 376–378; linguistic domains 375; observational study 11; phonological and phonetics production studies 375–379; practice recommendations 385–386; speech production tasks 378–379; syntax and semantics 379–385 Chomsky, N. 52, 106, 332, 422, 423 Chondrogianni, V. 398 Christopher, G. 473, 474 Clahsen, H. 46, 432 Claidière, N. 459 Clark, H.H. 12, 13, 193, 194, 271 Clark, L. 113, 114 classification tasks 124, 128 click location 12 Clifton, C. Jr. 53, 255, 256 clinical neurology 301 cloze test 124 clustered-sparse imaging method 206 cognitive linguistics 2, 31, 442 cognitive load 60, 357, 358, 364, 469 cognitive measures 478 cognitive neuroscience 173 cognitive psychology 173, 235, 255, 285, 444 cognitive revolution 11–12, 442 cognitive sciences 1, 12, 17, 115, 140, 145, 157, 441, 442, 453, 460 coherence 92, 120, 121, 123, 124, 129, 237–238, 245, 274 coherence threshold 239, 241–244 Collins, A.M. 12 common ground context 192 communication 24–26, 85, 92, 187, 192, 194, 274–275, 284, 286, 299, 368, 411, 413, 414, 440 communion 142 competence/performance distinction 52 competitor 189, 191–193, 269, 272, 277, 382, 385 complement anaphora 72, 73 complex adaptive system (CAS) 460 compliant responders 164 comprehension questions (CQs) 129, 132, 164, 242, 259, 276, 427, 433, 465 computational modelling 41, 46, 78, 82, 324 computed tomography (CT) 300–301 computer technology 426–428 Comrie, B. 463 conceptualization 193, 229, 239, 377, 409 conceptual pact 192 conduction aphasia 63 confounded variables 346 connectivity constraint 196 construction-integration (CI) model 238 content–prosody incongruity effect 97 content units (CUs) 414 content words 188, 235, 253, 414


context-based pragmatic inferences 87 context–content incongruity effect 97 contextualism/relativism debate 80 context-users 196 context vs. general world knowledge 239–241 continuation tasks 126, 129 Conti-Ramsden, G. 399 control group matching: age and language matching 393; neurolinguistic experiments 393 conventional validity 145, 148 conversational implicature 85–88 conversation analysis (CA) 284, 285, 290, 413 Cook, A.E. 241–243, 245 Cook, M. 284, 287 Cooper, R.M. 14, 270 co-registering brain imaging signals 278 corner/corn effect 42 corpus-based word embeddings 217, 228 corpus linguistics 1, 31, 290, 291 correct information units (CIUs) 414 correct rejections/misses 225–226 Costa, A. 350 Courtney, E.H. 99 Cozijn, R. 127 Crain, S. 380, 397, 399 credible interval 324 Creer, S.D. 239, 240 critical debate 140 Crocker, M.W. 274, 276 cross-linguistic activation 445–447, 451 cross-linguistic differences 44–45, 426 cross-linguistic psycholinguistics: contribution and research 158–160; critical issues and topics 157–158; experimental tasks and measurements 164–166; future directions 166; historical perspectives 156–157; participants 163–164; research methods and practice recommendations 160–166; sentence processing research 156; stimulus materials 161–163 cross-linguistic research 59, 62, 64, 158, 160, 165, 459 cross-task correlational analysis 208 crowdsourcing 121, 127, 132, 229 Csibra, G. 91 C-test performance 349, 353 cue-based models 57 Cuetos, F. 53 cultural differences: bilingual populations 461–463; build your experiment 467; challenges 468–469; colour 461; contributions and research 460–464; critical issues and topics 459–460; data analysis 468; data collection 468; definition 458; experimental design 467; experimentation 458; eye gaze 291; field experiments 458; future directions 468–469; historical perspectives 459; identify research question 466; interpret your results 468; linguistic phenomena 463–464; observation 458; practice recommendations 466–468; prepare your stimuli 467; quantity 460–461; research methods 464–466; research question and predictions 466; select your participants 467–468; social organization 458; space 461 cultural representations 459 Cunnings, I. 338, 339 Cutler, A. 446 Dahan, D. 465 Dal Maso, S. 45 data sharing 32 DataViewer 276 Davis, M.H. 43 Davis, P.A. 15 Dax, M. 9 De Beugher, S. 293 De Carvalho, A. 94 decision latencies 39 declarative/procedural (DP) model 45 deficit methodology 300 Defranza, D. 228 Degen, J. 96 Degutyte, Z. 286, 287 De la Fuente, I. 128 Delbrück, B. 11 Deliens, G. 100 Del Pinal, G. 80 demographics 347, 353 Demolin, D. 22 denial of antecedent 141 dependency locality theory 54 dependent variable 13, 30, 31, 130, 150–152, 257, 276, 345, 416, 464 derivational theory of complexity (DTC) 16, 52, 53 de-selection procedure 352 determiner phrase (DP) 53 Deutsch, W. 191 developmental dyslexia (DD) 391 developmental language disorder (DLD): age-matched peers 393; cross-linguistic studies 394–395; definition 390; diagnostic criteria 410; etiology 391; linguistic and acquisition research 391; multilingual settings 392; research methods 395–399; sentence repetition task 392; syntax-prosody interface 25 De Vries, C. 291 dialogue 16, 73, 139, 140, 150, 283, 291 Diependaele, K. 432 differentiating condition 390–391


diffusion magnetic resonance imaging (dMRI) 302–303 diffusion tensor imaging (DTI) 302 digital platforms 121 digital recorder 30 Dillon, B. 338 direct assessment 287 directive/act-out tasks 129–130, 380 disagreement space 142 discourse: analysis 16, 31, 407, 413–415; classification and labelling tasks 128–129; coherence 120, 121; comprehension questions 129; context 194–197; continuation tasks 129; contributions and research 127–128; critical issues and topics 122–127; dialogue 16; directive and act-out tasks 129–130; ERP/EEG 130; eye-tracking-while-reading 130; fMRI 130; forced choice interpretation task 130–131; future directions 133; historical perspectives 121–122; image description 131; insertion 131; judgment tasks 131; overestimation 120; planning and organization 409; practice recommendations 132–133; processing 127, 274; recall questions 131–132; referential coherence 125–127; relational coherence 122–125; research methods 128–132; self-paced reading 132; skills 417; summarization and free recall 132; visual world paradigm 130 discrimination tasks 203–204, 208, 211–214 disfluency 25–26, 197 disruption 54, 240, 243, 257, 262 distributed language processing 180 divergence-point analysis 277 divergence threshold 243 Dodge, R. 14 Dolscheid, S. 226 Dorner, H. 480 dorso-medial prefrontal cortex (DLPFC) 97 downward- vs. upward-entailing contexts 75 Dromi, E. 395 drop-out 364, 365, 368, 382 dual-task methodology 447 Duay, D.L. 480 Duffy, S.A. 256 Duncan, S. 284 Dussias, P.E. 427 Dutton, S.O. 242 dyslexia 263 early left anterior negativity (ELAN) 61 early pragmatic development/disorder 88, 91–92 ease of access 364, 451 eavesdropping 79

ecological validity 23, 133, 229, 357–358, 360, 362, 364, 367, 411, 428 Egan, A. 80 Egurtzegi, A. 159 El Aissati, A. 460 electrical neuroimaging 205 electroencephalography (EEG) 15, 27, 46, 60, 98, 130, 177, 205–206, 304–307, 394 electroglottograph (EGG) 28 electromyographic response 178 electrophysiological non-invasive technique 60 electrophysiology 41, 46, 60–61, 304–307, 306 elicitation tasks 190, 396 elicited imitation 376–378, 385 elicited production 376–379, 385 Elmer, S. 206, 209, 211 embodied/distributed language 174, 176 emotion recognition models 294 empathy 142 enactivism 442 Engelmann, F. 320 epistemic tension 80 epistemic vigilance development 98–100 equipment-related factors 358, 360, 366–367 ergative case marking 158–159 ERP components 15, 98, 123, 221–222 error analysis 423 Evans, N. 460 event-related potentials (ERPs) 15, 41, 60, 98, 111, 123, 127, 130, 205, 210, 217, 221–222, 393, 426 expectation-based models 57 experimental controls 30, 108, 229, 356, 358, 360, 361, 367 experimental linguistics: brain damage 9–10; cognitive revolution 11–12; cultural contexts 3; definition 2; domains 2; elicitation and judgment tasks 2; eye movements, reading 10; interdisciplinarity 31; language 9–11; methodological complexity 3; new experimental methods 12–16; participant variability 345–354; principle 3; replication and reproducibility 3; speech sounds 8–9; statistical analysis 313–326; syntax and semantics 16; writing systems 7 experimental morphology: contributions and research 46–47; critical issues and topics 42–46; decomposition approach 38; definitions 38; dual-route models 38; full-listing models 38; future directions 47; historical perspectives 41–42; mental lexicon 38; practice recommendations 47; psycholinguistic techniques 39; research methods 39–41

experimental paradigms 55, 59, 99, 107, 112, 205, 222, 261, 360–362, 441, 467 experimental phonetics/phonology: contributions and research 23–26; corpora and stylistic diversity 23; definition 21; disfluency 25–26; historical perspectives 22; hypothesis testing 21; instrumental acoustic analysis 21; invasiveness 27; language typology and description 24; multimodality 23; pathological speech 25; perspectives 31–32; practice recommendations 29–31; prosody 25; recording equipment 30; recruiting, participants 30–31; research methods 26–29; segments to discourse 25–26; speakers’ selection 30; statistical analysis 31; stimuli presentation 31; testable hypotheses 29–30; typology to variation 23–25; variability and complexity 22–23; variation 24 experimental pragmatics: assumptions 85; contribution and research 93–99; critical issues and topics 87–92; future directions 100–101; historical perspectives 86–87; inferential model, communication 85; mind development 86; mind-reading ability 86; neo-Gricean and post-Gricean approaches 85–86; relevance theory 86; scalar implicature 275; VWP 274 experimental semantics: acquisition 72; contributions and research 74–80; critical issues and topics 73–74; definition 71; future directions 82; historical perspectives 72–73; inferential statistics 71; practice recommendations 81–82; pragmatics 72; research methods 80–81; traditional/introspection-based methods 71 experimental sociolinguistics see sociolinguistics experimental syntax: challenging 51; critical issues and contributions 53–58; future directions 63–64; historical perspectives 52–53; locality 54–58; offline tasks 51; online measures 51; psycho- and neurolinguistics research 51; research methods 58–63; universality and economy 53–54; working memory and control mechanisms 51 experimenter effects 362, 364, 365 explicit linking theories 79 external validity 108, 346–348, 367 eye-fixations 229 eye gaze interaction 284–287, 292 eye movements: behaviours 428; beyond syntax 256–257; differential movement 14; experimental paradigms 261, 261–262; free-head desk-mounted systems 14; gaze-contingent techniques 14; growth-curve analysis 277; lexical-semantic influences 252–254; measures 217, 219–220; morphological influences 254–255; orthographic neighbourhood effects 254; photographic methods 14; psycholinguistic research 13; reading research 10, 257–259, 263n1; recording systems 251; referential devices 274; reflective properties 13; sentence and discourse influences 255–256; sentence processing tasks 59; unbalanced bilinguals 262; utterance planning 196; visual workspace 94 eye-tracking 2, 3, 27, 59, 127, 159; challenging 51; cognitive processing 250; contributions and research 252–257; critical issues and concepts 251–252; data acquisition methods 222; dependent variables 13; experiment 94–96; future directions 262–263; historical overview 250–251; line-by-line technique 245; memory and expectancy effects 57; methods 123; picture description 190; practicalities 382; practice recommendations 259–262; predictive processing 428; psycholinguistic methods 250; reading 259–260, 428–429; regressions 122; research methods 257–259; self-paced reading 250; spoken language 268–278; syntactic and semantic ambiguities 383; target/distractor pictures 41; variants 40; visual world paradigm 130, 429–430; see also eye movements eye-tracking multimodal interaction analysis: contributions 290–292; example 283; future directions 293–294; gaze cursor 284; inquiries 283; mobile eye-tracking systems 284; practice recommendations 292–293; research methods 286–288; research strands and traditions 284–286; semiotic resources 284 eye-tracking-while-reading 127, 130 face-to-face, interactive conversation 275 Fall-Tachistoskop 11 familiarity 218, 227, 254, 258, 277, 314, 352, 358, 378, 433, 466, 469 fantasy-based contextual information 240 Feiman, R. 75, 76 Feldman, L.B. 43 Felser, C. 336, 338, 339 Fermi problem 320 Ferreira, V.S. 191 filler-gap dependency see long-distance dependencies (LDD) fine-grained analysis 245 first-fixation duration 219, 258, 260 first-pass reading time 219, 254, 259, 260

fixation location 219, 260 fixations 10, 59, 96, 122, 219, 245, 251–262, 269, 274, 276–278, 291, 428, 442, 444, 449 fixation time 253, 444 Flecken, M. 220 fluency tasks 411–412 forced-choice identification 202 forced choice interpretation task 129–131 forced choice paradigm 126, 131 forced choice version 125 formal assessment 480 Forster, K.I. 40, 42, 46 Foucart, A. 111 Francken, J.C. 224 Frank, M.C. 460 Frazier, L. 59, 255, 256 Freedom Rule 142, 147, 149 free-head desk-mounted systems 14 free recall 121, 123, 132 Freeth, M. 291 frequentist data analysis: data set 322; high-interference condition 314; hypothesis tests 315–317; informative 315–316; low-interference condition 314; p-values, confidence intervals 317–318, 319; self-paced reading 314, 315; t-tests 315; underpowered studies 316–317 frequentist vs. Bayesian estimation 323, 324 Friederici, A. 61 Fukumura, K. 125, 126 functional connectivity 475 functional magnetic resonance imaging (fMRI) 15, 27, 41, 46, 61, 97, 127, 130, 177, 206, 211–213, 217, 223–224, 303–304, 426 functional measures: cognitive/behavioural task 301; EEG and MEG 304–307; fMRI 303–304; fNIRS 307; language-related brain areas 304, 305 functional near-infrared spectroscopy (fNIRS) 222, 301, 307 function words 188, 253, 395 Gabriel, U. 347–351 Gagné, C.L. 44 Gallese, V. 174–176 galvanic skin response (GSR) 294, 360 Gambi, C. 428 gangster lifestyle 112 Garcia, R. 459, 460 garden paths (GP) 51, 427 garden-path theory 255 Garnham, A. 13 Garrido Rodriguez, G. 162 Garrod, S. 13 gaze contingent boundary paradigms (GCBP) 261–262

gaze contingent moving window paradigm 261 gaze-contingent techniques 14 gaze duration 252, 254, 258 gaze machinery interaction 284 gaze registration 286, 287 Gehrer, N.A. 291 Gelman, A. 314, 316 gender prejudice 228 general brain network 223 generalisability 3, 349, 350, 354, 358, 362–363, 365 generalised conversational implicature 88 generalized additive mixed model (GAMM) 277 generation tasks 412 gerontology 477, 479 Gerrits, E. 204 Gibson, E. 461 Gilbert, A.L. 218 Giraudo, H. 45 GitHub 308 Glenberg, A.M. 174–176 good old-fashioned artificial intelligence (GOFAI) 12 go-past time 258, 259 Gordon, P. 335 gradable adjectives 78–79 Graf, C. 191 Grainger, J. 46, 450 grammatical gap 398 grammaticality judgement 58, 330–335, 338, 341, 397, 427 grammaticality vs. acceptability 331 grammatical vs. ungrammatical participants 331, 335 gravity chronometer 11 Green, D.W. 13 Grice, H. 85, 88, 101 Griffin, Z.M. 59, 190, 196 Grillo, N. 54, 55 Grodner, D.J. 96 Grondelaers, S. 108 grounding process 194 growth-curve analysis 277, 449 Günther, F. 45 Gygax, P.M. 127, 347–351 Haataja, E. 291 Haberlandt, K. 122 haemoglobin oxygenation 165, 223 Haendler, Y. 464 Haensel, J.X. 291 Hahn, U. 143 Hall, M.L. 190 Hammer, A. 127 Hancock, J.T. 90 Hanna, J.E. 192

Hartridge, H. 14 Hartsuiker, R. 111 Hauk, O. 174, 175, 177 Haviland, S.E. 12, 13 Heine, S.J. 350 Heisey, D.M. 316 Hellwig, B. 165 hemodynamic methods 61–62 Hendrick, R. 335 Hendriks, P. 128 Henrich, J. 157, 350 Hessels, R.S. 293 Heyer, V. 43, 46 Hick’s law 230n1, 354n1 Hick, W.E. 230n1 Hillyard, S.A. 15 Hilton, K. 112, 113 Hintz, F. 352 Hoenig, J.M. 316 Hoffman, L. 399 Hofmeister, P. 58 homogenising 347, 349 Hood, A. 408 Hopp, H. 46, 429 Hornikx, J. 143 Horton, W.S. 192 Huang, Y.T. 75, 95, 96, 96 Huettig, F. 443 Hyönä, J. 255 identification tasks 30, 31, 202–203, 207–208, 212–213 image description 131 implicature 17, 73, 75, 76, 85, 86, 88, 89 Implicit Association Test (IAT) 109, 110 implicit bias 115 independent variable 30, 31, 150, 151, 276, 277, 345, 416, 464, 465 individual differences 47, 127, 128, 133, 149–152, 226, 263, 352, 353, 446, 447, 449 inference-making 12 inference questions 124, 129 inferences 75, 76, 87, 100, 107, 141, 166, 236–237, 274, 275, 318, 324 inferior frontal gyrus (IFG) 9, 97, 178 information-processing approach 187 information status 194, 195, 197 insertion paradigm 125, 131 instrument accuracy/precision 359 instrument precision 359, 361 integration effect 122, 123 intellectual disability 391, 392 interactive tasks 192, 193 interference effect 57, 315, 324, 325, 326 interlanguage competence 330 internal validity 108, 345–348, 350, 354

International Phonetic Alphabet (IPA) 28, 203 interpretation results 278 interstimulus interval (ISI) 204, 208, 210, 211, 214 intonation 21, 23, 24, 30, 112, 213, 271, 275, 394 intraoral air pressure measurements 28 intuition-based judgments 320 invasiveness 27 inverse problem 205 ironical utterances 87, 90–92, 97, 98, 100 irony comprehension: EEG 98; fMRI 97; mental state reasoning 91; mind-reading abilities 88; neuropsychological approaches 97–98; pragmatic accounts 90 island effects 55–56 Ito, A. 448 Jackson, C.N. 427 Jacobson, P. 16, 73, 74 Jakobs, B.J. 417 Jakubowicz, C. 398 Jegerski, J. 428 Jeong, S. 112, 113 Jiang, N. 426 Johnson, K. 110, 111 joint action 190 joint attention 91, 92, 285, 286, 290 Jongerius, C. 286, 287, 294 judgment tasks 2, 131, 227, 336, 351, 424, 464–465 Juffs, A. 427, 433 Just, M.A. 60, 426 Kacetl, J. 480 Kahn, J. 195 Kamide, Y. 270, 430, 443 Kapiley, K. 453 Kaschak, M.P. 174–176 Katsos, N. 100 Keenan, E.L. 463 Keller, F. 336 Kendeou, P. 243, 245 Kendon, A. 284, 285, 287 Kersten, A.W. 226 Kesselheim, W. 290 Keysar, B. 192, 271 keyword sorting tasks 124 Kidd, E. 459, 460 Kim, E. 340, 342n4 Kim, J.D. 367 Kintsch, W. 235–238 Kleijn, S. 124 Klímová, B. 480 Knobe, J. 79, 80 Koenig, J.-P. 128 Köhne-Fuetterer, J. 123 Koizumi, M. 165

Konieczny, L. 55 Koornneef, A. 127, 340 Kornishova, D. 43 Kratzer, A. 80 Kristiansen, T. 108 Kruschke, J. 326 Krych, M.A. 193, 194 Kuhn, D. 146 Kuperman, V. 262 Kurthen, I. 206, 209 Kutas, M. 15 L2 lexical stress perception: discrimination task, fMRI scanner 211–212; fMRI and behavioural data 213; identification task 212–213; methodological take-home message 213; neuroimaging and behavioural methods 211–213; stress patterns 211; stress vs. vowel conditions 212 L2 vowels perception: behavioural methods 206; cross-task correlational analysis 208; discrimination task 208; identification task 207–208; methodological take-home message 208; perceptual assimilation task 207 lab-based experimentation: benefits 359–361, 360; drawbacks 359, 360, 362–363; experimental control 367; factors 370; web-based advertisements 365 labelled category fluency tasks see semantic fluency tasks labioscope movements 8 laboratory phonology 22, 23, 275 labour-intensive manual analyses 151 Labov, W. 22, 107, 108 lab-/web-based testing: computer revolution 356; core concepts 357–359; flexibility 369; least resistance 356; misconceptions 357; video conference technology 356; see also lab-based experimentation; web-based experimentation Ladouce, S. 294 Lamare, M. 10 Lambert, W.E. 108 Lam, T.Q. 195 Laner, B. 290 Langlois, V.J. 128 language: acquisition 1, 10, 17, 340, 390; brain damage 9–10; comprehension (see language comprehension); context 112–113; description 24; early psychological research 10–11; impairment 392, 396; matching 393; proficiency measures 479; switching costs 462; teaching methods 476–477; typology 1, 24

language-as-action 271 language-as-fixed-effect fallacy 12, 434 language comprehension: critical issues and concepts 173–175; historical perspective 173; language studied type 175; limitations 180; neuroimaging methods 177–178; practice recommendations 179–180; vs. production tasks 397–398; questions addressed type 173–174; research methods 176–179; transcranial magnetic stimulation 178–179, 179 Language Impairment Testing in Multilingual Settings (LITMUS) 392 language production: acquired brain disorders 408–409; contributions and research 410–415; critical issues and topics 408–410; discourse analysis 407; future directions 417; historical perspectives 407–408; microlinguistic and macrolinguistic processes 407; neural network 407; overview 409–410; phonological and phonetic representations 375; practice recommendations 416–417; research methods 415–416 language-situation matching tasks 80 laryngeal articulatory studies 29 Lassiter’s threshold-based theory 80 late-life language learning: cognitive measures 478; contributions and research 475–477; critical issues and topics 474–475; definition 473; future directions 481; historical perspectives 474; language proficiency measures 479; mood disorders 473; practical recommendations 479–481; psychosocial measures 479; research methods 477–479; socio-affective and linguistic outcomes 473; study design 477–478; well-being outcomes 475–476 Latin square designs 315, 432–433 Lauterbur, P.C. 15 Leclercq, A.-L. 396 Leekam, S.R. 91 Lee, M.D. 325 Leeser, M. 427 left anterior negativity (LAN) 61 legitimate empirical methods 7 Leminen, A. 41, 46 lemmas 409, 410 Lemmerth, N. 46 Leonard, L.B. 395 levels effect 121 Levelt, W. 376 Levinson, S.C. 16, 460 Levon, E. 108, 113, 114 Lewis, S.S. 12, 342n1 lexical access 214, 258, 392, 398, 410, 411, 431, 462

lexical ambiguity 254 lexical concept 13, 409–411 lexical decision 39, 44, 94, 110, 225, 244, 257, 431, 450 lexical information units (LIUs) 414 lexical priming 94 lexical selection 409, 411, 414, 415 lexical-semantic influences: big three 254; frequency 252–253; length 253; predictability 253–254 lexical triggers 89, 94, 96 lexical vs. syntactic effects 256 lexomes 46 Libben, G. 44 Li, C. 188 Lidz, J. 77, 426 life expectancy 473 likelihood ratio 144, 325 Likert scale 331, 424, 433 linearization 62 linear regressions 276, 434 linear strategy 339 linguistic competence: abstraction 330; contributions and research 333–334; critical issues and topics 332–333; definition 330; discourse 330; future directions 341–342; historical perspectives 331–332; language learning 330; offline and online testing 335; practice recommendations 340; research methods 334–340 linguistic-/language-modulated knowledge 226 linguistic-pragmatic inferences 100 linguistics: categorisation 2; competence (see linguistic competence); cultural biases 459; cultural differences 458–469; definition 7; embodiment phenomena 2; empirical methods 1; individual and social factors 2; intuitions 7, 11, 16; language processing 1, 2; outcomes 475; psychological sciences 1; relativity 217, 230, 460; social interactions 1; stereotypes 111; see also experimental linguistics listening studies 26, 26, 270 literal interpretation 87, 89, 96 lme4 syntax 315, 320, 321, 324, 434 Locatelli, M. 177 logical fallacy 141, 146 logographic languages 251, 253 logophors 339, 340 long-distance dependencies (LDD) 52, 56–58 Long, M.R. 475 Lonigan, C.J. 399 lower paradigm pre-testing requirements 361 Lucas, A.J. 99 Ludwig, C.F.W. 8 Luzzatti, C. 255

machine-learning algorithms 12, 308, 309 Mack, J.E. 412 macrolinguistic processes 407–409, 414–416 MacWhinney, B. 430, 431 Magistro, G. 30 magnetic resonance imaging (MRI) 29, 299, 302, 303 magnetoencephalography (MEG) 15, 27, 46, 60, 177, 304–307 Marangolo, P. 417 Marelli, M. 255 Marian, V. 443, 446 Marini, A. 414, 415, 417 Markov Chain Monte Carlo (MCMC) sampling 321 Markov models 277 Marslen-Wilson, W.D. 43–45 Martin, J. 80 Martin, K.I. 433 Marty, P. 75, 76 masked priming 39, 42, 43, 45, 47, 48n1, 430–432, 431, 450 matched-guise paradigm 107–109, 112 Matsui, T. 97, 98, 99 Mayer, J. 411 Mazzon, G. 417 McConkie, G. 253 McDonough, J. 161 McElreath, R. 326 McGregor, K.K. 399 McKee, C. 380 McKinnon, R. 55 McKoon, G. 237 McQueen, J.M. 443 Meakins, F. 461 mean length of utterance (MLU) 379, 393 medial prefrontal cortex (MPFC) 97 Meehl, P.E. 326 memory-based accounts 57 memory-based forced-choice tasks 225 memory-based perspective 56, 243–244 memory-overload-dependent effects 51 mental lexicon 38, 39, 42, 44, 225, 409–411, 450 mental state reasoning 91, 92 meta-level terminology 125 metalinguistic knowledge 332 metalinguistic tasks 81 Meyer, B.J.F. 121 Meyer, M.C. 75, 76 microcomputers 13–15 Micro Experimental Laboratory (MEL) 13 microlinguistic processes 407, 414, 415 micro-saccades 358 Milburn, T.F. 399 mild cognitive impairment (MCI) 417, 477 Miller, C.A. 56, 59 Miller, T. 225

mind network theory 197 mind-reading ability 86, 87, 89–91, 97, 100 minicomputers 13, 14 minimal attachment strategy 53, 255 Mini Mental State Examination (MMSE) 475 mirror-imaging 333 Mishra, R.K. 446, 453 Mitchell, D.C. 13, 53 Mitsugi, S. 430, 431 mixed-effects models (MEMs) 434 mixing cost 451 mobile eye-tracking systems 284, 290–292 modal expressions 79–80 modal/non-modal version 79 modal qualifications 142 modified matched-guise approach 108, 109, 112 modified tachistoscope 12 Momma, S. 64 Montgomery, C. 114 Montreal Cognitive Assessment test (MoCA) 478 mood disorders 473 Moore, E. 114 morphological encoding phase 410 morphosyntactic abilities 391 morphosyntactic domain 59 motivational confounding 364–365 motor-evoked potential (MEP) 178 movement-based response times 176 movement islands 55 moving mask paradigm 261 moving window technique 427 Muhonen, H. 291 Mulak, K.E. 469 multiculturalism 453 multifocal eye-tracking paradigm 284 multilingualism 391–392, 440, 453, 474 multimodal imaging 308 multiple submissions 368 multipliers 163 multiword expressions 254, 260 Murray, L. 411 Musolino, J. 75, 426 muscular activity 10 mutual gaze 284, 286, 291

Nadig, A.S. 91, 191, 197 naive discriminative learning (NDL) 46 Nakamura, T. 97 naming latency 441, 451 naming tasks 411, 441, 466 Næss, A. 164 native vs. non-native morphological processing 45–46 Natural Language Processing (NLP) 228, 309 Navarro-Torres, C.A. 464 near infra-red spectroscopy (NIRS) 15, 165

neighbourhood 272 Neubauer, K. 432 neural activation 197, 211, 213 neural networks 309, 407 neural substantiations 64 neurocognition 475 neuroimaging 41, 46, 62, 64, 165, 177–178, 299, 302, 308, 394, 400, 417, 475, 478 neurolinguistics 7, 41, 46, 53, 57, 393 neurological disorders 300 neurological systems 174 neuromodulatory techniques 417 neuromuscular disorders 29 neurophysiological methods 165, 204–206 neuropsychological data 62–63, 478 neuroscience 177, 300 neutral interfering stimulus 189 Nicaraguan Sign Language 188 Nicenboim, B. 314, 315, 320, 326 Nicholas, L.E. 413 Niedzielski, N. 111 Nielsen, M. 459 Nieuwland, M.S. 127 nominal tense 463–464 nominative-accusative alignment 158, 159 non-cardinality 77 noncommunicative task 197 non-contact methods 10 non-grammatical strategy 338 non-invasive brain imaging techniques 222 non-invasive method 28 non-literal interpretation 87 non-native discourse processing 128 non-parametric bootstrapping approach 277 non-participants detection 364–365 non-selective parallel language activation 441 non-verbal clues 85 non-verbal communication 87 Norenzayan, A. 350 noun phrases (NPs) 156, 427 Noveck, I.A. 72, 93 Nozari, N. 196 nuclear magnetic resonance (NMR) 15 null pronouns/argument drop 156, 157

Oben, B. 287, 291 object relative clauses (OR) 55, 57 Obligation-to-Defend Rule 142 O’Brien, E.J. 243 observation bias 362 oesophageal balloon 28 offline measures: accuracy scores 225–226; categorisation preferences 224–225; corpus-based word embeddings 228; definition 217; outside the lab 228–230; similarity ratings 227

Ohala, J.J. 21, 22 Olson, D.J. 435n2 online measures: definition 217; ERP 221–222; eye movement measures 219–220; fMRI 223–224; response times 218–219; SCR 220–221 online processing 12–13 opaque 43–45, 255 OpenFace framework 293 openness of research processes 366 open science 32, 133, 308, 309 Open Science Framework (OSF) 308, 466 organisational principle 362 oriental/semitic languages 64 orthographic neighbourhood effects 254 oscillations 307 Oshita, H. 341 Osterhout, L. 55 oxygenated blood flows 15 Özçelik, O. 426 Packer, M.J. 435n1 Panizza, D. 75 Papafragou, A. 75 parallel activation: automaticity 446–447; languages 446 parallel distributed processing (PDP) 42 pars opercularis 9 pars triangularis 9 participant drop-out 368 participant-researcher interactions 368–369 participant variability: definition 345; extraneous variance 345; internal and external validity 346–348; momentary states and enduring personal characteristics 347; a posteriori control 352–353; a priori control 350–352; reduction 348–350; reduction, systematic variation and registration 347; statistical power 346; strategies 354; summarization 353–354; systematic and non-systematic variation 345 participation framework 285 particularised conversational implicature 88 passive version 448 past tense debate 41–42 Paterson, K.B. 47, 254 pattern recognition 393 Pecher, D. 175 Pechmann, T. 191 people with aphasia (PWA) 62 perceptual assimilation tasks 203, 207 perceptual-motor systems 173 perceptual span 252 peripheral vision 293 Peterson, C.C. 86 Pfenninger, S.E. 475

p-hacking 133 Pharao, N. 112 Phillips, C. 64, 342n1 Phillips, W. 227 phonautograph recording 8, 9 phoneme monitoring 12 phonetics 22, 202 phonological encoding 410 Pickering, M.J. 64 picture-matching tasks 465 picture naming 451 picture task example 337 Picture-Word Interference task (PWI task) 189, 197 Pietroski, P. 77 plausibility 254, 429 Pollatsek, A. 254, 255 Polz, S. 475 positron emission tomography (PET) 15, 61, 177, 300 posterior distribution 320, 321 pragma-dialectics 142, 143, 145, 149 pragmatics: abilities 87; cardinal numerals 74; communication 274–275; context-dependent aspects 16; inferences 100, 101; irony comprehension 90; lexical adjustment 89; module 78; neo-Gricean approaches 89; presupposition and implicature 17; psycholinguistic methods 72; semantics 71; thresholds 79 Prasada, S. 42 Preferential Looking Paradigm 386n2 prefix-stripping procedure 42 pre-registration 133, 353 presentation modality 394 preview benefit effect 262 priming 39–40, 40, 190, 462–463, 465 privileged ground context 191 probability density function (PDF) 318 problem validity 145 procedural vs. declarative memory 41 productive regular vs. unproductive irregular inflection 43 productivity 43–44 prosody 25, 275 prosody-transplantation paradigm 27 protocol design 29 prototype theory 77, 78, 224 pseudo random assignment 351 pseudowords 225 psycholinguistics 1, 7, 13, 17, 39, 51, 52, 53, 57, 72, 121, 157, 164, 226, 313, 320, 424, 442, 459 psychological sciences 1 psychology of language 31 psychosocial measures 479 public control, ethical processes 366 Pulvermüller, F. 178

pupillary dilation responses 60 pupillometry 60 Purkinje images 14 Pygmalion 8 Pynte, J. 13 Pyykkönen-Klauck, P. 274, 276 qualitative observation 408 quality of recalled information 124 quantificational expressions 76–78 quantitative assessments 408 question answering 124 question under discussion (QUD) 80 Quillian, M.R. 12 radiofrequency pulses 206 radioisotopes 300 Rad, M.S. 459 Ramscar, M. 229 randomisation 346 Rastle, K. 42, 43, 45, 48n2, 48n3 Ratcliff, R. 237 rating tasks 204 Rayner, K. 41, 59, 250, 251, 253, 255, 256 reaction time 93, 94, 127, 230n1 reactivation mechanism 236 reactive briefing/debriefing 361 reading times (RTs) 93–94, 427 recall questions 131–132 recording equipment 30 referential coherence 120, 121; analysis 125–126; representation 126–127 referential communication task 191, 192, 197 referential processing 128 referring expressions (REs) 125–127 Regel, S. 98 region of practical equivalence approach 324 regions of interest (ROI) 258–260, 269, 429 register participant variability 352–353 regression fixation proportion 219 regression path duration see go-past time regressions 122, 242, 252, 428 Reinhart, T. 339 Reips, U.-D. 358, 363, 365, 366 relational coherence 120, 121; analysis, relations and markers 124–125; processing 122–123; representation 123–124 relation labeling tasks 124 relative clauses (RCs) 53, 463 relativized minimality approach 57 relevance theory 86, 89, 93, 94 repetitive transcranial magnetic stimulation (rTMS) 417 research-grade eye tracker 165 research participants 146 resonance model 236, 237

resource constraints 359, 366 respiratory inductance plethysmography bands 28 response accuracy 359 response-based behavioural paradigms 180 response key 13 response times 217, 218–219, 230n1, 354n1 retraction scenarios 79 Reuland, E. 339 de Reuse, W.J. 161 Reuter, K. 80 Ricciardi, G. 80 Rice, M.L. 399 RI-Val model 238–239, 239, 241–243 Rodríguez, G.A. 427 Roettger, T.B. 133 Rogers, S.L. 291 Rohde, A. 410 Romoli, J. 75 Rosa, E.C. 126 Ross, J.R. 55 Roulet-Amiot, L. 398 Rousselot, P.-J. 22 Rubenstein, H. 12 Rubenstein, M.A. 12 saccades 10, 122, 251, 252, 268 Şafak, D.F. 429 Sag, I. 58 Şahin, H. 462 Sanford, A. 13 Sansavini, A. 410, 413 Sapir-Whorf hypothesis 460 Sarvasy, H.S. 160 Sato, S. 219, 347–351, 353 Sauerland, U. 469 Sauppe, S. 159, 161, 163, 165 scalar implicature 88, 89; eye-tracking experiment 94–96; lexical priming 94; reaction time experiment 93; reading time experiment 93–94; time-course, trial types 96 scaling tasks see rating tasks Scarborough, H.S. 13 scene perception 251 Schachter, J. 424 Schad, D.J. 317, 322, 326 Schiller, E. 480 Schluroff, M. 60 Schmidt, T. 223 Schneider, W. 13 Scholman, M.C.J. 127 Schouten, M.E.H. 204 Schütze, C.T. 424 Schwab, S. 206, 211, 213 scientific developments 7 scientific reasoning 314 Scott de Martinville, É.-L. 8

scrambling 156, 157 second language acquisition (SLA): critical issues and topics 434–435; historical perspectives 422–424; practice recommendations 432–433; production data 423; quantitative approach 422; research methods 424–434; statistical analysis 433–434 Sedivy, J.C. 91, 191, 197, 382 selective reporting 133 self-confidence 476 self-paced listening 164 self-paced reading (SPR) 13, 58, 121, 123, 127, 132, 314, 315, 425–428, 427 self-selection bias 369 semantic fluency tasks 411 semantic integration 273 semantics/pragmatics interface 75–76 semantic transparency 255 semi-automatic annotation techniques 294 semi-experimental tasks 465–466 semi-structured responses 165 Senju, A. 91 sensorimotor processing 173 sensorimotor synchronisation 26 sentence completion 125, 126, 412 sentence formulation 189–190 sentence-picture acceptability task 76 sentence production models 56 sentence production priming tests 412 sentence repetition 396–397 sentential complement (SC) 429 Seuren, L.M. 362 shared gaze 286 Shen, X.R. 462 Sherratt, S. 414 signal detection theory 204 Signorello, R. 24 Silva, R. 432 similarity-based interference 55, 314, 315 similarity ratings 227 simple bootstrapping mechanism 46 simulations 174 single fixation duration 258 single-photon emission computed tomography (SPECT) 15 situation model 124, 235 skin conductance responses (SCR) 217, 220–221 Slabakova, R. 426 slippery slope arguments 146 SLIP technique 189 Smolka, E. 45, 48n3 Snedeker, J. 75, 95, 96, 96 Sneed, K. 291 social communication 92 social context 190–194 social desirability bias 109, 362

social organization 458 social participation 390 social-pragmatic inferences 100 social psychology 109, 115, 285 socio-affective 475 socio-cognitive abilities 90 sociolinguistics 1, 31; competence and performance 106; contributions and research 113–115; critical issues and topics 109–112; cultural ecology 106; embedding problem 106; evaluation problem 107; future directions 115; historical perspectives 107–109; innovations 106; language in context 112–113; language’s grammatical correctness 106; practice recommendations 115; social and interactional goals 106 Sonia, A.N. 243 sound recording machines 22 Space Game series 229 Spalding, T.L. 44 Spalletta, G.F. 417 sparse temporal sampling approach 206 speakers’ selection 30 specific language impairment (SLI) 390 speech elicitation techniques 107 speech perception 272–273; acoustic analyses 27–28; aerodynamics 28–29; articulation 29; auditory signal 201; behavioural methods 202–204; coarticulation 201; definition 201; exemplar studies 206–213; future directions 213–214; lack of invariance 201; listening studies 26, 26; neurophysiological methods 204–206; stimulus preparation 27; voice quality measures 28 speech production 27, 378–379, 466 speech sounds 8–9 speech synthesis 27 Speed, L. 160 Speed, L.J. 358 Sperber, D. 72, 98, 459 spillover sentence 246n1 Spivey, M. 446 Spivey, M.J. 176 spoken language 8; analysis 276–277; contributions and research 272–275; critical issues and topics 271–272; data and measurement 276; definition 268; future directions 278; historical perspectives 269–271; linking hypothesis 269; practice recommendations 277–278; referential domain 268; regions of interest 269; research methods 275–277; saccades 268; speech signal 268; task 275–276; video-based eye-tracker 269 spoken word recognition 272–273 spontaneous child speech 376

spontaneous linguistic productions: cognitive achievement 187; discourse context 194–197; future directions 197; language production 187; modalities 187; sentence formulation 189–190; social context 187, 190–194; spoken language 188; word production 188–189 spontaneous speech 395–396 Spotorno, N. 98 spreading activation 236 Sprouse, J. 58, 424 Standard Belgian Dutch (SBD) 110 Standard Southern British English (SSBE) 207 Stanners, R.F. 41 Stan programming language 323 statistical analysis 31; conventional frequentist data analysis 314–318; linguistics 313; misinterpretations 313; null hypothesis 314; reproducible code and data 326; SLA 433–434; uncertainty quantification, Bayesian estimation 318–326 statistical inference 313, 314 statistical power 346 Staub, A. 57, 255, 256 Stevenson, R.J. 126 stimulus onset asynchrony (SOA) 48n1 stimulus preparation 27 Stoehr, A. 378, 435n2 Stone, K. 277 storage space vs. timing 38 story continuation 125 Strand, E. 110, 111 stratified randomisation 350, 351 structural measures: dMRI 302–303; MRI 302, 303 structural memory 58 structural priming 462 Sturt, P. 338, 339 subdomains, language assessment 398–399 subjective measure of word frequency 254 subjective well-being 475–476 subject-object-verb (SOV) 156 subject relative clauses (SR) 55, 57 subject-verb agreement errors 51 sub-lexical processing 13 Sumiyoshi, C. 412 summarization 132, 353–354 superconducting quantum interference devices (SQUIDs) 15 superlative reading 77 supraglottic/supra-laryngeal articulatory studies 29 switch cost 451 switch reference marking 159–160 synchronization 292, 294 syntactic development 377, 395

syntactic movement 61, 62 syntactic parsing 273 syntactic principles 340 syntactic theory 51, 55, 63 syntactotopic conjecture 62 Tabbaa, L. 294 tachistoscopes 11–13 Taft, M. 42 Tanenhaus, M.K. 14, 41, 96, 197, 465 target speech sounds 202 target speech stimulus 205 task-based dialogue 197 task demands 206, 395 task-evoked responses of the pupil (TEPRs) 60 Tatler, B.W. 251 Taylor, L.J. 176 testable hypotheses 29–30 test memorization strategies 461 test-taking 166 text base 235 text comprehension 86; conscious awareness 235; contributions and research 238–244; critical issues and topics 236–238; definition 235; future directions 245–246; historical perspective 235–236; practice recommendations 245; research methods 244–245 theory of mind (TOM) 86, 87, 90, 99, 409 Thieberger, N. 161 Third-age Language Learning (TALL) 473 Thompson, L.C. 14 Thornton, R. 380, 397, 399 Thurstone, L.L. 411 time stamp methods 294 Tinker, M.A. 251 Tiselius, E. 291 Tomblin, J.B. 398, 399 total reading time 219, 258 Toulmin model 141 Toulmin, S. 141 trace deletion hypothesis 63 tracking eye movements 380–383 traditional standardized assessments 412–413 transcranial direct current stimulation (tDCS) 417 transcranial magnetic stimulation (TMS) 177, 178–179, 179 transcriptions 28, 413, 479 transformational generative theory 52 translation production 451–452 translation recognition 451–452 transparency 3, 39, 42, 43, 82, 308 transparent compound 255 traumatic brain injuries (TBI) 415 tree pruning hypothesis 63

trustworthiness 98, 99, 113, 147 truth evaluation paradigm 111 truth value judgement tasks (TVJT) 336–338, 379–381, 380–383, 383–384, 397, 425–426 Tskhovrebova, E. 127, 350 turn-taking dynamics 287 Ullman, M.T. 45 unaccusative verbs 341 unambiguous sentence 427 uncinate fasciculus 303 uncontrolled training effects 226 unergative verbs 341 unification 62 uninformative prior 326 universal quantifiers 77 unmatched task demands 226 Urgesi, C. 417 usage-based theories 341 Ushida, E. 480 Utsumi, A. 97 utterance interpretation 86, 87–89

Vabalas, A. 291 Valtakari, N.V. 286, 288 van Berkum, J.J.A. 127 Van Dijk, T.A. 235, 237 van Eemeren, F.H. 142 Vanek, N. 225 van Gent, P. 108 van Gompel, R.P. 126 Vanlangendonck, F. 197 van Silfhout, G. 123 van Tiel, B. 77, 78 Vasishth, S. 314, 320 Veldre, A. 262 verbal clues 85 verbal communication 85 verbal fluency tasks 465 verbal guise technique 108 verbal irony 89–91, 100 verbal leaning 173 verb bias 429 verb-finality 156, 157 verification tasks 124 video-based techniques 14 video conference technology 356 virtual reality (VR) 278, 294 visual acuity 252 visual language 8 visual mismatch negativity (vMMN) 222 visual stimuli 277 visual verification task 126

visual word recognition: basics of paradigm 450; masked priming paradigm 450 visual world eye-tracking 127, 429–430 visual-world paradigm (VWP) 14, 40–41, 46, 59, 94, 95, 96, 123, 130, 165, 219, 256, 268, 271–272, 338, 379, 382, 384–385, 386n2, 442; bilingual language processing 443, 444; blank screen paradigm 449–450; characteristics 448–449; data analysing 449; display types 443; monolingual language processing 444 Vitale, F. 178 vocabulary acquisition 478 vocabulary spurts 91 voice onset time (VOT) 202; definition 209; EEG experiment 209–211; EEG study 209–211; ERPs 210; methodological take-home message 211; N1 ERP component 209; notions 209 voice quality measures 28 Volkers, N. 390 Volkmann, A.W. 11 voluntariness 359, 363, 365 voxel-based morphometry (VBM) 302

Wade, N.J. 10, 251 Wagenmakers, E.-J. 325 Waldon, B. 80 Walsh, E. 240 Walton, D.N. 143 Ware, C. 476 Watson, D.G. 195 Watson, K. 113, 114 web-based experimentation: benefits 363, 363–366; drawbacks 363, 363, 366–369; factors 370 webcam-based eye tracking 165 Weber, A. 446 Wechsler Adult Intelligence Scale 411 Weiss, E.M. 412 Wei, W. 242, 245 Wernicke, C. 9, 10 Wernicke’s aphasia 63, 415 Wernicke’s area 10, 300 Western Educated Industrialized Rich Democratic (WEIRD) 157, 350, 358, 458 West Flanders Dutch (WFD) 110 Wexler, K. 336 Whalen, D.H. 161 White, L. 333, 336, 337, 425 whole-form storage vs. decomposition 41 Wilkes-Gibbs, D. 193 Williams, C. 241, 242 window-based analysis 276 Winner, E. 91 Winograd, T. 12


wonderment questions 480 word-ordering tasks 412 word processing 252 word production 188–189 working memory 51, 55, 58, 59, 62, 235, 236, 338, 377, 444, 476 workspace-hidden condition 193, 194 workspace-visible condition 193, 194 world wide web (WWW) 357 Xiang, M. 78 Yalcin, S. 79, 80 Yan, G. 253

Yasunaga, D. 165 Yee, E. 382 Ye, Y. 113 Yi, E. 128 Yoon, S.O. 192 Zang, C. 251, 262 Zencastr software 30 Zerkle, S.A. 196 Zhang, X. 398, 399 Zima, E. 290, 291 Zufferey, S. 127 Zurif, E.B. 63 Zwaan, R.A. 176
